tc39 / proposal-intl-segmenter

Unicode text segmentation for ECMAScript
https://tc39.github.io/proposal-intl-segmenter/

EDITORIAL: split FindBoundary to FindBoundaryAfter and FindBoundaryOnOrBefore #103

Closed FrankYFTang closed 4 years ago

FrankYFTang commented 4 years ago

I suggest we break the abstract operation

https://tc39.es/proposal-intl-segmenter/#sec-findboundary

FindBoundary ( segmenter, string, startIndex, direction )

into two operations:

FindBoundaryAfter ( segmenter, string, startIndex ) 
FindBoundaryOnOrBefore ( segmenter, string, startIndex )

instead. It does not make sense to pass in the direction as before or after and then test it with an "if" inside the abstract operation. Splitting it into two operations makes the spec simpler by removing an unnecessary condition whose value is already fixed by the caller.

Note that the current spec states:

4. If direction is before, then
4.a If len is 0 or startIndex < 0, return -∞.
4.b Search string for the last segmentation boundary that is preceded by 
  *at most* startIndex code units from the beginning, using locale locale
  and text element granularity granularity.

"by at most startIndex code units from the beginning" means that when direction is before, the return value can equal startIndex. The operation therefore finds the boundary "on or before" startIndex, not strictly "before" it.
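To make the "on or before" reading concrete, here is a minimal JavaScript sketch of the "before" branch. It is an illustration only, not spec text: the `boundaries` array is a hypothetical, pre-computed stand-in for the locale- and granularity-dependent boundary search, and the fallback of 0 follows the "Otherwise, return 0" step of the current spec.

```javascript
// Sketch of the "before" branch (steps 4.a and 4.b as quoted above).
// ASSUMPTION: `boundaries` is a hypothetical ascending list of boundary
// positions, each expressed as the number of code units preceding it.
// A real implementation would derive them from the locale and granularity.
function findBoundaryOnOrBefore(string, boundaries, startIndex) {
  // Step 4.a: an empty string or a negative index has no such boundary.
  if (string.length === 0 || startIndex < 0) return -Infinity;

  // Step 4.b: last boundary preceded by *at most* startIndex code units.
  let found = 0; // fallback when no boundary qualifies
  for (const boundary of boundaries) {
    if (boundary > startIndex) break;
    found = boundary;
  }
  return found;
}

const text = "Hello world";
const wordBoundaries = [0, 5, 6, 11]; // assumed: "Hello" | " " | "world"

console.log(findBoundaryOnOrBefore(text, wordBoundaries, 5)); // 5
console.log(findBoundaryOnOrBefore(text, wordBoundaries, 8)); // 6
```

With startIndex 5, which is itself a boundary in this toy data, the result is 5 rather than 0, which is why "OnOrBefore" describes the behavior more accurately than "before".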

FrankYFTang commented 4 years ago

@gibson042

gibson042 commented 4 years ago

I'll clarify that boundaries occur between code unit indices, but I don't want to split this operation because duplication of the important aspects about boundary determination seems worse than having an internal branch.

FrankYFTang commented 4 years ago

"because duplication of the important aspects about boundary determination": but there is NO duplication of "boundary determination" in the steps. The only steps that would be duplicated are

1. Let _locale_ be _segmenter_.[[Locale]].
2. Let _granularity_ be _segmenter_.[[SegmenterGranularity]].

which have nothing to do with "boundary determination". The line

3. Let len be the length of string.

does not actually need to be mentioned in the "before" case, since steps

b. Search string for the last segmentation boundary that is preceded by at most startIndex code units from the beginning, using locale locale and text element granularity granularity.
c. If a boundary is found, return the count of code units in string preceding it. Otherwise, return 0.

do not use len at all, so there is no point in computing len in the "before" case.

If you do not want to duplicate the text before the steps, you can put it before the title, or just let the second operation refer to it.
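For reference, the single operation under discussion can be sketched in JavaScript as below. This is a rough illustration, not the spec algorithm: the `boundaries` array is a hypothetical stand-in for steps 1 and 2 plus the actual boundary search, and the whole "after" branch is an assumption, since this thread only quotes the "before" steps.

```javascript
// Sketch of the current single operation with its internal branch.
// ASSUMPTION: `boundaries` is a hypothetical ascending list of boundary
// positions; it replaces the locale/granularity lookup (steps 1-2).
function findBoundary(string, boundaries, startIndex, direction) {
  const len = string.length; // step 3, computed for both directions

  if (direction === "before") {
    // Steps 4.a-4.c as quoted in this thread.
    if (len === 0 || startIndex < 0) return -Infinity;
    let found = 0;
    for (const boundary of boundaries) {
      if (boundary > startIndex) break;
      found = boundary;
    }
    return found;
  }

  // ASSUMPTION: the "after" branch is not quoted in this thread; this
  // mirrors the "before" branch, using len as the fallback result.
  if (len === 0 || startIndex >= len) return Infinity;
  const next = boundaries.find((boundary) => boundary > startIndex);
  return next === undefined ? len : next;
}

const sample = "Hello world";
const sampleBoundaries = [0, 5, 6, 11]; // assumed: "Hello" | " " | "world"

console.log(findBoundary(sample, sampleBoundaries, 5, "before")); // 5
console.log(findBoundary(sample, sampleBoundaries, 5, "after")); // 6
```

Every call site passes a fixed direction, so splitting the operation would amount to moving each branch body into its own function. In this sketch the only line shared by the two branches is the len lookup.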

FrankYFTang commented 4 years ago

@tc39/ecma-fellows @zbraniecki @littledan