Open robertbastian opened 5 months ago
@eggrobin What is left on this issue?
All of it? It was created to allow us to close the specific issue reported in https://github.com/unicode-org/icu4x/issues/4417, but word segmentation is still wrong and hasn’t changed since this was filed.
WB3c and WB4 interact in the same way LB8a and LB9 do. A correct implementation of that would require either duplicating every state as in https://github.com/unicode-org/icu4x/pull/4389, or hoisting the two rules into the logic as in https://github.com/unicode-org/icu4x/pull/5001.
The latter seems more attractive, both for data size and for the sanity of the maintainer; note that since `rule_segmenter.rs` is shared with extended grapheme cluster and sentence breaking, this will require passing a flag for that logic.
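
To make the "hoisting" option concrete, here is a minimal sketch of the idea. It is not the code from either PR and does not use the real `rule_segmenter.rs` types. The `Class` enum, the `breaks` function, and the `hoist_zwj_rule` flag are all hypothetical names, and the rule set is a toy subset of UAX #29 (only WB3c, WB4, and a letter-letter rule standing in for WB5). The point is that the "ignore Extend/ZWJ" rule keeps the base state unchanged, while a single extra bit remembers whether the immediately preceding code point was a ZWJ, so no state needs to be duplicated.

```rust
/// Toy word-break classes (assumption: real UAX #29 has many more).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Class {
    Letter,
    Extend,
    Zwj,
    ExtPict,
    Other,
}

/// Returns the interior indices `i` at which a break is allowed before
/// `classes[i]`. `hoist_zwj_rule` is a hypothetical flag that enables the
/// hoisted ZWJ rule; a shared segmenter would leave it off where the
/// rule does not apply.
fn breaks(classes: &[Class], hoist_zwj_rule: bool) -> Vec<usize> {
    let mut out = Vec::new();
    if classes.is_empty() {
        return out;
    }
    // `base` is the class that the table-driven rules see on the left side.
    let mut base = classes[0];
    // Hoisted bit: was the immediately preceding code point a ZWJ?
    let mut prev_is_zwj = classes[0] == Class::Zwj;

    for i in 1..classes.len() {
        let cur = classes[i];
        let is_break = match cur {
            // WB4-like: never break before Extend or ZWJ, and leave
            // `base` untouched so they are transparent to later rules.
            Class::Extend | Class::Zwj => false,
            _ => {
                // WB3c-like: ZWJ x Extended_Pictographic, checked against
                // the raw previous code point rather than `base`.
                let zwj_join =
                    hoist_zwj_rule && prev_is_zwj && cur == Class::ExtPict;
                // Stand-in for WB5: Letter x Letter.
                let letter_join = base == Class::Letter && cur == Class::Letter;
                base = cur;
                !(zwj_join || letter_join)
            }
        };
        if is_break {
            out.push(i);
        }
        prev_is_zwj = cur == Class::Zwj;
    }
    out
}
```

With the flag on, `[Letter, Zwj, ExtPict]` has no interior break; with it off, the ZWJ is absorbed into the preceding letter and a break is reported before the pictograph, which is the kind of wrong result described above.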