Note that although I noticed this while working on the 15.1 update, this issue is about 15.0 segmentation as implemented in ICU4X with 15.0 property assignments, compared with the segmentation defined by the 15.0 standard (not that anything changed for sentence segmentation in 15.1).
I generated some tests using the ICU4C monkeys (at https://github.com/eggrobin/icu/commit/b1612851e4e715c37279a74bfcd97d4f2056fd0c, with seed 1729). The following test fails:
The error messages of the existing tests are not particularly helpful when dealing with large random sequences like that; I tried printing something a little bit more like the output of the ICU4C monkey tests, see below (output from https://github.com/eggrobin/icu4x/commit/e6ac4ee3b8b3327ca992aa20a73957827becb6b1).
Note that SentenceBreak(14) is SContinue; see #4037.

At a glance, it looks like rules SB9 and SB11 are improperly applied.