unicode-org / unicodetools

home of unicodetools and https://util.unicode.org JSPs

Add a GCB test case for ID20230630074506 #506

Open eggrobin opened 1 year ago

eggrobin commented 1 year ago

From unicode-org/properties#149, ID20230630074506: we should ensure that we have a grapheme segmentation test case with an ExtCccZwj just after the second LinkingConsonant in a three-consonant cluster:

LinkingConsonant ExtCccZwj* ConjunctLinker ExtCccZwj* LinkingConsonant ExtCccZwj ConjunctLinker ExtCccZwj* LinkingConsonant

markusicu commented 1 year ago

Wasn't it about a trailing ExtCccZwj?

eggrobin commented 1 year ago

No, Charlotte’s suggestion would have added a trailing ExtCccZwj, but that is covered by postcore anyway.

The real bug was « we get a sequence pattern where ExtCccZwj can occur only after a ConjunctLinker but not before it: […] This does not match rule GB9c which accounts for ExtCccZwj in both positions, which is necessary because Indic scripts make use of combining marks with CCC values both smaller and greater than 9 (Virama). »
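To illustrate why the requested test case distinguishes the two patterns, here is a minimal sketch. It is a toy model of the GB9c context only, not the unicodetools implementation or a full grapheme segmenter. The one-letter encoding (`C` = LinkingConsonant, `L` = ConjunctLinker, `E` = ExtCccZwj) and the names `GB9C`, `LINKER_FIRST`, `no_break_before`, and `TEST_CASE` are assumptions made up for this example. `LINKER_FIRST` is a guess at the shape of the buggy pattern, based only on the description quoted above.

```python
import re

# Hypothetical one-letter stand-ins for the classes named in this issue:
#   C = LinkingConsonant, L = ConjunctLinker, E = ExtCccZwj

# Left context as described for GB9c: ExtCccZwj may appear on both sides of the linker.
GB9C = re.compile(r"CE*LE*$")
# Assumed shape of the buggy pattern: ExtCccZwj only after the linker, not before it.
LINKER_FIRST = re.compile(r"CLE*$")


def no_break_before(classes: str, i: int, left_context: re.Pattern) -> bool:
    """Return True if the toy rule forbids a break before the consonant at index i."""
    return classes[i] == "C" and left_context.search(classes[:i]) is not None


# Three-consonant cluster with one ExtCccZwj just after the second consonant:
# LinkingConsonant ConjunctLinker LinkingConsonant ExtCccZwj ConjunctLinker LinkingConsonant
TEST_CASE = "CLCELC"

# Before the second consonant (index 2) both patterns agree.
print(no_break_before(TEST_CASE, 2, GB9C), no_break_before(TEST_CASE, 2, LINKER_FIRST))
# Before the third consonant (index 5) only the GB9c-style pattern keeps the cluster together.
print(no_break_before(TEST_CASE, 5, GB9C), no_break_before(TEST_CASE, 5, LINKER_FIRST))
```

In this model, the two patterns disagree only at the boundary before the third consonant. A two-consonant sequence with no ExtCccZwj before the linker would not expose the difference.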