unicode-org / unicodetools

home of unicodetools and https://util.unicode.org JSPs

CLDR CollationTest: omit simplified radicals #914

Closed markusicu closed 3 months ago

markusicu commented 3 months ago

While updating ICU to the latest Unicode 16 files, I got collation conformance test failures. The CollationTest files are generated with the implicit-weights Han sort order and already omit Han characters that are known to sort differently in that sort order vs. radical-stroke order.

With the recent change to make the CLDR radical-stroke order match the one in UAX38 (unicodetools PR #909), we need to omit some more characters. Characters with traditional and simplified radicals are now intermingled, and some of them now sort differently in implicit-Han vs. radical-stroke order. I changed the CollationTest generator to omit all of the simplified radicals.
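
As a rough illustration of the filtering idea (not the actual CollationTest generator code from this PR), the sketch below drops characters whose radical-stroke value names a simplified radical. It assumes the radical-stroke data is available as Unihan `kRSUnicode`-style strings, where one or more apostrophes after the radical number mark a simplified radical form. The function names and the sample input are hypothetical.

```python
import re

# Shape of a kRSUnicode-style value: radical number, optional apostrophes
# marking a simplified radical form, a dot, then the residual stroke count.
RS_VALUE = re.compile(r"^([1-9][0-9]{0,2})('{0,3})\.(-?[0-9]{1,2})$")


def has_simplified_radical(rs_value: str) -> bool:
    """Return True if the radical-stroke value carries an apostrophe marker."""
    match = RS_VALUE.match(rs_value)
    if match is None:
        raise ValueError(f"unexpected radical-stroke value: {rs_value!r}")
    return match.group(2) != ""


def omit_simplified(candidates: dict[int, str]) -> list[int]:
    """Keep only code points whose radical is not a simplified form.

    `candidates` maps a code point to its radical-stroke value
    (hypothetical input shape, chosen for this sketch). The result is
    returned in code point order.
    """
    return [
        cp
        for cp, rs_value in sorted(candidates.items())
        if not has_simplified_radical(rs_value)
    ]


# Illustrative sample values: a traditional/simplified pair under radical 149
# plus one unrelated character.
sample = {0x8A08: "149.2", 0x8BA1: "149'.2", 0x4E00: "1.0"}
kept = omit_simplified(sample)
```

Here `kept` retains U+4E00 and U+8A08 and leaves out U+8BA1. Omitting every character with a simplified radical is deliberately broader than omitting only the characters that actually sort differently, which keeps the rule simple to state and to check.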

No DUCET CollationTest file changes.

markusicu commented 3 months ago

On this one, I will fix a TODO comment by turning it into a note with more information.

markusicu commented 3 months ago

@macchiati thanks! I just pushed a second commit explaining why we now have more characters in the original Unihan block that don't sort in the improved radical-stroke order, resolving the TODO from my previous PR #909. @echeran FYI