unicode-org / unicodetools

home of unicodetools and https://util.unicode.org JSPs
https://util.unicode.org

Remove SegmenterCldr.txt #947

Closed: eggrobin closed this 1 month ago

eggrobin commented 1 month ago

Back in March I had written:

Right now the 15.1 line breaking rules in SegmenterCldr.txt are used by no one, and this is a good thing since they are wrong and untested (and we went through a release with these wrong rules!). Let’s get rid of that quasi-copy before someone gets hurt.

And then I forgot to do anything. They are still wrong, and they have not been updated for 16.0. Let’s get rid of those.

eggrobin commented 1 month ago

We discussed this earlier: https://github.com/unicode-org/unicodetools/issues/492#issuecomment-2024020481. Right now SegmenterCldr.txt is not used. If it had been used, it would have produced garbage, most likely without anyone noticing. That makes it a footgun, and we really should remove it before someone gets hurt. As for the code itself, I am removing only the minimum needed to make CI pass. If we ever want to bring back a SegmenterCldr, this is trivial to revert (that is why we have source control!).