unicode-org / unicodetools

home of unicodetools and https://util.unicode.org JSPs

add ctt14651.txt: ISO 14651 collation Common Template Table #928

Closed markusicu closed 1 month ago

markusicu commented 1 month ago

This is the ISO/IEC 14651 Common Template Table. This is basically a different file format for the same data as in UCA allkeys.txt.

Ken generated this file as ctt14651.txt using the sifter tool but renamed it to CTT_V16_0.txt for publication. In the unicodetools repo, I assume that we don't need to keep multiple versions in parallel. Therefore, I am using a version-less filename here.

I could create a new folder for this file, but that seems silly. It is a sibling to allkeys.txt but starts out unversioned, so I am putting it into .../data/uca/.

Before Unicode 17 alpha, we need to update the publication scripts to output this file and rename it with the current-version suffix. If we only generate it at the end of the release cycle, then maybe it needs to be published only in the "final" script.
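The rename-on-publication step described above could look roughly like the following sketch. This is not the actual publication script: the function names `versioned_ctt_name` and `publish_ctt` are hypothetical, and the sketch assumes the published name always follows the `CTT_V16_0.txt` pattern (major and minor version only).

```python
import shutil
from pathlib import Path


def versioned_ctt_name(version: str) -> str:
    """Map a Unicode version such as "16.0" to a name like CTT_V16_0.txt.

    Assumption: only the major and minor parts appear in the suffix.
    """
    major, minor = version.split(".")[:2]
    return f"CTT_V{int(major)}_{int(minor)}.txt"


def publish_ctt(source: Path, dest_dir: Path, version: str) -> Path:
    """Copy the version-less repo file to its versioned publication name."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / versioned_ctt_name(version)
    shutil.copyfile(source, target)
    return target
```

The file in the repo keeps its version-less name; only the published copy carries the suffix, so nothing in the repo has to be renamed each release.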

markusicu commented 1 month ago

> If this is going to be named CTT.txt in the repo, then we should update the sifter code at some point to output "CTT.txt" instead of "ctt14651.txt" as it currently does.

I wasn't looking at the sifter. I was just going by the filename that you provided, minus the version suffix.

If the sifter currently writes "ctt14651.txt" then I should rename this file right now before it goes in. Unless you like CTT.txt better and we go the other way.

Your preference?

Ken-Whistler commented 1 month ago

It doesn't make much difference to me. But, yeah, the sifter currently outputs "ctt14651.txt", so that would be easier. (The output file names are defined in lines 195-200 of unisift.c.) The output file can then be renamed to whatever we need for an official CTT version for 14651 upon deployment.

markusicu commented 1 month ago

Renamed, PTAL

markusicu commented 1 month ago

Ken is extremely busy with the Unicode 16 release. Could someone else please rubber-stamp this simple PR?