markusicu opened this issue 6 months ago
@macchiati do we need the cldrWithoutFFFx option?
Hmmm. As I recall, U+FFFE and U+FFFF are there to give users minimum and maximum collation elements. As long as we keep those in the CLDR data, I think we are OK.
Of course we are going to keep them in CLDR; see https://www.unicode.org/reports/tr35/tr35-collation.html#tailored_noncharacter_weights
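A toy sketch of why those two tailorings are useful, assuming made-up primary weights rather than real DUCET or CLDR data: ordinary characters get a case-folded, code-point-based weight, U+FFFE gets the minimum weight, and U+FFFF gets the maximum. With such weights, the range "Sch" ≤ X ≤ "Sch\uFFFF" contains every string that starts with "sch" in any case, which is the kind of range the UTS #35 section above describes. The class ToyNoncharacterWeights and its methods are hypothetical, not unicodetools or ICU code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy collator with tailored U+FFFE/U+FFFF weights; hypothetical, not DUCET or CLDR data. */
final class ToyNoncharacterWeights {
    static final int MIN_WEIGHT = 1;                 // toy weight reserved for U+FFFE
    static final int MAX_WEIGHT = Integer.MAX_VALUE; // toy weight reserved for U+FFFF

    /** Invented primary weight for one code point. */
    static int primaryWeight(int cp) {
        if (cp == 0xFFFE) return MIN_WEIGHT;
        if (cp == 0xFFFF) return MAX_WEIGHT;
        // Case-fold so that "Sch" and "sch" get equal primary weights;
        // +2 keeps every ordinary weight strictly between MIN and MAX.
        return Character.toLowerCase(cp) + 2;
    }

    /** Compares two strings by their sequences of toy primary weights. */
    static int compare(String a, String b) {
        int[] x = a.codePoints().map(ToyNoncharacterWeights::primaryWeight).toArray();
        int[] y = b.codePoints().map(ToyNoncharacterWeights::primaryWeight).toArray();
        return Arrays.compare(x, y); // lexicographic; a proper prefix sorts first
    }

    /** Returns the words w with lower <= w <= upper in the toy order. */
    static List<String> inRange(List<String> words, String lower, String upper) {
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (compare(lower, w) <= 0 && compare(w, upper) <= 0) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Schmidt", "schulz", "Sch", "Scott", "Sci", "Sbarro");
        // U+FFFF as the last character turns "Sch" into an upper bound for the prefix.
        System.out.println(inRange(words, "Sch", "Sch\uFFFF"));
    }
}
```

Running main should print [Schmidt, schulz, Sch]; "Scott", "Sci" and "Sbarro" fall outside the range.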
It (still) makes sense to have two choices of collator, but why three? From class UCA:
public enum CollatorType {
ducet,
cldr,
cldrWithoutFFFx
}
I don't recall any reason to keep the "without" variant.
(Sent by email in reply to Markus Scherer's comment of Wed, Aug 21, 2024, 08:51: https://github.com/unicode-org/unicodetools/issues/794#issuecomment-2302423105)
Thanks. Setting priority=high because the question is resolved, and it looks like the code change will be easy.
WriteCollationData.getCollator(type) (issue #793 would move this function to class UCA) works with three types. One of them, cldrWithoutFFFx, builds a CLDR collator except that it leaves U+FFFE and U+FFFF with their DUCET mappings rather than their CLDR tailorings.
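To make the difference between the types concrete, here is a toy model. Only the enum mirrors class UCA; primaryWeight, compare, and all weights are invented, and the toy ignores every other difference between DUCET and the CLDR root collation, so in it cldrWithoutFFFx collapses into ducet. The point it shows: once U+FFFE and U+FFFF keep a default (here, code-point-ordered) weight, "Sch\uFFFF" no longer works as an upper bound for all strings starting with "Sch".

```java
import java.util.Arrays;

/** Toy model of the three collator types; all weights are invented for illustration. */
final class ToyCollatorTypes {
    // Mirrors the enum in class UCA; everything else in this class is hypothetical.
    enum CollatorType { ducet, cldr, cldrWithoutFFFx }

    static int primaryWeight(int cp, CollatorType type) {
        boolean fffx = cp == 0xFFFE || cp == 0xFFFF;
        if (fffx && type == CollatorType.cldr) {
            // Toy CLDR tailoring: U+FFFE sorts lowest, U+FFFF sorts highest.
            return cp == 0xFFFE ? 1 : Integer.MAX_VALUE;
        }
        // ducet, cldrWithoutFFFx, and all other code points: invented default weight.
        return cp + 2;
    }

    static int compare(String a, String b, CollatorType type) {
        int[] x = a.codePoints().map(cp -> primaryWeight(cp, type)).toArray();
        int[] y = b.codePoints().map(cp -> primaryWeight(cp, type)).toArray();
        return Arrays.compare(x, y);
    }

    public static void main(String[] args) {
        // "Sch" followed by U+10000, a supplementary code point.
        String s = "Sch" + new String(Character.toChars(0x10000));
        for (CollatorType t : CollatorType.values()) {
            boolean inRange = compare(s, "Sch\uFFFF", t) <= 0;
            System.out.println(t + ": at or below \"Sch<U+FFFF>\"? " + inRange);
        }
    }
}
```

Running main should report false for ducet, true for cldr, and false for cldrWithoutFFFx: the third type silently loses the property that the CLDR output advertises.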
Strangely, FractionalUCA.java uses such a collator, even though the "SPECIAL MAX/MIN COLLATION ELEMENTS" it writes for these noncharacters correspond to the CLDR tailorings.
This type is also used for the UCA.Main option testCompatibilityCharacters.
Why? It seems confusing to have this third type, especially one that yields something different from what we actually output. Try to remove it and use only a DUCET collator or a CLDR collator.
If we do need to keep this option, then at least consider changing buildCldrCollator(boolean) to buildCldrCollator(enum type) for readability.
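A sketch of that readability change, under stated assumptions: a call such as buildCldrCollator(true) does not say what true means, while an enum argument names the variant at the call site. The string return values stand in for the real collator objects, the meaning of the boolean (true = apply the FFFE/FFFF tailorings) is assumed, and the class BuildCollatorSketch is hypothetical; the real unicodetools signatures may differ.

```java
/** Hypothetical sketch of the suggested signature change; not the unicodetools code. */
final class BuildCollatorSketch {
    enum CollatorType { ducet, cldr, cldrWithoutFFFx }

    // Before: the flag's meaning is invisible at call sites like buildCldrCollator(true).
    // Assumed meaning: true = apply the CLDR tailorings for U+FFFE and U+FFFF.
    private static String buildCldrCollator(boolean withFffxTailorings) {
        return withFffxTailorings ? "CLDR" : "CLDR, U+FFFE/U+FFFF with DUCET mappings";
    }

    // After: callers write buildCldrCollator(CollatorType.cldrWithoutFFFx).
    static String buildCldrCollator(CollatorType type) {
        switch (type) {
            case cldr:
                return buildCldrCollator(true);
            case cldrWithoutFFFx:
                return buildCldrCollator(false);
            default:
                // ducet is not a CLDR variant, so reject it explicitly.
                throw new IllegalArgumentException("not a CLDR collator type: " + type);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildCldrCollator(CollatorType.cldrWithoutFFFx));
    }
}
```

If cldrWithoutFFFx is removed instead, the parameter disappears entirely and there is nothing left to name.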
@macchiati FYI