Open hsivonen opened 1 year ago
Can we normalize this in datagen?
Not sure. It seems rather silly to put a lot of effort into this. In theory, we should handle CLDR 41 data already generated. In practice, CLDR 41 probably no longer matters at all for ICU4X.
I have a really hard time believing that there is a real constituency who a) wants to run ICU4X trunk code with CLDR 41 data and b) wants the legacy collation for Swedish.
That's why I think it should be enough to unconditionally remove `-u-co-standard` from `sv` even if the result is theoretically not exactly correct for CLDR 41.
The default collation should be stored at `sv`, and the non-default collation `foo` should be stored at `sv-u-co-foo`. During data lookup, a request for `sv` will get the default collation, and if a specific collation `bar` is requested, first `sv-u-co-bar` will be tried, and if it is not there, then lookup will fall back to `sv`.
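To make the lookup order above concrete, here is a minimal sketch in Rust. It is not ICU4X's actual data provider code: `lookup_collation`, `sample_store`, and the string-keyed `HashMap` are hypothetical stand-ins for illustration only, and the payloads are placeholder strings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for generated collation data, keyed by locale string.
/// Only the default (`sv`) and one non-default collation (`foo`) are stored.
fn sample_store() -> HashMap<String, &'static str> {
    let mut store = HashMap::new();
    store.insert("sv".to_string(), "default collation data");
    store.insert("sv-u-co-foo".to_string(), "foo collation data");
    store
}

/// Hypothetical lookup: try `<language>-u-co-<collation>` first, then fall
/// back to the bare language, which holds the default collation.
fn lookup_collation(
    store: &HashMap<String, &'static str>,
    language: &str,
    collation: Option<&str>,
) -> Option<&'static str> {
    if let Some(co) = collation {
        let key = format!("{language}-u-co-{co}");
        if let Some(data) = store.get(&key).copied() {
            return Some(data);
        }
    }
    // Either no collation was requested or the specific key was missing.
    store.get(language).copied()
}
```

With this layout, a request for `sv-u-co-standard` misses on the specific key and then resolves to `sv`. That extra probe is the small cost of not storing the default under an explicit `-u-co-standard` key.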
I guess then the current code gives the right outcome for Swedish with both CLDR 41 and CLDR 42, but at a tiny perf hit for 42 and later.
Note: I'd like it if we could be testing CLDR 41 and 42 while working on this issue to make sure any changes remain compatible with all supported CLDR versions.
Can we put this in 1.3?
I consider it blocked on https://github.com/unicode-org/icu4x/issues/3755
When reviewing #3243, I noticed that we don't seem to have different handling for `sv-u-co-standard` based on whether the data is CLDR 41 vs. CLDR 42 (or later).
In CLDR 42 or later, `sv-u-co-standard` is the default. In CLDR 41, `sv-u-co-standard` is the legacy collation.
The simplest solution would be to remove `-u-co-standard` from `sv` unconditionally, which would make requesting the legacy collation in CLDR 41 return the default collation instead. It seems rather silly to put more engineering effort into making ICU4X trunk use the exact CLDR 41 semantics now that 42 is current and 43 is around the corner.
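As a rough illustration of the unconditional removal proposed here, the sketch below normalizes a locale string. The function `normalize_collation_locale` is hypothetical and works on plain strings. Real datagen code would operate on parsed locale types, not string suffixes.

```rust
/// Hypothetical normalization step: drop an explicit `-u-co-standard` from
/// Swedish, regardless of which CLDR version produced the data.
/// All other locale strings are returned unchanged.
fn normalize_collation_locale(locale: &str) -> &str {
    match locale.strip_suffix("-u-co-standard") {
        Some("sv") => "sv",
        _ => locale,
    }
}
```

With CLDR 41 data, this maps the legacy Swedish collation request onto the default, which is the accepted inexactness discussed above.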