unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Handle sv-u-co-standard properly #3253

Open hsivonen opened 1 year ago

hsivonen commented 1 year ago

When reviewing #3243, I noticed that we don't seem to have different handling for sv-u-co-standard based on whether the data is CLDR 41 vs. CLDR 42 (or later).

In CLDR 42 or later, sv-u-co-standard is the default. In CLDR 41, sv-u-co-standard is the legacy collation.

The simplest solution would be to remove -u-co-standard from sv unconditionally, which would make requesting the legacy collation in CLDR 41 return the default collation instead. It seems rather silly to put more engineering effort into making ICU4X trunk use the exact CLDR 41 semantics now that 42 is current and 43 is around the corner.
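A minimal sketch of what "remove -u-co-standard from sv unconditionally" could look like. `CollationRequest` and `normalize_swedish_standard` are hypothetical names made up for illustration; they are not ICU4X types or functions, and a real implementation would work on parsed locale objects rather than plain strings.

```rust
/// Hypothetical, simplified view of a collation request:
/// a language subtag plus an optional `-u-co-` collation type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CollationRequest {
    language: String,
    collation: Option<String>,
}

impl CollationRequest {
    fn new(language: &str, collation: Option<&str>) -> Self {
        CollationRequest {
            language: language.to_string(),
            collation: collation.map(|c| c.to_string()),
        }
    }
}

/// Drops `-u-co-standard` for Swedish regardless of the CLDR version.
/// With CLDR 42 or later this changes nothing observable, because
/// `standard` is already the default. With CLDR 41 it means a request
/// for the legacy collation resolves to the default instead.
fn normalize_swedish_standard(mut req: CollationRequest) -> CollationRequest {
    if req.language == "sv" && req.collation.as_deref() == Some("standard") {
        req.collation = None;
    }
    req
}
```

Under this sketch, `sv-u-co-standard` becomes plain `sv`, while other languages and other Swedish collation types pass through untouched.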

robertbastian commented 1 year ago

Can we normalize this in datagen?

hsivonen commented 1 year ago

Not sure. It seems rather silly to put a lot of effort into this. In theory, we should handle CLDR 41 data that has already been generated. In practice, CLDR 41 probably no longer matters at all for ICU4X.

I have a really hard time believing that there is a real constituency that a) wants to run ICU4X trunk code with CLDR 41 data and b) wants the legacy collation for Swedish.

That's why I think it should be enough to unconditionally remove -u-co-standard from sv even if the result is theoretically not exactly correct for CLDR 41.

sffc commented 1 year ago

The default collation should be stored at sv, and the non-default collation foo should be stored at sv-u-co-foo. During data lookup, a request for sv will get the default collation, and if a specific collation bar is requested, first sv-u-co-bar will be tried, and if it is not there, then lookup will fall back to sv.
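The lookup order described above can be sketched roughly as follows. This is an illustration only: `CollationStore`, `cldr41_like`, and `cldr42_like` are hypothetical names, the store is a plain `HashMap` keyed by locale string rather than the ICU4X data provider, and the payloads are placeholder labels rather than real collation data.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a collation data store keyed by locale string.
struct CollationStore {
    entries: HashMap<String, &'static str>,
}

impl CollationStore {
    fn new(pairs: &[(&str, &'static str)]) -> Self {
        let entries = pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect();
        CollationStore { entries }
    }

    /// If a collation type is requested, try `<lang>-u-co-<type>` first.
    /// If that entry is absent, or no type was requested, fall back to `<lang>`.
    fn lookup(&self, lang: &str, collation: Option<&str>) -> Option<&'static str> {
        if let Some(co) = collation {
            let key = format!("{lang}-u-co-{co}");
            if let Some(data) = self.entries.get(&key) {
                return Some(*data);
            }
        }
        self.entries.get(lang).copied()
    }
}

/// Assumed layout for CLDR 41-style data: `standard` is a non-default
/// (legacy) collation, so it has its own entry.
fn cldr41_like() -> CollationStore {
    CollationStore::new(&[("sv", "default"), ("sv-u-co-standard", "legacy")])
}

/// Assumed layout for CLDR 42-style data: `standard` is the default,
/// so only the bare `sv` entry exists.
fn cldr42_like() -> CollationStore {
    CollationStore::new(&[("sv", "default")])
}
```

With the 41-like layout, `lookup("sv", Some("standard"))` finds the dedicated entry. With the 42-like layout the first probe misses and the request falls back to `sv`, so the result is still the default collation at the cost of one extra failed lookup.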

hsivonen commented 1 year ago

I guess then the current code gives the right outcome for Swedish with both CLDR 41 and CLDR 42, but at a tiny perf hit for 42 and later.

sffc commented 1 year ago

Note: I'd like it if we could be testing CLDR 41 and 42 while working on this issue to make sure any changes remain compatible with all supported CLDR versions.

robertbastian commented 1 year ago

Can we put this in 1.3?

sffc commented 1 year ago

> Can we put this in 1.3?

I consider it blocked on https://github.com/unicode-org/icu4x/issues/3755