unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Ensure that the provider performs correct alias mapping for Traditional Chinese locales #1964

Open hsivonen opened 2 years ago

hsivonen commented 2 years ago

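Ensure that if a specific (and existing) collation hasn't been specified with -u-co-, the following map to zh-u-co-stroke:

  • zh-Hant regardless of region.

  • zh without Hans but with any of HK, MO, TW.

  • yue without either Hans or CN.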

hsivonen commented 2 years ago

CC @sffc

hsivonen commented 2 years ago

For clarity: CLDR maps yue-CN and yue-Hans to zh-Hans, i.e. zh-u-co-pinyin.

sffc commented 2 years ago

Ensure that if a specific (and existing) collation hasn't been specified with -u-co-, the following map to zh-u-co-stroke:

  • zh-Hant regardless of region.

This will be possible so long as zh-Hant contains the correct data. I'll add a test for this.

  • zh without Hans but with any of HK, MO, TW.

This should be automatic given that these fallbacks are included in parent locales / likely subtags; all of these locales will fall back via zh-Hant.

  • yue without either Hans or CN.

Looks like the mappings in likely subtags are correct:

      "yue": "yue-Hant-HK",
      "yue-CN": "yue-Hans-CN",
      "yue-Hans": "yue-Hans-CN",

I'll add a test for it.
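For reference while writing those tests, the expected mapping discussed above can be written down as a small standalone function. This is only a sketch of the rule as stated in this thread, not ICU4X code: `ChineseCollation` and `default_collation` are hypothetical names, the inputs are assumed to be already-parsed subtags, and the caller is assumed to have checked that no explicit -u-co- keyword is present. The arm for other `zh` locales assumes they resolve to Hans through likely subtags.

```rust
/// The two collation types this issue is concerned with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChineseCollation {
    Stroke,
    Pinyin,
}

/// Hypothetical helper (not an ICU4X API): returns the collation the thread
/// expects for a locale that carries no explicit `-u-co-` keyword.
/// Returns `None` for locales the thread does not address.
fn default_collation(
    language: &str,
    script: Option<&str>,
    region: Option<&str>,
) -> Option<ChineseCollation> {
    use ChineseCollation::{Pinyin, Stroke};
    match (language, script, region) {
        // zh-Hant regardless of region.
        ("zh", Some("Hant"), _) => Some(Stroke),
        // zh-Hans is the pinyin case.
        ("zh", Some("Hans"), _) => Some(Pinyin),
        // zh without a script but with any of HK, MO, TW.
        ("zh", None, Some("HK" | "MO" | "TW")) => Some(Stroke),
        // Assumption: any other script-less zh resolves to Hans via likely subtags.
        ("zh", None, _) => Some(Pinyin),
        // yue-CN and yue-Hans map to zh-Hans, i.e. pinyin.
        ("yue", Some("Hans"), _) | ("yue", None, Some("CN")) => Some(Pinyin),
        // yue without either Hans or CN.
        ("yue", s, r) if s != Some("Hans") && r != Some("CN") => Some(Stroke),
        // Everything else (including yue-Hant-CN) is not covered by the thread.
        _ => None,
    }
}
```

For example, `default_collation("zh", None, Some("TW"))` yields `Some(ChineseCollation::Stroke)`, while `default_collation("yue", None, Some("CN"))` yields `Some(ChineseCollation::Pinyin)`.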

sffc commented 2 years ago

There is a list of collation-specific aliases/parents in the LDML-to-ICU converter:

https://github.com/unicode-org/icu/blob/0266970e977b9e2488dfbf788cc280be3a0338ca/tools/cldr/cldr-to-icu/build-icu-data.xml#L263

Obviously, that list isn't making it into ICU4X.

I chatted with @markusicu about this today. He says that it may make sense to introduce a "processing" mode to the locale fallback engine. This mode can be used for both collator and break iterator.

I need to verify whether the set of ICU-specific overrides should apply uniformly to both collator data and segmenter data.
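To make the "processing mode" idea concrete, here is a rough sketch of how a mode-specific parent table could sit in front of ordinary fallback. Everything in it is an assumption for illustration: `FallbackMode`, `COLLATION_PARENTS`, and `parent` are hypothetical names rather than the ICU4X fallback API, the three table entries are taken from the yue mappings discussed in this thread rather than copied from the ICU build file linked above, and plain subtag truncation stands in for the real CLDR parent-locale and script-aware steps.

```rust
/// Hypothetical fallback modes; only `Collation` consults the override table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FallbackMode {
    Default,
    Collation,
}

/// Illustrative collation-specific parents, based on the yue mappings
/// mentioned in this thread (not the full list from the ICU converter).
const COLLATION_PARENTS: &[(&str, &str)] = &[
    ("yue", "zh-Hant"),
    ("yue-Hans", "zh-Hans"),
    ("yue-CN", "zh-Hans"),
];

/// Returns the next locale to try, or `None` when the chain is exhausted.
/// Simplification: outside the override table, this just drops the last subtag.
fn parent(locale: &str, mode: FallbackMode) -> Option<String> {
    if mode == FallbackMode::Collation {
        // Mode-specific overrides take priority over ordinary fallback.
        if let Some((_, target)) = COLLATION_PARENTS
            .iter()
            .find(|(source, _)| *source == locale)
        {
            return Some((*target).to_string());
        }
    }
    // Ordinary step: "yue-HK" -> "yue"; a bare language has no parent here.
    locale.rsplit_once('-').map(|(prefix, _)| prefix.to_string())
}
```

Under these assumptions, `yue-HK` in `Collation` mode first falls back to `yue` and then to `zh-Hant`, whereas in `Default` mode the chain ends at `yue`.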

sffc commented 2 years ago

I still need to implement the actual zigzag fallback, but this can be done in the Collation fallback mode.

sffc commented 1 year ago

Upstream issue involving the ICU-specific fallback aliases: https://unicode-org.atlassian.net/browse/CLDR-16253

sffc commented 5 months ago

See some more recent discussion in #3867