unicode-org / icu4x

Solving i18n for client-side and resource-constrained environments.
https://icu4x.unicode.org

Update locale canonicalization to use bcp47 alias data #746

Open dminor opened 3 years ago

dminor commented 3 years ago

In #218, we're adding locale canonicalization based on the CLDR JSON aliases.json data. That data is missing a handful of aliases that are defined in the bcp47 XML data. Once those aliases are added to the JSON, as tracked by #562, we'll be able to update the locale_canonicalizer to use them as well.

This is blocked on both #218 and #562.

sffc commented 2 years ago

@dminor Do you consider this to be a 1.0 blocker? Is it required for spec compliance?

dminor commented 2 years ago

> @dminor Do you consider this to be a 1.0 blocker? Is it required for spec compliance?

Not fixing this is a bug, but a pretty minor one: the handful of missing aliases are very much edge cases. I think we can comfortably fix this post-1.0, so I suggest punting it.

kartva commented 8 months ago

My understanding so far:

kartva commented 7 months ago

By running the download-repo-sources tool, I've obtained the calendar.json file, which seems to contain JSON data. Other bcp47 JSON files can presumably be acquired the same way.

Next steps:

@sffc do you see anything that I might be missing?

sffc commented 7 months ago

This sounds right. I'm not sure you need a new AliasesV3, but yes, the general idea of pulling the JSON files in with download-repo-sources and then getting them into a canonicalizer data structure is correct. Thanks!
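
For illustration only, here is a rough sketch of what "getting them into a canonicalizer data structure" could look like. Everything in it is an assumption rather than ICU4X code: `TypeAliases`, `from_triples`, `canonicalize`, and `sample_aliases` are hypothetical names, the table is filled from hand-written triples instead of parsed JSON, and the two sample entries should be checked against the actual bcp47 files before relying on them.

```rust
use std::collections::HashMap;

/// Hypothetical lookup table (not an ICU4X type):
/// (extension key, deprecated type) -> preferred type.
pub struct TypeAliases {
    preferred: HashMap<(String, String), String>,
}

impl TypeAliases {
    /// Build the table from (key, deprecated, preferred) triples, such as
    /// might be extracted from the bcp47 JSON files after download.
    pub fn from_triples(triples: &[(&str, &str, &str)]) -> Self {
        let preferred = triples
            .iter()
            .map(|(key, old, new)| ((key.to_string(), old.to_string()), new.to_string()))
            .collect();
        Self { preferred }
    }

    /// Return the preferred type for `value` under `key`,
    /// or `value` unchanged when no alias is recorded.
    pub fn canonicalize<'a>(&'a self, key: &str, value: &'a str) -> &'a str {
        self.preferred
            .get(&(key.to_string(), value.to_string()))
            .map(String::as_str)
            .unwrap_or(value)
    }
}

/// Illustrative entries only; verify against the real calendar/measure data.
pub fn sample_aliases() -> TypeAliases {
    TypeAliases::from_triples(&[
        ("ca", "islamicc", "islamic-civil"),
        ("ms", "imperial", "uksystem"),
    ])
}
```

With this table, `sample_aliases().canonicalize("ca", "islamicc")` yields `"islamic-civil"`, while a value with no recorded alias, such as `("ca", "gregory")`, is returned unchanged. A real implementation would more likely store this in a zero-copy data struct alongside the existing alias data rather than a `HashMap` of owned strings.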