Open robertbastian opened 1 year ago
Related: there is a comment that doesn't seem to have a tracking issue in the LocaleCanonicalizer:
/// Some BCP47 canonicalization data is not part of the CLDR json package. Because
/// of this, some canonicalizations are not performed, e.g. the canonicalization of
/// `und-u-ca-islamicc` to `und-u-ca-islamic-civil`. This will be fixed in a future
/// release once the missing data has been added to the CLDR json data.
One key use case here is mapping from deprecated variants to Unicode extensions, like `de-PHONEBOOK` to `de-u-co-phonebk`.
However, I'm not sure if `de-u-co-standard` to `de` is "canonicalization". It could be seen as minimizing likely subtags, perhaps.
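To make the distinction concrete, here is a minimal string-based sketch of the two transformations as separate functions. Both `map_deprecated_variant` and `drop_default_collation` are hypothetical names for illustration, not ICU4X APIs. The sketch only handles the two examples from this thread and assumes the input carries no other extensions.

```rust
/// Hypothetical helper: rewrites the deprecated `PHONEBOOK` variant into the
/// `-u-co-phonebk` Unicode extension. Handles only this one variant and
/// assumes the input has no existing `-u-` extension.
fn map_deprecated_variant(locale: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    let mut found = false;
    for subtag in locale.split('-') {
        if subtag.eq_ignore_ascii_case("phonebook") {
            // Remember the variant and drop it from the subtag list.
            found = true;
        } else {
            kept.push(subtag);
        }
    }
    let mut out = kept.join("-");
    if found {
        out.push_str("-u-co-phonebk");
    }
    out
}

/// Hypothetical helper: removes a trailing `-u-co-standard`, treating the
/// standard collation as the default. Leaves every other input unchanged.
fn drop_default_collation(locale: &str) -> String {
    match locale.strip_suffix("-u-co-standard") {
        Some(rest) => rest.to_string(),
        None => locale.to_string(),
    }
}
```

The first function changes how the same request is spelled, which is what canonicalization usually means. The second removes information that is assumed to be the default, which is closer to minimization.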
Conclusion: Add a new function.
Good first bug; @zbraniecki happy to mentor.
LGTM: @zbraniecki @sffc @robertbastian
Minimizing extensions is closely related to the fallback work, #3867. Re-triaging the issue accordingly.
`LocaleCanonicalizer` is only really a `LanguageIdentifierCanonicalizer` at the moment, as it does not canonicalize any extensions. However, Unicode extensions, for example, can be canonicalized (`de-u-co-standard` -> `de`).

Canonicalizing extensions lets us avoid one lookup in fallback (i.e. the one for `de-u-co-standard`, which will always fail).
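As a rough illustration of the saved lookup, the sketch below counts lookups in a toy fallback chain. It is not the ICU4X fallback algorithm: `lookups_until_hit` is a hypothetical function, the set of available locales is made up, and the chain simply drops the whole `-u-` extension in one step before stripping trailing subtags.

```rust
use std::collections::HashSet;

/// Toy fallback chain (an assumption, not the ICU4X algorithm): try the
/// locale as given, then drop the whole `-u-` extension, then strip trailing
/// subtags one at a time. Returns the number of lookups made until a hit,
/// or `None` if nothing in the chain is available.
fn lookups_until_hit(locale: &str, available: &HashSet<&str>) -> Option<usize> {
    let mut current = locale;
    let mut lookups = 0;
    loop {
        lookups += 1;
        if available.contains(current) {
            return Some(lookups);
        }
        if let Some(idx) = current.find("-u-") {
            // Drop the Unicode extension in a single fallback step.
            current = &current[..idx];
        } else if let Some(idx) = current.rfind('-') {
            // Otherwise strip the last subtag.
            current = &current[..idx];
        } else {
            return None;
        }
    }
}
```

With data available only for `de`, a request for `de-u-co-standard` needs two lookups in this toy chain, while the minimized `de` needs one.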