tc39 / proposal-intl-locale-info

An API to expose locale information, such as week data (first day of the week, weekend start, weekend end), hour cycle, measurement system, commonly used calendar, etc.
MIT License

CharacterDirectionOfLocale doesn't take script and region subtags into account #53

Closed anba closed 2 years ago

anba commented 3 years ago

CharacterDirectionOfLocale, as currently written, gives the impression that script and region subtags can be ignored and that only a lookup of characterOrder from the UTS 35 layout elements is needed.

This can lead to wrong results, for example:

  1. Locale is "az-Arab":
    1. There is no explicit locale for "az-Arab" in https://github.com/unicode-org/cldr/blob/main/common/main/, so the lookup for characterOrder defaults to its parent locale "az".
    2. "az" has no explicit characterOrder definition, so it defaults to its parent locale, the root locale.
    3. The root locale definition for characterOrder is "left-to-right".
    4. But the expected result is "right-to-left", because the script is Arabic.
  2. Locale is "az-IR":
    1. Similar to "az-Arab", there's also no explicit "az-IR" locale in https://github.com/unicode-org/cldr/blob/main/common/main/.
    2. But adding likely subtags to "az-IR" gives the locale "az-Arab-IR".
    3. That means the expected result is again "right-to-left".
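
A minimal sketch of the alternative implied by the two cases above: resolve the script through likely subtags first, then derive the direction from the script. The helper name `characterDirection` and the `RTL_SCRIPTS` set are hypothetical and exist only for illustration; the set is deliberately incomplete, and this is not the spec text of CharacterDirectionOfLocale.

```javascript
// Illustrative, non-exhaustive set of right-to-left script codes. This is an
// assumption for the sketch; a real implementation would take it from CLDR data.
const RTL_SCRIPTS = new Set(["Arab", "Hebr", "Thaa", "Syrc", "Nkoo", "Adlm"]);

// Hypothetical helper, not the proposal's CharacterDirectionOfLocale.
function characterDirection(tag) {
  // maximize() applies the UTS 35 "Add Likely Subtags" algorithm, so a missing
  // script is filled in from the language and region subtags. A script that is
  // already present in the tag (as in "az-Arab") is kept unchanged.
  const { script } = new Intl.Locale(tag).maximize();
  return RTL_SCRIPTS.has(script) ? "rtl" : "ltr";
}

console.log(characterDirection("az-Arab"));
console.log(characterDirection("az-IR"));
```

With the likely-subtags data described above ("az-IR" becoming "az-Arab-IR"), both calls resolve to the Arabic script, so neither depends on which locale files exist under common/main.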

cc @jfkthame and @zbraniecki

FrankYFTang commented 2 years ago

The algorithm currently states: "If the default general ordering of characters (characterOrder) within a line in locale is right-to-left, return "rtl"."

There is no text that implies a particular locale fallback mechanism here. Conceptually, the characterOrder of az-Arab or az-IR is simply "right-to-left" because that is the "default general ordering of characters (characterOrder)".