Separate Language from Locale

zbraniecki commented 5 years ago

The current proposal specifies ofLanguage as accepting either a language or a full language tag (language-script-region).

I'm not sure why this decision was made, but it seems a bit counter-intuitive and an outlier.

Why not ofLocale for whole locale, and ofLanguage just for languages?

FrankYFTang commented 5 years ago

These are actually two different issues, but let me explain here.

Why NOT include a method for "language only"? In my experience developing web applications (the Google Translate web UI, the Chrome translation integration UI, and others), the web developer often needs to let the user pick from 1) "Traditional Chinese" (zh-Hant), "Simplified Chinese" (zh or zh-Hans), "Portugal Portuguese" (pt-PT or just pt), "Brazilian Portuguese" (pt-BR), "Latin American Spanish" (es-419), and "Spain Spanish" (es-ES or just es), in the same list as 2) "Japanese" (ja), "Korean" (ko), "Hindi" (hi), or "Persian" (fa). So from the developer's coding point of view, a "language only" API that can handle 2) but not 1) is almost always the wrong choice. I believe NOT offering a "language only" method will help developers avoid choosing the wrong method: someone who only needs to support en, fr, de, and it in the first version, but later extends to zh-Hant, zh, es-419, pt-BR, and pt-PT, would then need to change the code again, which may cause other issues in the backend.
Why NOT support an "any kind of locale" method? On the other hand, I also think that a method supporting ANY KIND of Unicode Locale Identifier could open a big can of worms and make the implementation unnecessarily complex. I am not saying we should never go there, but at this stage I think that would over-engineer the API at version 1. Supporting the display names of zh, zh-Hant, zh-Hant-TW, and zh-TW is rather different from supporting the display name of zh-Hant-TW-u-ca-islamic-co-zhuyin-nu-Arab. In reality, that kind of usage is theoretically possible but very unlikely. So I think that, at least in version 1, we should not support such a method, which could be far too complex to implement and test, and should focus on the most common use cases first. If the API becomes widely adopted, we can consider those cases for version 2.
On the other hand, I do think that during our Stage 1 discussion we may extend the v1 API with methods that return display names for calendar names (such as the input "islamic"), collation names (such as the input "zhuyin"), or other Unicode extension VALUEs for a particular key (say "kn", "kf", "nu"), one by one, individually.
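The mixed picker list from the first point can be sketched as follows. This is only an illustration: it assumes the constructor-plus-`of` shape (`new Intl.DisplayNames(locales, { type: "language" })`) rather than the `ofLanguage` method under discussion, and the `pickerTags` list and `buildPickerEntries` helper are made up for the example. The labels come from the engine's locale data, so none are shown here.

```javascript
// Hypothetical picker list: script- and region-qualified tags sit next to
// plain language codes.
const pickerTags = ["zh-Hant", "zh-Hans", "pt-PT", "pt-BR", "es-419", "es-ES", "ja", "ko", "hi", "fa"];

// Hypothetical helper: pair each tag with its display name in `displayLocale`.
function buildPickerEntries(tags, displayLocale) {
  const displayNames = new Intl.DisplayNames([displayLocale], { type: "language" });
  // of() takes one code at a time, so map over the list.
  return tags.map((tag) => ({ tag, label: displayNames.of(tag) }));
}

const pickerEntries = buildPickerEntries(pickerTags, "en");
for (const { tag, label } of pickerEntries) {
  console.log(`${tag}: ${label}`);
}
```

A single call path handles both groups of tags, which is the point being argued: the caller does not have to choose between a "language" method and a "locale" method.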

zbraniecki commented 5 years ago

Hmm, I'm torn. I see your point, but I've been raised by Dave Herman's "don't design the API to teach developers a lesson" ;)

I think I consider a locale display name to be the same as what you propose as language. You can pass it zh to get Chinese or zh-Hans to get Simplified Chinese.

This should be the primary API for those operations and fits the use case you described with a drop-down list of locales to select from (where ja and zh-Hans are next to each other).

I'd like to point out here that I do not believe developers do, or should, pull "all available locales" into their drop down out of an Intl API. Intl API should never be used as a comprehensive list of supported locale codes.

On the other hand, I see use cases, other than selection, where the ability to get zh as Chinese and es as Spanish makes sense. For example, in our text2speech or voice systems we may want to communicate that the system will use Chinese. That's a language. We don't care about region or script for that particular use case.

Unicode separates the concept of locale from language/region/script, and I would prefer not to confuse them. If you're not convinced by my arguments, I would at the very least suggest using the term Locale for what you now call Language, since zh-Hans is not a language, but it is a locale. This would leave us a gate to later consider adding the language as I described.

sffc commented 5 years ago

Hmm. How about .ofWrittenLanguage() for language-script and .ofSpokenLanguage() for language only?

FrankYFTang commented 5 years ago

On the other hand, I see use cases, other than selection, where the ability to get zh as Chinese and es as Spanish makes sense. For example, in our text2speech or voice systems we may want to communicate that the system will use Chinese. That's a language. We don't care about region or script for that particular use case.

Sure, but in that case, would your text2speech list also show en-GB, en-AU, en-IN, and en-SG for speech pronounced with British, Australian, Indian, and Singaporean accents? Or ar-EG for an Egyptian Arabic accent?
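To make the question concrete, here is a small sketch of a voice list keyed by locale and grouped by language subtag. The `voiceTags` list and the `groupByLanguage` helper are hypothetical; only `Intl.Locale` and its `language` property are real API.

```javascript
// Hypothetical voice list: each voice is tagged with the locale it speaks.
const voiceTags = ["en-GB", "en-AU", "en-IN", "en-SG", "ar-EG", "zh-Hans-CN"];

// Group tags by their language subtag, ignoring script and region.
function groupByLanguage(tags) {
  const groups = new Map();
  for (const tag of tags) {
    const language = new Intl.Locale(tag).language;
    if (!groups.has(language)) groups.set(language, []);
    groups.get(language).push(tag);
  }
  return groups;
}

const voiceGroups = groupByLanguage(voiceTags);
console.log(voiceGroups.get("en")); // the four English variants
```

A "language only" name labels the group, while telling the accents apart within a group still needs the region.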

FrankYFTang commented 5 years ago

Unicode separates the concept of locale from language/region/script, and I would prefer not to confuse them. If you're not convinced by my arguments, I would at the very least suggest using the term Locale for what you now call Language, since zh-Hans is not a language, but it is a locale. This would leave us a gate to later consider adding the language as I described.

Notice in UTS35 http://unicode.org/reports/tr35/#Unicode_language_identifier

unicode_language_id = "root" | (unicode_language_subtag (sep unicode_script_subtag)? | unicode_script_subtag) (sep unicode_region_subtag)? (sep unicode_variant_subtag)* ;

FrankYFTang commented 5 years ago

I would at the very least suggest using the term Locale for what you now call Language, since zh-Hans is not a language, but it is a locale.

The problem is "zh-Hant-TW-u-ca-islamic-co-zhuyin-nu-Arab" is ALSO a Locale, but I believe we should not support it in this API (at least for now). So if we call this Locale, instead of Language, then the caller will expect we can handle "zh-Hant-TW-u-ca-islamic-co-zhuyin-nu-Arab".

I understand what you said. The real issue is that we lack a term for the LSRV (Language-Script-Region-Variant) part of the Locale. It is neither a Language nor a Locale, and I don't have a word for it.
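One way to express "LSRV only" in code is to reject any tag that carries extensions. The sketch below is an assumption about how a caller could check this, not part of the proposal; `isLanguageIdOnly` is a hypothetical helper built on the real `Intl.Locale` `baseName` property.

```javascript
// Hypothetical check: true when `tag` has only language/script/region/variant
// subtags, false when it carries extensions or is not a well-formed tag.
function isLanguageIdOnly(tag) {
  try {
    const locale = new Intl.Locale(tag);
    // baseName is the language/script/region/variant part;
    // toString() is the full canonical tag, extensions included.
    return locale.toString() === locale.baseName;
  } catch (error) {
    return false; // Intl.Locale throws a RangeError on malformed tags
  }
}

console.log(isLanguageIdOnly("zh-Hant-TW")); // true
console.log(isLanguageIdOnly("zh-Hant-TW-u-ca-islamic-co-zhuyin-nu-Arab")); // false
```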

FrankYFTang commented 5 years ago

On the other hand, I see use cases, other than selection, where the ability to get zh as Chinese and es as Spanish makes sense.

This API currently does not prevent you from such calls.

littledan commented 5 years ago

I see your point, but I've been raised by Dave Herman's "don't design the API to teach developers a lesson" ;)

I am confused; adding additional separations seems like a way to teach developers a lesson.

zbraniecki commented 5 years ago

I understand what you said. The real issue is that we lack a term for the LSRV (Language-Script-Region-Variant) part of the Locale.

Agreed. The way I think about it is that we should support a "subset" of what a full BCP47 locale is. I would also expect that we'd accept a full language tag (or Intl.Locale object) and skip the parts we don't handle (so, your zh-Hant-TW-u-ca-islamic-co-zhuyin-nu-Arab would be treated the same as zh-Hant-TW).
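Skipping the unhandled parts could look roughly like this on the caller's side. It is a sketch, not the proposal's behavior: `stripExtensions` is a hypothetical helper, and it relies only on `Intl.Locale` accepting either a tag string or an existing `Intl.Locale` object and exposing `baseName`.

```javascript
// Hypothetical helper: drop the extension subtags and keep the
// language-script-region-variant part of a tag or Intl.Locale object.
function stripExtensions(tagOrLocale) {
  return new Intl.Locale(tagOrLocale).baseName;
}

const fullTag = "zh-Hant-TW-u-ca-islamic-co-zhuyin-nu-Arab";
const strippedTag = stripExtensions(fullTag);
console.log(strippedTag); // "zh-Hant-TW"

// The stripped tag can then be passed on for a display name.
const strippedNames = new Intl.DisplayNames(["en"], { type: "language" });
console.log(strippedNames.of(strippedTag));
```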

I am confused; adding additional separations seems like a way to teach developers a lesson.

Maybe I misrepresented myself. The API should not try to work around our worry that the user will not understand how to use it. It feels to me like Frank's attempt to interpret a full(er) locale in an API about languages is motivated by his belief that users misunderstand what they want. I prefer not to hide this complexity but rather expose it and teach people the difference between language and locale (and that they usually want a locale).

zbraniecki commented 5 years ago

This API currently does not prevent you from such calls.

True. Maybe we'd want to verify that we have a good way to filter it out. Something like:

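function getLanguageNames(localeList) {
  // Keep only the language subtag; script and region are dropped.
  let languageCodes = localeList.map(locale => new Intl.Locale(locale).language);
  // navigator.languages is the user's preferred locale list.
  let dnFormatter = new Intl.DisplayNames(navigator.languages, {type: "language"});
  // of() takes a single code, so map over the list.
  return languageCodes.map(code => dnFormatter.of(code));
}

let languageNames = getLanguageNames(["zh-Hans-TW", "pt-PT", "fr-CH"]);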

Looking at it, I'd still prefer:

// Sketch of the preferred split: type "language" would ignore script and
// region, and a proposed type "locale" would render the full tag.
function getLanguageNames(localeList) {
  let dnFormatter = new Intl.DisplayNames(navigator.languages, {type: "language"});
  return localeList.map(locale => dnFormatter.of(locale));
}

function getLocaleNames(localeList) {
  // type "locale" is the proposed option, not an existing one.
  let dnFormatter = new Intl.DisplayNames(navigator.languages, {type: "locale"});
  return localeList.map(locale => dnFormatter.of(locale));
}

let languageNames = getLanguageNames(["zh-Hans-TW", "pt-PT", "fr-CH"]); // ["Chinese", "Portuguese", "French"]
let localeNames = getLocaleNames(["zh-Hans-TW", "pt-PT", "fr-CH"]); // ["Chinese (Simplified, Taiwan)", "Portuguese (Portugal)", "French (Switzerland)"]

FrankYFTang commented 5 years ago

@zbraniecki Do you think this issue is settled? Could I close this issue?

zbraniecki commented 5 years ago

Hmm, I don't think it is, but I'm not going to block on that :)

I'm curious if others have opinions on the naming here, so I'd like to raise it again in the next meeting for people to look into.

FrankYFTang commented 5 years ago

OK, this issue was opened against the old spec (ofLanguage). Since we changed the spec to the new style (just an of method), I will close this issue. If you have concerns about the naming in the options, or the expected syntax of the code, please file a separate issue.

tc39 / proposal-intl-displaynames

Separate Language from Locale #12