w3c / webextensions

Charter and administrivia for the WebExtensions Community Group (WECG)

`i18n.getMessage()` language fallback paths #296

Open carlosjeurissen opened 2 years ago

carlosjeurissen commented 2 years ago

Not all browsers handle language fallbacks the same way. Consider the following situation:

An extension is using the native i18n APIs with "default_locale": "en" in manifest.json, and three messages.json files in the languages en, pt and pt-BR.

Both en and pt include the message ids message1 and message2, while pt-BR includes only message1.

In the above situation, browsers handle fetching i18n.getMessage('message2') differently.

Chromium first checks pt_BR/messages.json. If the message is not present, it checks pt/messages.json, and finally, if the message is still not found, it checks the default_locale, in this case en/messages.json. In the above situation, this means it gets the message2 value from pt.

In Firefox, however, the browser first checks pt_BR/messages.json. If the message is not in this file, it falls back directly to default_locale, so it checks en/messages.json. As a result, the message2 value is the one from en.

Interestingly enough, in Firefox, if pt_BR/messages.json is not present at all, it checks pt/messages.json first, before checking en/messages.json.
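
To make the difference concrete, here is a minimal sketch that simulates the two lookup orders described above against in-memory catalogs. It is not browser code: the `catalogs` object, the `lookup` helper and the message strings are hypothetical stand-ins for the `_locales/*/messages.json` files in the example, and the Firefox chain only encodes the behaviour as reported in this issue.

```javascript
// Hypothetical in-memory stand-ins for the three messages.json files.
const catalogs = {
  en: { message1: { message: 'en 1' }, message2: { message: 'en 2' } },
  pt: { message1: { message: 'pt 1' }, message2: { message: 'pt 2' } },
  pt_BR: { message1: { message: 'pt-BR 1' } },
};
const defaultLocale = 'en';

// Hypothetical helper: return the message from the first catalog in
// `chain` that defines `id`, or an empty string if none does.
function lookup(chain, id) {
  for (const locale of chain) {
    const entry = catalogs[locale]?.[id];
    if (entry) return entry.message;
  }
  return '';
}

// Chromium as described: pt_BR, then pt, then default_locale.
const chromiumChain = ['pt_BR', 'pt', defaultLocale];

// Firefox as described: pt_BR then default_locale when pt_BR exists,
// otherwise pt then default_locale.
const firefoxChain = catalogs.pt_BR
  ? ['pt_BR', defaultLocale]
  : ['pt', defaultLocale];

console.log(lookup(chromiumChain, 'message2')); // "pt 2"
console.log(lookup(firefoxChain, 'message2')); // "en 2"
```

Both chains return the pt_BR value for message1. They only diverge for ids that are missing from the most specific file.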

What is the behaviour we want in these cases?

hanguokai commented 2 years ago

I support Chrome's behavior, which seems more reasonable. Usually, developers want a language-region locale to fall back to the language locale first, then to the default locale.

If the browser wants to support multiple different behaviors at the same time, I recommend adding a new property to the third parameter (options) of this API.

carlosjeurissen commented 2 years ago

@hanguokai generally speaking this can be useful to reduce the overall package size.

However, there are cases where the fallback might not be welcome. Say your zh/messages.json is in Simplified Chinese script, and zh_TW is in Traditional Chinese. Would it be desirable for a different script to be used as the fallback? The same can happen with other languages with multiple scripts, like Serbian (Latin and Cyrillic).

hanguokai commented 2 years ago

I know the difference, but it is still better than the default (a completely different language, such as English). For example:

en: "Software"
zh: "软件"
zh_TW: "軟體"

The difference between "软件" and "軟體" is smaller than the difference between either of them and the English text.

For the best user experience, developers need to supply a full message map (1:1) if the messages are different. Messages can be omitted only when they are the same or when the fallback is acceptable.

carlosjeurissen commented 2 years ago

@hanguokai relying on good developer behaviour can be tricky. I can imagine there are Chinese people who know only English and one of Traditional or Simplified Chinese. I could be wrong.

hanguokai commented 2 years ago

Do you know how many people only understand Chinese (zh-CN and/or zh-TW) but not English? Of course, there are real examples of every situation (combination).

I said in my previous post:

> If the browser wants to support multiple different behaviors at the same time, I recommend adding a new property to the third parameter (options) of this API.

hanguokai commented 2 years ago

There are multiple possible strategies. Another possible fallback strategy is to follow the navigator.languages order. For example:

If navigator.languages is ['zh-TW', 'en'], then the search order is zh-TW -> en -> extension default locale.
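
As a rough sketch of this strategy, the search order could be derived from the user's preference list as follows. The function name `chainFromPreferences` is hypothetical, and the list is passed in as a parameter rather than read from `navigator.languages` so that the snippet also runs outside a browser.

```javascript
// Hypothetical helper: build a search order from the user's preferred
// languages, ending with the extension's default locale.
function chainFromPreferences(preferredLanguages, defaultLocale) {
  // _locales directories use underscores (zh_TW), language tags use hyphens.
  const chain = preferredLanguages.map((tag) => tag.replace(/-/g, '_'));
  if (!chain.includes(defaultLocale)) {
    chain.push(defaultLocale);
  }
  return chain;
}

console.log(chainFromPreferences(['zh-TW', 'en'], 'fr'));
// [ 'zh_TW', 'en', 'fr' ]
```

In a browser, the first argument would be `navigator.languages`. No browser is reported in this thread to implement this order.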

xeenon commented 2 years ago

After looking at the code, I believe Safari matches Chrome here.

carlosjeurissen commented 2 years ago

Reached out to the ltli w3c group here: https://github.com/w3c/ltli/issues/35.

Safari currently matches the behaviour of Chrome. If the discussion above concludes that this is the preferred process, Firefox will follow.

carlosjeurissen commented 6 months ago

Quick update: @aphillips mentioned two potential fallback algorithms. One is a simple progressive removal of subtags. The other is the more advanced algorithm from Unicode's CLDR, as used in ICU. See: https://github.com/w3c/ltli/issues/35#issuecomment-1295168890

@xeenon @oliverdunk Do you know which algorithm is used in Safari and Chrome? Based on this, we can figure out which algorithm Firefox should use, given that Firefox currently has no fallback algorithm (other than falling back to default_locale).
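
For illustration, the first of these two options, progressive removal of subtags, could look roughly like the sketch below. `truncationChain` is a hypothetical name, and this is not taken from any browser's source. It also skips the special handling of single-character subtags that RFC 4647 lookup prescribes.

```javascript
// Hypothetical helper: list a locale and its progressively shorter
// ancestors, produced by dropping one trailing subtag at a time.
function truncationChain(tag) {
  const subtags = tag.split(/[-_]/);
  const chain = [];
  while (subtags.length > 0) {
    chain.push(subtags.join('_'));
    subtags.pop();
  }
  return chain;
}

console.log(truncationChain('pt-BR')); // [ 'pt_BR', 'pt' ]
console.log(truncationChain('zh_Hant_TW')); // [ 'zh_Hant_TW', 'zh_Hant', 'zh' ]
```

A browser would then append default_locale to the end of such a chain.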

xeenon commented 6 months ago

@carlosjeurissen Safari removes subtags, which we coded to match Chrome.

oliverdunk commented 6 months ago

I had a brief look through the code and Chrome appears to remove subtags as @xeenon suspected 👍

Rob--W commented 1 month ago

Some of us (@dotproto, @Rob--W, @oliverdunk, @carlosjeurissen) met with the I18n group (@aphillips, @eemeli and others) and discussed the topic of whether to fall back (partial minutes). Chrome and Safari already have the same behavior of falling back from specific language tags to less specific ones, ultimately to default_locale. Firefox is supportive of implementing the same, and there was already a feature request at https://bugzilla.mozilla.org/show_bug.cgi?id=1381580.

Arguments in favor of the multiple fallback include the ability to have smaller messages.json files, e.g. generic English + small en-US and en-GB specific files.
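
A small sketch of that file-size argument: with multi-step fallback, a regional file only needs the messages that differ. The catalogs and the `effectiveMessages` helper below are hypothetical. Browsers resolve each message individually rather than merging files, but the result for a given locale is equivalent to this merge.

```javascript
// Hypothetical catalogs: a complete generic English file and a small
// en_GB file that only overrides what differs.
const genericEnglish = {
  colorLabel: { message: 'Color' },
  saveButton: { message: 'Save' },
};
const britishOverrides = {
  colorLabel: { message: 'Colour' },
};

// Merge catalogs so that later (more specific) catalogs win.
function effectiveMessages(...leastToMostSpecific) {
  return Object.assign({}, ...leastToMostSpecific);
}

const enGB = effectiveMessages(genericEnglish, britishOverrides);
console.log(enGB.colorLabel.message); // "Colour"
console.log(enGB.saveButton.message); // "Save"
```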

erosman commented 1 month ago

@Rob--W Since the fallback process is being updated, can the following https://github.com/w3c/webextensions/issues/258#issuecomment-2280230511 be relevant as it suggests an additional step in the fallback chain?

Rob--W commented 1 month ago

> @Rob--W Since the fallback process is being updated, can the following #258 (comment) be relevant as it suggests an additional step in the fallback chain?

I don't see the relevance of that other issue. The issue here is about unifying the fallback behavior across browsers (basically for Firefox to match Chrome and Safari). What you are proposing is an additional step, but the referenced comment mentions a feature request that has not been adopted by any browser.

carlosjeurissen commented 1 month ago

@Rob--W I believe @erosman is trying to say that once this language fallback logic has been improved in Firefox, it is more valuable to extension authors to have a way to make use of the fallback logic, using getMessage with a specific locale tag or some form of setLanguage(), than to just load messages.json files directly.

birtles commented 1 month ago

> @carlosjeurissen Safari removes subtags, which we coded to match Chrome.

@xeenon is this to say it doesn't even try looking up subtags? Because that's what several people are reporting.

xeenon commented 4 weeks ago

@birtles I'll take a look. We do look for the sub-tags first, but there might be a bug somewhere.

xeenon commented 4 weeks ago

@birtles I'm not seeing any issues with Safari's locale fallback in Safari 18. We use zh_CN and zh_TW for Simplified Chinese and Traditional Chinese on Apple platforms. Your change to rename zh_hans to zh_CN is correct for Safari (and seems fine for Chrome and Firefox).

birtles commented 4 weeks ago

> @birtles I'm not seeing any issues with Safari's locale fallback in Safari 18. We use zh_CN and zh_TW for Simplified Chinese and Traditional Chinese on Apple platforms. Your change to rename zh_hans to zh_CN is correct for Safari (and seems fine for Chrome and Firefox).

Thank you so much for looking into this. I'll follow up in the issue you kindly commented on since I'm not quite able to get this working in Safari 18 yet.

birtles commented 3 weeks ago

I filed Chromium issue 375528194 for the fact that Chrome doesn't seem to recognize zh_hans, only zh_CN.

hanguokai commented 3 weeks ago

zh-CN and zh-TW are language code + region code. zh-Hans and zh-Hant are language code + script code. zh-Hans-CN, zh-Hans-SG, zh-Hant-HK and zh-Hant-TW are language code + script code + region code.

However, due to historical reasons, some operating systems, browsers and other software still use or support only zh-CN rather than zh-Hans. In #641, we also discussed it (See link-1, link-2).
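
The breakdown of these tags into language, script and region can be checked with the standard `Intl.Locale` API, as in the sketch below. `describeTag` is a hypothetical helper name, and the underscore replacement is only there to accept `_locales`-style directory names.

```javascript
// Hypothetical helper: split a locale identifier into its language,
// script and region subtags using the built-in Intl.Locale parser.
function describeTag(tag) {
  const locale = new Intl.Locale(tag.replace(/_/g, '-'));
  return {
    language: locale.language,
    script: locale.script, // undefined when the tag has no script subtag
    region: locale.region, // undefined when the tag has no region subtag
  };
}

console.log(describeTag('zh-CN')); // language + region
console.log(describeTag('zh-Hans')); // language + script
console.log(describeTag('zh_Hant_TW')); // language + script + region
```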

xeenon commented 3 weeks ago

After my change in https://commits.webkit.org/285633@main, Safari will support script codes in _locales, including three-part locale identifiers.

We have always reported the script (if used) in i18n locale APIs as well.

carlosjeurissen commented 3 days ago

The Firefox patch can be found here: https://phabricator.services.mozilla.com/D224084