w3c / webextensions

Charter and administrivia for the WebExtensions Community Group (WECG)

Align on properly formed language tags #642

Open carlosjeurissen opened 1 week ago

carlosjeurissen commented 1 week ago

Introduction

Historically, browser extensions have used language tags in two different syntaxes. 1) With a hyphen, e.g. en-US. This is the proper language tag format as defined in BCP47. 2) With an underscore, e.g. en_US. This is similar to POSIX, ICU and ISO/IEC 15897.
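To make the difference concrete, here is a minimal sketch. The helper names `toBcp47` and `isWellFormedTag` are hypothetical and not part of any extension API; the check relies on the standard `Intl.getCanonicalLocales()`, which rejects structurally invalid tags such as those containing underscores.

```javascript
// Hypothetical helpers for illustration; not part of any extension API.

// Convert an underscore-style tag (en_US) to hyphen style (en-US).
function toBcp47(tag) {
  return tag.replace(/_/g, '-');
}

// Check well-formedness with the standard Intl.getCanonicalLocales(),
// which throws a RangeError for structurally invalid tags.
function isWellFormedTag(tag) {
  try {
    Intl.getCanonicalLocales(tag);
    return true;
  } catch (e) {
    return false;
  }
}
```

With these, `isWellFormedTag('en_US')` is false while `isWellFormedTag(toBcp47('en_US'))` is true, which is roughly the conversion extension code has to do today before passing a `_locales` name to web APIs.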

Both syntaxes are in use, but support for each is mixed: it varies per API and per browser. This WECG issue covers those areas in an attempt to reach alignment.

_locales directory

In most documentation, locale directories are supposed to use the underscore variant.

Currently this is a requirement for Chrome while Firefox seems to also support the hyphen -.

Going forward, unless there is a clear reason why underscores should be used, my proposal would be to start adding support for the proper BCP47 tags and to disallow the use of underscores in mv4.

manifest.json default_locale

Following documentation, default_locale is supposed to be using the subdirectory name of _locales.

Currently Chrome requires the use of the underscore, while Firefox supports both a hyphen and an underscore.

Going forward I suggest we keep this documentation and follow the switch to BCP47 as mentioned in _locales above.

i18n.getUILanguage

Following documentation, this returns a BCP47 language tag. Historically before version 55, Firefox used to return the underscore variation. I suggest we keep this as is.

i18n.getMessage('@@ui_locale')

This is defined as returning the current locale. In Chrome it returns the underscore variant, while in Firefox, probably since version 56, it returns the BCP47 tag.

Going forward, I suggest this to be equal to the variation used in _locales as it could be used to fetch the messages.json file.
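As a rough sketch of that use case: the helper `messagesPath` below is hypothetical, and the commented usage assumes that the value of `@@ui_locale` matches an existing `_locales` subdirectory name, which is not guaranteed today.

```javascript
// Hypothetical helper: builds the relative path of a locale's messages.json.
function messagesPath(uiLocale) {
  return `_locales/${uiLocale}/messages.json`;
}

// Possible usage inside an extension page (assumption: the directory exists
// under exactly the name returned by @@ui_locale):
//
//   const locale = chrome.i18n.getMessage('@@ui_locale');
//   const url = chrome.runtime.getURL(messagesPath(locale));
//   const messages = await (await fetch(url)).json();
```

If `@@ui_locale` returns en-US while the directory is named en_US, the fetch above fails, which is why the returned format matters here.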

Final words

Basically the goal is to align the behaviour across browsers. My opinion would be to always use BCP47. However, this comes with a transitional cost, which I believe is worth it. Alternatively, we could agree, for example, to use the underscore only for files. However, we would still need to agree on default_locale and what is returned by @@ui_locale.

Related: https://github.com/w3c/webextensions/issues/131

tophf commented 1 week ago

It would break external tools like transifex that use the ISO/IEC 15897 standard for language codes (for example en_US).

patrickkettner commented 1 week ago

From Transifex's help page

When we add support for a language, we follow the BCP47 standard. The multiple language locales are based on region subtags.

tophf commented 1 week ago

From https://developers.transifex.com/docs/using-the-client#language-management

Transifex uses the ISO/IEC 15897 standard for language codes

patrickkettner commented 1 week ago

FWIW, the sentence directly following your quote is "If you use a different format for the local language codes, you can define a mapping in your configuration file .tx/config."

tophf commented 1 week ago

When we add support for a language, we follow the BCP47 standard.

This quote is about the internal UI/management in transifex, but I referred to the transifex tools that are specific to extension development and which process _locales directory.

FWIW, the sentence directly following your quote is "If you use a different format for the local language codes, you can define a mapping in your configuration file .tx/config."

Yeah, but that's a huge pain, multiplied by thousands of extensions and dozens of languages.

Either way, there may be hundreds of other utilities that use the classic _ syntax.

patrickkettner commented 1 week ago

Not arguing for or against it, just wanting to clarify that your specific point was not reflective of my use of their tool. Appreciate you taking the time to bring up issues!

tophf commented 1 week ago

Judging by Transifex example alone, using two different standards may be actually an established practice that's implemented in many such tools, i.e. BCP47 is used internally and ISO/IEC 15897 for the files.

carlosjeurissen commented 1 week ago

@ghostwords @tophf The goal here is to align this cross-browser. Updated the post to mention alternative paths we can take. We could for example always use the underscore just for file paths. However we then still need to agree on default_locale and what is returned by @@ui_locale.

If tools like Transifex have some way of exporting data specific to extensions and thus use the underscore, they will likely update this if the extension system updates. However I very much see your point of the cost of switching this.

In general, using multiple standards seems very counterintuitive. However, if we use underscores for the file paths, we could simply use BCP47 with all the hyphens replaced with underscores.

The reason to align on this is also motivated by proposals like https://github.com/w3c/webextensions/pull/641.

tophf commented 1 week ago

Judging by the source code Chromium uses ICU for locales, which uses underscores. Since ICU is an industry standard, the same conventions are likely to be used by many tools for extension development.

xeenon commented 1 week ago

Safari also requires underscores for _locales.

carlosjeurissen commented 1 week ago

@ghostwords @tophf This was discussed during the 2024-06-20 meeting.

It comes down to:

As for i18n.getMessage('@@ui_locale'), Firefox returns a hyphen while Chrome returns an underscore. So in general this would not result in a breaking change as we could already not rely on this. So we either choose to use BCP47 like Firefox, or use the same format used in _locales (if feasible implementation wise). So if en_US is present while en-US is not, it will return en_US, else it would return en-US.
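The second option could be sketched roughly as follows. This is only an illustration of the rule described above, not how any browser implements it; `resolveUiLocale` and its parameters are hypothetical names.

```javascript
// Hypothetical sketch of the "follow the format used in _locales" rule.
// bcp47Tag:   the UI locale in hyphen form, e.g. 'en-US'
// localeDirs: names of the subdirectories present in _locales
function resolveUiLocale(bcp47Tag, localeDirs) {
  const underscoreTag = bcp47Tag.replace(/-/g, '_');
  const dirs = new Set(localeDirs);
  // Return the underscore form only when it is the sole form present.
  if (dirs.has(underscoreTag) && !dirs.has(bcp47Tag)) {
    return underscoreTag;
  }
  return bcp47Tag;
}
```

With this rule, an extension shipping only `_locales/en_US` would get en_US back, while one shipping `_locales/en-US` (or neither directory) would get en-US.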

I would still be in favour of requiring the use of BCP47 for default_locale for manifest version 4.

tophf commented 1 week ago

What about adding a new variable:

Or maybe switch behavior based on the second parameter like getMessage('@@ui_locale', ['-'])
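Until something like that exists, the effect can be approximated in extension code. A minimal sketch, assuming a hypothetical helper `uiLocaleWithSeparator` that is not part of any browser API:

```javascript
// Hypothetical userland helper: normalizes whatever @@ui_locale returned
// (underscore in Chrome, hyphen in Firefox) to the requested separator.
function uiLocaleWithSeparator(rawUiLocale, separator = '_') {
  return rawUiLocale.replace(/[-_]/g, separator);
}

// Possible usage in an extension:
//   uiLocaleWithSeparator(chrome.i18n.getMessage('@@ui_locale'), '-');
```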

hanguokai commented 1 week ago

At the JavaScript API level, most Web APIs currently use BCP47. I think everyone agrees with that. Of course, my proposal #641 uses it too.

This issue mainly involves some non-API areas. All developers hate breaking changes unless there is a huge benefit. It doesn't seem like there is a huge benefit here, so I would like to keep backwards compatibility as much as possible, and @Rob--W also said in today's meeting that it is important to keep backwards compatibility.

About the _locales directory and file name, I see that both the Chrome doc and the MDN doc are very clear. In fact, I have never heard of a developer complaining about this. So this is not a real problem for developers. In my opinion, there is no need to change it, or we can support both underscore (_) and hyphen (-).

About manifest default_locale, I think we can support both. It is not a real problem for developers. It is a manifest value, I never use it in JS code.

About i18n.getUILanguage(), both Chrome and Firefox return BCP47 now, so there is no problem.

About the predefined message @@ui_locale, I never use it myself. It is inconsistent between Chrome and Firefox at present, so whether Chrome makes changes or Firefox makes changes will result in breaking changes for their current developers. To avoid breaking changes, it may be necessary to introduce a new predefined message, like @@ui_locale_hyphen and keep @@ui_locale unchanged.

carlosjeurissen commented 1 week ago

@hanguokai as mentioned here: https://github.com/w3c/webextensions/issues/642#issuecomment-2181008199, there is agreement to keep support for the underscore. I am not advocating against that.

The problem is having to replace underscores with hyphens and vice versa throughout your extension code and the whole supply chain, while the web mostly uses BCP47, along with the potential bugs these conversions introduce.

As for the predefined message @@ui_locale: instead of introducing something like @@ui_locale_hyphen, following your proposal https://github.com/w3c/webextensions/pull/641, we can introduce a @@current_locale which returns the current locale as BCP47. It seems @@current_locale can fully replace the @@ui_locale use cases.

hanguokai commented 1 week ago

@carlosjeurissen Thanks for the suggestion. @@current_locale looks good to me for i18n.getCurrentLanguage(). And @@ui_locale still represents i18n.getUILanguage().