Open carlosjeurissen opened 1 week ago
It would break external tools like transifex that use the ISO/IEC 15897 standard for language codes (for example en_US).
When we add support for a language, we follow the BCP47 standard. The multiple language locales are based on region subtags.
From https://developers.transifex.com/docs/using-the-client#language-management
Transifex uses the ISO/IEC 15897 standard for language codes
FWIW, the sentence directly following your quote is "If you use a different format for the local language codes, you can define a mapping in your configuration file .tx/config."
When we add support for a language, we follow the BCP47 standard.
This quote is about the internal UI/management in transifex, but I referred to the transifex tools that are specific to extension development and which process _locales directory.
FWIW, the sentence directly following your quote is "If you use a different format for the local language codes, you can define a mapping in your configuration file .tx/config."
Yeah, but that's a huge pain, multiplied by thousands of extensions and dozens of languages.
Either way, there may be hundreds of other utilities that use the classic _ syntax.
Not arguing for or against it, just wanting to clarify that your specific point was not reflective of my use of their tool. Appreciate you taking the time to bring up issues!
Judging by the Transifex example alone, using two different standards may actually be an established practice implemented in many such tools, i.e. BCP47 is used internally and ISO/IEC 15897 for the files.
@ghostwords @tophf The goal here is to align this cross-browser. I updated the post to mention alternative paths we can take. We could, for example, always use the underscore just for file paths. However, we would then still need to agree on default_locale and on what is returned by @@ui_locale.
If tools like Transifex have some way of exporting data specific to extensions and thus use the underscore, they will likely update this if the extension system updates. However I very much see your point of the cost of switching this.
In general, using multiple standards seems very counterintuitive. However, if we use underscores for the file paths, we could simply use BCP47 with all hyphens replaced by underscores.
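As a rough illustration of the "BCP47 with hyphens replaced by underscores" idea, the mapping between a tag and its directory name could look like the sketch below. The function names are hypothetical and not part of any extension API; the sketch only swaps separators and does no validation.

```javascript
// Hypothetical helpers (illustrative names, not an existing API).

// Hyphenated tag -> _locales directory name, e.g. "en-US" -> "en_US".
function toLocaleDirName(tag) {
  return tag.replace(/-/g, '_');
}

// _locales directory name -> hyphenated tag, e.g. "pt_BR" -> "pt-BR".
function toHyphenTag(dirName) {
  return dirName.replace(/_/g, '-');
}

console.log(toLocaleDirName('en-US'));
console.log(toHyphenTag('pt_BR'));
```

Because only the separator changes, the two functions are inverses of each other for any tag that does not mix separators.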
The reason to align on this is also motivated by proposals like https://github.com/w3c/webextensions/pull/641.
Judging by the source code Chromium uses ICU for locales, which uses underscores. Since ICU is an industry standard, the same conventions are likely to be used by many tools for extension development.
Safari also requires underscores for _locales.
@ghostwords @tophf This was discussed during the 2024-06-20 meeting.
It comes down to:
As for i18n.getMessage('@@ui_locale'), Firefox returns a hyphen while Chrome returns an underscore. So in general this would not result in a breaking change, as we could already not rely on this. So we either choose to use BCP47 like Firefox, or use the same format used in _locales (if feasible implementation-wise). So if en_US is present while en-US is not, it will return en_US; otherwise it will return en-US.
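The fallback rule described above can be sketched as a small function. This is only an illustration of the proposal, not browser code; `resolveUiLocaleMessage` and its parameters are hypothetical, and the list of directory names is assumed to be supplied by the caller.

```javascript
// Hypothetical sketch of the proposed @@ui_locale behaviour.
// `uiLanguage`: the UI language in either form, e.g. "en-US" or "en_US".
// `localeDirs`: directory names found under _locales (assumed input).
function resolveUiLocaleMessage(uiLanguage, localeDirs) {
  const hyphen = uiLanguage.replace(/_/g, '-');
  const underscore = uiLanguage.replace(/-/g, '_');
  const dirs = new Set(localeDirs);

  // Only fall back to the underscore form when that directory exists
  // and the hyphenated one does not.
  if (dirs.has(underscore) && !dirs.has(hyphen)) {
    return underscore;
  }
  return hyphen;
}

console.log(resolveUiLocaleMessage('en-US', ['en_US', 'nl']));
console.log(resolveUiLocaleMessage('en-US', ['en-US', 'nl']));
```

Under this rule the hyphenated form is the default, so an extension that ships neither directory still gets a BCP47-style value.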
I would still be in favour of requiring the use of BCP47 for default_locale for manifest version 4.
What about adding a new variable:
@@ui_locale_hyphen
@@ui_locale_web
@@ui-locale - not an identifier strictly speaking, but it's kinda self-explanatory once you know the difference (con: it's confusing if you don't), so it might be worth making an exception.
Or maybe switch behavior based on the second parameter, like getMessage('@@ui_locale', ['-']).
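Until any of these variants exists, an extension could get the same effect in userland. The sketch below is one possible wrapper, not a proposed API: `getUiLocale` is a hypothetical name, and the `getMessage` function is passed in so the example runs outside a browser.

```javascript
// Hypothetical wrapper around the existing @@ui_locale message.
// `getMessage` is injected; in an extension it could be
// (name) => chrome.i18n.getMessage(name) or the browser.* equivalent.
function getUiLocale(getMessage, separator = '-') {
  const raw = getMessage('@@ui_locale');
  // Accept either separator from the browser and emit the requested one.
  return raw.replace(/[-_]/g, separator);
}

// Stand-ins for the two behaviours described in this thread.
const chromeLike = () => 'en_US';
const firefoxLike = () => 'en-US';

console.log(getUiLocale(chromeLike));
console.log(getUiLocale(firefoxLike, '_'));
```

Passing the separator explicitly mirrors the second-parameter idea while leaving the current @@ui_locale output untouched.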
At the JavaScript API level, most Web APIs currently use BCP47. I think everyone agrees with that. Of course, my proposal #641 uses it too.
This issue mainly involves some non-API areas. All developers hate breaking changes unless there is a huge benefit. It doesn't seem like there is a huge benefit here, so I would like to keep backwards compatibility as much as possible, and @Rob--W also said in today's meeting that it is important to keep backwards compatibility.
About the _locales directory and file names: both the Chrome docs and the MDN docs are very clear. In fact, I have never heard of a developer complaining about this, so it is not a real problem for developers. In my opinion, there is no need to change it, or we can support both the underscore (_) and the hyphen (-).
About the manifest default_locale: I think we can support both. It is not a real problem for developers. It is a manifest value; I never use it in JS code.
About i18n.getUILanguage(): both Chrome and Firefox return BCP47 now, so there is no problem.
About the predefined message @@ui_locale: I never use it myself. It is inconsistent between Chrome and Firefox at present, so whether Chrome or Firefox makes the change, it will be a breaking change for that browser's current developers. To avoid breaking changes, it may be necessary to introduce a new predefined message, like @@ui_locale_hyphen, and keep @@ui_locale unchanged.
@hanguokai as mentioned here: https://github.com/w3c/webextensions/issues/642#issuecomment-2181008199, there is agreement to keep support for the underscore. I am not advocating against that.
The problem is having to replace underscores with hyphens and vice versa throughout your extension code and the whole supply chain while the web mostly uses BCP47, along with the potential bugs these conversions introduce.
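To make the conversion cost concrete, here is a minimal sketch of the kind of normalisation step extension code tends to need today. `toCanonicalTag` is a hypothetical helper name; `Intl.getCanonicalLocales` is the standard ECMAScript Internationalization API function, which rejects underscores, so the separator has to be swapped first.

```javascript
// Hypothetical helper: accept either separator and return a canonical
// BCP47 tag. Intl.getCanonicalLocales throws a RangeError on
// structurally invalid input, including tags with underscores.
function toCanonicalTag(localeCode) {
  const hyphenated = localeCode.replace(/_/g, '-');
  return Intl.getCanonicalLocales(hyphenated)[0];
}

console.log(toCanonicalTag('en_US'));
console.log(toCanonicalTag('pt_br'));
```

Every place that forgets this step, or applies it in the wrong direction, is a potential source of the bugs mentioned above.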
As for the predefined message @@ui_locale: instead of introducing something like @@ui_locale_hyphen, following your proposal https://github.com/w3c/webextensions/pull/641, we can introduce a @@current_locale which returns the current locale as BCP47. It seems @@current_locale can fully replace the @@ui_locale use cases.
@carlosjeurissen Thanks for the suggestion. @@current_locale looks good to me for i18n.getCurrentLanguage(). And @@ui_locale still represents i18n.getUILanguage().
Introduction
Historically, browser extensions have been using language tags with two different syntaxes:
1) Using a hyphen, e.g. en-US. This is the proper language tag format as defined in BCP47.
2) Using an underscore, e.g. en_US. This is similar to POSIX, ICU and ISO/IEC 15897.
Both syntaxes have been used, with mixed support in different areas of extensions. Support varies per API and browser. This WECG issue covers those areas in an attempt to come to alignment.
_locales directory
In most documentation, locale directories are supposed to use the underscore variant.
Currently this is a requirement for Chrome, while Firefox seems to also support the hyphen (-).
Going forward, unless there is a clear reason why underscores should be used, my proposal would be to start adding support for the proper BCP47 tags and disallow the use of underscores in mv4.
manifest.json default_locale
Following documentation, default_locale is supposed to use the subdirectory name of _locales.
Currently Chrome requires the use of the underscore, while Firefox supports both a hyphen and an underscore.
Going forward, I suggest we keep this documentation and follow the switch to BCP47 as mentioned in _locales above.
i18n.getUILanguage
Following documentation, this returns a BCP47 language tag. Historically, before version 55, Firefox used to return the underscore variation. I suggest we keep this as is.
i18n.getMessage('@@ui_locale')
This is defined as returning the current locale. In Chrome it returns the underscore variant, while in Firefox, probably since version 56, it returns the BCP47 tag.
Going forward, I suggest this be equal to the variation used in _locales, as it could be used to fetch the messages.json file.
Final words
Basically, the goal is to align the behaviour across browsers. My opinion would be to always use BCP47. However, this comes with a transitional cost, which I believe is worth it. Alternatively, we could agree, for example, to use the underscore only for files. However, we would still need to agree on default_locale and what is returned by @@ui_locale.
Related: https://github.com/w3c/webextensions/issues/131