nodejs / i18n

The Node.js Internationalization Working Group – A Community Committee initiative.

Key based i18n vs default language i18n #50

Closed LaurentGoderre closed 4 years ago

LaurentGoderre commented 6 years ago

There are usually two camps when it comes to managing i18n: using a default language (usually English) or using keys (also usually in English, but more like variables).

Has there been a decision on which one to use already?

FranzDeCopenhague commented 6 years ago

What is the context of your question? Like,

LaurentGoderre commented 6 years ago

It's less of an issue when it comes to documents, where there is an original language they were written in, but it is very relevant when it comes to translating UIs and templates.

obensource commented 6 years ago

@LaurentGoderre so far we've been moving towards managing i18n based on locale rather than jumping from a default language–much like in this example from the Electron project.

This seems to be the logical move to make for easy correlation between our Crowdin projects and i18n module repo. For any Node.js UI that imports this module–the user's locale selection will determine which translation to render, with any untranslated text gracefully falling back to English.
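The locale-driven lookup with an English fallback described above could look roughly like the following. This is only a minimal sketch: the `translations` table, the `t` helper, and the key names are hypothetical and not taken from the i18n module.

```javascript
// Hypothetical per-locale string tables; a real module would load these
// from translated files rather than hardcode them.
const translations = {
  en: { download: 'Download', docs: 'Documentation' },
  es: { download: 'Descargar' }, // 'docs' is not translated yet
};

// Look up a string for the user's locale, falling back to English when
// that locale (or that particular string) has no translation.
function t(locale, key) {
  const table = translations[locale] || {};
  return table[key] ?? translations.en[key];
}

console.log(t('es', 'download')); // Descargar
console.log(t('es', 'docs'));     // Documentation (English fallback)
```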

However, that brings up the really interesting point that I think you're getting at. Does every bit of l10n for a document or template need to be translated directly from English? I'm pretty sure we can safely assume that there are certain wordings and phrases that would be better off rewritten in another language than translated directly! It looks like we've already garnered concern for this with awesome people who care about it!

I'm currently of the opinion that for the sake of both pragmatism & empathy over time–we can do a hybrid. Translating most content directly from English will be the quickest way to achieve i18n at scale, but letting all (English) Node.js content be subject to native-l10n will be more effective for everyone.

With this–I don't see a way around having careful, case-by-case conversations with each l10n group moving forward. We'd need to provide a clear directive for all l10n groups so they can feel free to propose generating textual content in their own wording to the i18n WG. This would also probably require us to maintain a process for understanding what each native-l10n translation is communicating–and to verify it with the maintainers of the affected source before merging (eg. API docs, Node.js website, etc).

Thank you so much for bringing this up! This is a sizable oversight! 🍻

obensource commented 6 years ago

@LaurentGoderre

> it is very relevant when it comes to translating UIs and templates.

Can you provide more context to this, and how it might affect us?

Thanks a bunch! 🙂

obensource commented 6 years ago

@nodejs/i18n

native-l10n is an important consideration that I think we've been overlooking^

srl295 commented 6 years ago

I think there are two sets of issues here. The one in the original issue is (if I can restate it, @LaurentGoderre) choosing one of:

Plan A: Source content is written in some language (English) without regard to translation.

Example content:

console.log('Hello World');

Example translated model:

{
   "Hello World": "Hola Mundo"
} 

Updated code:

console.log(CONVERT_TO_FOREIGN_LANGUAGE("Hello World"));
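To make Plan A concrete, here is a small runnable sketch. The `catalog` object and the `translate` function are hypothetical stand-ins for the translated model and for `CONVERT_TO_FOREIGN_LANGUAGE`; no particular library is implied.

```javascript
// Hypothetical catalog keyed by the source-language string itself.
const catalog = {
  es: { 'Hello World': 'Hola Mundo' },
};

// Return the translation of `source` for `locale`, or the source string
// unchanged when no translation exists.
function translate(source, locale) {
  const table = catalog[locale] || {};
  return table[source] ?? source;
}

console.log(translate('Hello World', 'es')); // Hola Mundo
console.log(translate('Hello World', 'fr')); // Hello World
```

One consequence that shows up even in this tiny version: rewording the source string in the code changes the lookup key, so the existing `es` entry would no longer match.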

Plan B: Use keys

Example:

{
   "es":  { "greeting": "Hola Mundo" },
   "en": { "greeting": "Hello World" }
}

console.log(FETCH_THIS("greeting"));
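A runnable version of the Plan B example might look like this. `fetchMessage` is a hypothetical stand-in for `FETCH_THIS`, and throwing on an unknown key is an assumption made for the sketch, not something stated in the thread.

```javascript
// Every language, including English, is just another entry under a key.
const messages = {
  es: { greeting: 'Hola Mundo' },
  en: { greeting: 'Hello World' },
};

// Hypothetical stand-in for FETCH_THIS: resolve a key for a locale and
// fail loudly when the key is unknown, so typos surface early.
function fetchMessage(key, locale) {
  const text = (messages[locale] || {})[key];
  if (text === undefined) {
    throw new Error(`No "${key}" message for locale "${locale}"`);
  }
  return text;
}

console.log(fetchMessage('greeting', 'es')); // Hola Mundo
console.log(fetchMessage('greeting', 'en')); // Hello World
```

Because the English wording lives in the data rather than in the code, it can be reworded without touching call sites or other translations.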

Plan A is usually adopted because translation is a 'retrofit' and/or we don't want to bother the coders with the detail of these languages. (I use 'foreign' sarcastically here.) It has many, many problems:

Like daylight savings time/summer time, Plan A is a bad idea, but extremely popular. (yes, I have opinions here.)

Plan B usually encounters resistance because it seems complex upfront. But really, it just means thinking carefully about what you are presenting to users. Besides the bias issue of whether non-Source language people are first or second class users, you kind of have the issue of separating out the logic from the content. See for example node’s own error system https://github.com/nodejs/node/pull/11220 — instead of people comparing the strings of error messages, the keys become a natural way to work with the errors semantically.
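As an illustration of working with errors through keys rather than message text, here is a minimal sketch. The `ERR_MISSING_NAME` code, the `errorMessages` table, and the `makeError` and `greet` functions are all hypothetical and are not part of Node's error system.

```javascript
// Hypothetical message table: the code is the stable key, the message is
// presentation text that may be reworded or localized.
const errorMessages = {
  en: { ERR_MISSING_NAME: 'The "name" argument is required' },
  es: { ERR_MISSING_NAME: 'El argumento "name" es obligatorio' },
};

// Build an Error whose message depends on the locale but whose code does not.
function makeError(code, locale) {
  const err = new Error(errorMessages[locale][code]);
  err.code = code;
  return err;
}

function greet(name, locale) {
  if (!name) throw makeError('ERR_MISSING_NAME', locale);
  return name;
}

try {
  greet('', 'es');
} catch (err) {
  // Branch on the key, not on the human-readable text.
  if (err.code === 'ERR_MISSING_NAME') {
    console.log('caller forgot the name');
  } else {
    throw err;
  }
}
```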


The second set of issues has to do with English vs. non-English. In my mind, the source language (which is why I say source language instead of English wherever possible) is a development team decision. The 'default' language doesn't have to be English, and sometimes it isn't in practice even when the source language is. For, say, a French national company, the default language in the absence of other information is likely to be … French. Ideally tools and processes are designed to NOT hardcode English or treat it specially. Yes I have been known to hardcode ('en') instead of writing (SOURCE_LANGUAGE) but I try.

The actual language used is a process and development issue. For Node.js content itself, it may be a reasonable decision to make English the working source language. However, just as with Wikipedia which has no 'root' language, we should anticipate the scenario where let's say the Spanish version of some document gets a major overhaul and is so improved that the wording should be translated to other languages. This sort of thing could happen if someone contributes improvements to a 'translated' language.
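The distinction between source language and default language can be sketched as configuration instead of a hardcoded 'en'. The `settings` object and the `pickLanguage` function below are hypothetical names chosen for illustration.

```javascript
// Hypothetical project settings: content is authored in English, but
// users with no stated preference see French.
const settings = {
  sourceLanguage: 'en',
  defaultLanguage: 'fr',
  available: ['en', 'fr', 'es'],
};

// Pick the language to display: the requested one if it is supported,
// otherwise the configured default (never a hardcoded 'en').
function pickLanguage(requested, { available, defaultLanguage }) {
  return available.includes(requested) ? requested : defaultLanguage;
}

console.log(pickLanguage('es', settings));      // es
console.log(pickLanguage(undefined, settings)); // fr
```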

Just my ¤2.00 (<<< Substitute your appropriate currency here)

LaurentGoderre commented 6 years ago

Wow! Thanks @srl295 for explaining so well the rationale I have been putting off writing!

zeke commented 6 years ago

On the Electron project, we use a combination of approaches.

For localized strings on Electron's website, we use a locale.yml file with arbitrarily named keys. These keys are referred to in the website's HTML templates like {{localized.nav.apps}}. There's currently no science to the key names; they're typically just a pared down version of the English string itself.
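For readers unfamiliar with this style, here is a rough sketch of how a placeholder like `{{localized.nav.apps}}` can be resolved against nested keys. It is not Electron's actual templating code: the `localized` data, `resolvePath`, and `render` are hypothetical.

```javascript
// Hypothetical contents of a parsed locale file; key names are arbitrary.
const localized = {
  nav: { apps: 'Aplicaciones', docs: 'Documentación' },
};

// Follow a dotted path such as "localized.nav.apps" through nested objects.
function resolvePath(context, path) {
  return path.split('.').reduce(
    (node, part) => (node == null ? undefined : node[part]),
    context
  );
}

// Replace each {{path}} placeholder; leave unknown ones untouched.
function render(template, context) {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match, path) => {
    const value = resolvePath(context, path);
    return typeof value === 'string' ? value : match;
  });
}

console.log(render('<a href="/apps">{{localized.nav.apps}}</a>', { localized }));
// <a href="/apps">Aplicaciones</a>
```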

The rest of the Electron website's translated content lives in markdown files from the electron/electron repo. We send these documents to Crowdin in their entirety. We initially tried to avoid accidental translation of untranslatable strings like Array and BrowserWindow by programmatically extracting all the translatable strings and putting them in a YML file, but translators had difficulty translating these strings without knowing their surrounding context. Hence, the whole markdown file now goes into Crowdin.

Because whole markdown files are now exposed to translators, there is more of a risk of certain content being translated that should be really left alone, code snippets being the most prominent example. We've taken a few measures to help avoid confusion about what should and shouldn't be translated:

  1. Created a Crowdin glossary of terms like Javascript builtins and Electron API names that should not be translated. These appear as tooltips in Crowdin's translation interface.
  2. Promoted many translators to proofreader roles to help disseminate information to other translators in their language.

obensource commented 6 years ago

@zeke @LaurentGoderre @srl295 thanks for your terrific insights! 🙌

> Promoted many translators to proofreader roles to help disseminate information to other translators in their language.

Maintaining a close relationship with individuals in designated proofreader roles for each l10n team is going to be necessary in order to help us isolate which strings need their own source-localization. The concept of 'proofreader' & 'translator' roles is something we should probably bake into our new l10n guidelines so we can get the ball rolling with that.

> Created a Crowdin glossary of terms like Javascript builtins and Electron API names that should not be translated. These appear as tooltips in Crowdin's translation interface.

Maybe we can run a script in the CI that injects warnings for translators as comments above any string that contains reserved terms in the markdown files before they migrate to Crowdin. That way we wouldn't have to maintain a glossary to reference while translating in Crowdin, but only our own list of terms that shouldn't be translated.

eg. > Please do not translate the reserved term(s): Array, BrowserWindow
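A first pass at such a script could be as small as the sketch below. The `RESERVED_TERMS` list, the function names, and the use of HTML comments as the warning format are assumptions; whether Crowdin shows such comments to translators would need to be checked.

```javascript
// Hypothetical list of terms translators should leave alone.
// Assumes the terms contain no regular-expression metacharacters.
const RESERVED_TERMS = ['Array', 'BrowserWindow'];

// Return the reserved terms that occur as whole words in a line.
function findReservedTerms(line) {
  return RESERVED_TERMS.filter((term) =>
    new RegExp(`\\b${term}\\b`).test(line)
  );
}

// Insert an HTML comment above every line that mentions a reserved term.
function annotate(markdown) {
  const out = [];
  for (const line of markdown.split('\n')) {
    const terms = findReservedTerms(line);
    if (terms.length > 0) {
      out.push(
        `<!-- Please do not translate the reserved term(s): ${terms.join(', ')} -->`
      );
    }
    out.push(line);
  }
  return out.join('\n');
}

console.log(annotate('Create a BrowserWindow.\nThen show it.'));
```

This version treats every line the same way; a real script would probably need to skip fenced code blocks and avoid annotating a file twice.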

> For localized strings on Electron's website, we use a locale.yml file with arbitrarily named keys.

This seems like a straightforward way to do this, and maybe we can extend it with a locale-key based approach similar to @srl295's Plan B.

eg. localized.nav.es.apps

It might also be beneficial to make this a meta-process that will cover multiple i18n initiatives (eg. API docs, Web Site, etc). We might opt for adding a project key as well.

eg. localized.website.nav.es.apps

> The actual language used is a process and development issue. For Node.js content itself, it may be a reasonable decision to make English the working source language.

Pragmatically, I think that making English the working source language is going to help us achieve i18n at scale the quickest, given that the alternative (though more ideal) may possibly require refactoring efforts in existing Node.js source to accommodate the kind of templating needed to support an absence of a 'root' language (please correct me if I'm wrong). If we were able to determine somehow that we wouldn't be asking very much of core maintainers & other initiatives, it might be a rad way to go. Granted though, Node.js has a lot of contributors so it might not actually be that painful.

> we should anticipate the scenario where let's say the Spanish version of some document gets a major overhaul and is so improved that the wording should be translated to other languages.

Yes! 👍We'll need to rely on our l10n teams to inform us of when they think their versions are better than the original source. Here are a couple ways I think we could handle this:

oppianmatt commented 5 years ago

Have you considered using something called "Engineering English"? It's the best of both worlds (keys and English strings).

The concept is that you define a default language called "Engineering English", where the engineers write the key in English to convey the meaning. It reads well. But to avoid changing keys every time the string changes, those keys are fixed. If you want to change the English translation, you override it in the translation file.

For example, you might have:

{
   "es":  { "Hello World": "Hola Mundo" },
   "en": { "Hello World": "Hello World" }
}

but then you want it to say "Greetings Earth" instead so you would do this:

{
   "es":  { "Hello World": "Hola Mundo" },
   "en": { "Hello World": "Greetings Earth" }
}

You keep the key the same since it still is the same concept to an engineer but the output is different.
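A lookup for this scheme could be sketched as follows. The `t` function is hypothetical, and falling back to the key itself when no entry exists is an assumption about how "Engineering English" would behave.

```javascript
const translations = {
  es: { 'Hello World': 'Hola Mundo' },
  en: { 'Hello World': 'Greetings Earth' },
};

// Look up the fixed "Engineering English" key. If the locale has no entry,
// try the English override, and finally show the key itself.
function t(key, locale) {
  const inLocale = (translations[locale] || {})[key];
  return inLocale ?? translations.en[key] ?? key;
}

console.log(t('Hello World', 'es')); // Hola Mundo
console.log(t('Hello World', 'en')); // Greetings Earth
console.log(t('Goodbye', 'en'));     // Goodbye (no override, key is shown)
```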

zeke commented 5 years ago

I like that idea, @oppianmatt. But I would favor using key strings that lend themselves to easy addressability in JS and in templating languages. Spaces make things tricky, so instead of Hello World, I would prefer hello_world.
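To show the addressability point, here is a small sketch that derives an identifier-style key from an English string. The `toKey` helper is hypothetical, and it only handles ASCII letters and digits.

```javascript
// Hypothetical helper: turn an English string into an identifier-style key.
function toKey(text) {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // collapse anything else into underscores
    .replace(/^_+|_+$/g, '');    // drop leading and trailing underscores
}

const messages = { [toKey('Hello World')]: 'Hola Mundo' };

console.log(toKey('Hello World')); // hello_world
console.log(messages.hello_world); // Hola Mundo (plain dot access works)
```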

alexandrtovmach commented 4 years ago

This issue should be closed, because we're not using any of these approaches.

c5n8 commented 1 year ago

@alexandrtovmach which approach did you use in the end?