Closed 1ec5 closed 4 months ago
Localisation is managed through translatewiki and we don't have any control over what locales they provide to us,
OK, so en.yml should be in American English, not mostly British English?
I don't think we really have a clear answer to that... It probably should, unless we're going to get TW to remap things in a different way, but the problem is that the people doing the merges are mostly en-GB speakers who won't notice British spellings.
The en-GB translation is mostly annoying, to be honest, because it's full of strings which were copied even though they don't actually change. That means I often wind up wondering why I'm not seeing a change, and it's because I'm seeing the en-GB copy of the string.
Previous discussion in #3653 and #3671.
> I don't think we really have a clear answer to that... It probably should, unless we're going to get TW to remap things in a different way, but the problem is that the people doing the merges are mostly en-GB speakers who won't notice British spellings.
If this project is open to using en.yml for American English, I’d be happy to proofread it thoroughly, move any Britishisms to en-GB.yml, and monitor changes to en.yml going forward. I’ve been doing the same thing for years for id-tagging-schema, which has the same challenge. There’s also a #language channel in OSMUS Slack where developers can request proofreading or a second opinion about terminology. StreetComplete’s developers take advantage of this channel on a regular basis.
> The en-GB translation is mostly annoying, to be honest, because it's full of strings which were copied even though they don't actually change. That means I often wind up wondering why I'm not seeing a change, and it's because I'm seeing the en-GB copy of the string.
Could en-GB.yml be a sparse translation, just the strings that we need to override en.yml? On Translatewiki.net, some projects periodically bulk delete translations that are identical to the source. Then I think the site would fall back to en.yml automatically. If not, config/i18n-tasks.yml could probably backfill the missing translations from en.yml.
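To make the sparse-override idea concrete, here is a minimal Python sketch. It is not how Translatewiki.net, i18n-tasks, or this website actually behave; the key names and strings are hypothetical, and plain dicts stand in for the parsed contents of en.yml and en-GB.yml.

```python
def flatten(tree, prefix=""):
    """Flatten nested translation dicts into dotted keys."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def prune_identical(source, override):
    """Keep only the override strings that differ from the source."""
    flat_source = flatten(source)
    return {
        key: text
        for key, text in flatten(override).items()
        if flat_source.get(key) != text
    }


def lookup(key, sparse_override, source):
    """Prefer the sparse override, otherwise fall back to the source."""
    if key in sparse_override:
        return sparse_override[key]
    return flatten(source)[key]


# Hypothetical stand-ins for en.yml and en-GB.yml.
EN = {
    "layers": {"header": "Map Layers"},
    "editor": {"color": "Color", "center": "Center the map"},
}
EN_GB = {
    "layers": {"header": "Map Layers"},
    "editor": {"color": "Colour", "center": "Centre the map"},
}

SPARSE_EN_GB = prune_identical(EN, EN_GB)
```

With this data, `SPARSE_EN_GB` keeps only the two `editor` strings, and `lookup("layers.header", SPARSE_EN_GB, EN)` falls through to the source string, so an edit to the source would show up immediately for en-GB users too.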
Incidentally, you might appreciate this mismatched en-GB string I just fixed in iD (not yet released):
American English | British English |
---|---|
> OK, so en.yml should be in American English, not mostly British English?

Yes, I think it should be. It's similar to the discussion that we had with Portuguese where we had to decide if requests for "pt" should be answered with "pt-BR" or "pt-PT" and went with the CLDR recommendation of pt-BR. And I strongly suspect that the equivalent for "en" will be "en-US". Therefore:
Language | Locales | Files |
---|---|---|
American English | en, en-US | en.yml |
British English | en-GB | en-GB.yml |
So to make this work, all we have to do is to make sure the source translations in en.yml are "en-US" and everything will work as intended (apart from some Brits who might need to update their browsers' preferred languages, if they haven't done so already).
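As a rough illustration of the table above, the following Python sketch resolves a list of preferred language tags to a locale file. It is only an assumption about how such matching could work, not the website's or Rails' actual locale negotiation; `resolve_locale_file` and `LOCALE_FILES` are hypothetical names.

```python
# Mapping taken from the table above: locale tag -> translation file.
LOCALE_FILES = {"en": "en.yml", "en-US": "en.yml", "en-GB": "en-GB.yml"}


def resolve_locale_file(preferred, locale_files=LOCALE_FILES):
    """Return the file for the first preferred tag that matches.

    Each tag is tried exactly first, then by its primary language
    subtag (so "en-AU" falls back to "en"). Matching ignores case.
    """
    by_lower = {tag.lower(): name for tag, name in locale_files.items()}
    for tag in preferred:
        tag = tag.lower()
        for candidate in (tag, tag.split("-")[0]):
            if candidate in by_lower:
                return by_lower[candidate]
    # Nothing matched: fall back to the source language file.
    return locale_files["en"]
```

Under these assumptions, `resolve_locale_file(["en-GB", "en"])` picks en-GB.yml, while a British user whose browser only sends `["en"]` would get en.yml, which is the case where they would need to update their preferred languages.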
> But this codebase also maintains translations for a number of feature tags that would be much less intuitive to an American English speaker.

It's worth noting that many of these feature translation strings are written by non-English speakers, often native German speakers, so it's hard for them to spot any Britishisms, particularly if the British term is closer to the tag (traditionally OSM uses en-GB for tag values) or to the terms used in other European languages.
The other non-tag translation strings in the repo are, as @tomhughes says, more often written by en-GB speakers.
Also "unsquare" is not a word ;-)
> The en-GB translation is mostly annoying, to be honest, because it's full of strings which were copied even though they don't actually change. That means I often wind up wondering why I'm not seeing a change, and it's because I'm seeing the en-GB copy of the string.
I mostly find this annoying when developing, but I remind myself that it would be the same if I was a native French developer and was trying to update the en.yml sources and not seeing the updates.
The main problems with sparse translations on Translatewiki include the minimum percentage of strings that must be translated before the translation becomes valid. For en-GB I think the threshold might not be met if only the strings that differ are counted. But in any case, if the translators only translated the ones that differed, they'd face a constant list of thousands of "untranslated" strings, and have to pick through them over and over, searching for any that genuinely differed. So I don't blame them for copy+pasting where they match.
As far as I can see there's only three problems for en-GB speakers:
> So to make this work, all we have to do is to make sure the source translations in en.yml are "en-US" and everything will work as intended

We should aim to have only spelling differences between en-US and en-GB. If a phrase or term is specific to one country or the other, the odds are it will cause confusion to English-speaking people somewhere. Phrases or terms that will be understood by all English speakers are more easily understood by everyone.
That’s a reasonable request, considering the lack of an en-CA, en-NZ, en-IN, etc. so far. However, I don’t know that it’s possible to totally avoid dialectal terminology. Some terms like chemist versus drugstore will be confusing and surprising to someone either way; I don’t think there’s a suitable international alternative. The most we can do is to avoid gratuitously regional turns of phrase, especially legal terms that are tied to a particular jurisdiction regardless of the dialect spoken.
If there were some other place to put an American English localization, we could do whatever we want with the main English one, but unfortunately that isn’t the current situation.
As far as I can tell, config/locales/en.yml is de facto written in Commonwealth English, with Britishisms such as “centre”, “colour”, “licence”, and “shop” generally prevailing over Americanisms such as “center”, “color”, “license”, and “store”. There’s also an en-GB.yml for even more classically British turns of phrase, but no en-US.yml for even basic quirks of American English. A user who knows American English will most likely configure their browser and account to prefer the en-US locale code, but they’ll see Commonwealth English instead.
For the most part, this is no problem: “colour” and “licence” merely look quirky, maybe slightly erroneous to someone unfamiliar with the dialect. But this codebase also maintains translations for a number of feature tags that would be much less intuitive to an American English speaker.
Not a single American knows what these are, even though “license” is spelled the American way:
https://github.com/openstreetmap/openstreetmap-website/blob/63f0b9257d62818d435ef6510340b9681efe547b/config/locales/en.yml#L1275
You’d have to be an OpenHistoricalMap contributor like me to know what these are:
https://github.com/openstreetmap/openstreetmap-website/blob/63f0b9257d62818d435ef6510340b9681efe547b/config/locales/en.yml#L749
99.9 out of 100 Americans would tell you this is a scientific laboratory. iD calls it a “drugstore”, which is apparently closer to the intended use of this tag:
https://github.com/openstreetmap/openstreetmap-website/blob/63f0b9257d62818d435ef6510340b9681efe547b/config/locales/en.yml#L1297
For sure, Americans know what these are, but `amenity=toilets` isn’t supposed to refer to a lavatory fixture specifically, and it’s a bit impolite:
https://github.com/openstreetmap/openstreetmap-website/blob/63f0b9257d62818d435ef6510340b9681efe547b/config/locales/en.yml#L2387
Americans mostly use “shop” and “store” interchangeably, and so does this repository:
https://github.com/openstreetmap/openstreetmap-website/blob/63f0b9257d62818d435ef6510340b9681efe547b/config/locales/en.yml#L1301-L1307
https://github.com/openstreetmap/openstreetmap-website/blob/63f0b9257d62818d435ef6510340b9681efe547b/config/locales/en.yml#L1384-L1390
One way or another, we should provide more regionally appropriate terms to American English speakers, to reduce confusion about what these tags mean. I don’t have an opinion on whether we should split out a separate en-US.yml or have the Americanisms take over en.yml, since either way we’d be more likely to load the expected strings when the user says they prefer en-US in their account settings.