osmlab / osm-planning

General OSM tools planning and wishlist

Translation of tags in OSM #20

Open bhousel opened 6 years ago

bhousel commented 6 years ago

I am proposing...

To translate all the tags... (OK, not all the tags, but at least the frequently used ones.) Really! Not a joke!

What will this enable that we can't do today?

OSM tags are specified in English, but this makes contributing to OSM difficult for non-English speakers.
We frequently get requests in iD for improved translation of the tag lists: https://github.com/openstreetmap/iD/issues/2708#issuecomment-370112193

So for example, the surface=* tag currently shows values which are fetched from taginfo:

[screenshot, 2018-05-22]

These values are not translated because the list is open-ended, and in fact there are over 6000 values for surface=* in OSM today (https://taginfo.openstreetmap.org/keys/surface#values). Most of these probably do not need to be translated - many are probably mistakes or should be retagged as something else.

Challenges

We've held off on implementing this for a few reasons:

  1. The volume of things to translate is really big. Many thousands, depending on where we want to draw the cutoff for usefulness. (We shouldn't translate all 6000 surface values, but maybe the top 50? 100? Which tags should we include? Everything that's a preset or field in iD?)

  2. Displaying a dropdown with a mix of English and localized strings is a hard UI problem. We'd need to make it very clear that the values should be typed in English, but allow users to view and search local values also. (@magols' suggestion here is very good https://github.com/openstreetmap/iD/issues/2708#issuecomment-376444797 )

  3. Where would this live? The ideal solution would be for it to be part of taginfo, so that any request for a list could also include localized strings where available. CC @joto for thoughts (feasibility? cost?).

  4. Integration with Transifex? I'm imagining a script that loops through all the preset fields in iD, with certain ones marked for translation ("surface", "access") or no-translation ("name", "ref", anything yes/no). It would fetch the top values from taginfo by popularity and assemble a big bunch of source strings to push to Transifex as a new resource.
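
The script idea in item 4 could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing iD or taginfo tool: the function names, the `NO_TRANSLATE` set, the `"key=value"` string format, and the choice to skip literal yes/no values are all hypothetical. The URL follows taginfo's `key/values` API endpoint as I understand it, and the actual network call is left to a caller-supplied `fetch_values` function so the assembly logic can be tried offline.

```python
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlencode

# Assumption: taginfo's public API endpoint for the values of a key.
TAGINFO_API = "https://taginfo.openstreetmap.org/api/4/key/values"

# Hypothetical skip list, following the examples named in the proposal.
NO_TRANSLATE = {"name", "ref"}


def values_url(key: str, limit: int = 50) -> str:
    """Build a taginfo query for the most-used values of one key."""
    query = urlencode({
        "key": key,
        "sortname": "count",
        "sortorder": "desc",
        "page": 1,
        "rp": limit,
    })
    return f"{TAGINFO_API}?{query}"


def build_source_strings(
    keys: Iterable[str],
    fetch_values: Callable[[str, int], List[str]],
    limit: int = 50,
) -> Dict[str, str]:
    """Return {"key=value": "value"} source strings for translatable keys.

    fetch_values(key, limit) is expected to return values ordered by
    popularity, e.g. by requesting values_url(key, limit).
    """
    strings: Dict[str, str] = {}
    for key in keys:
        if key in NO_TRANSLATE:
            continue  # free-text fields are never translated
        for value in fetch_values(key, limit)[:limit]:
            if value in ("yes", "no"):
                continue  # one reading of "anything yes/no"
            strings[f"{key}={value}"] = value
    return strings


if __name__ == "__main__":
    # Stub data standing in for a real taginfo response.
    def fake_fetch(key: str, limit: int) -> List[str]:
        sample = {"surface": ["asphalt", "unpaved", "paved"]}
        return sample.get(key, [])

    print(build_source_strings(["surface", "name"], fake_fetch))
```

Running this over every iD field with a few different `limit` values would give a first estimate of how many strings translators would face.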

Whom does this benefit?

Anyone using OSM in a language other than English.

Any drawbacks?

This is a big lift.

Resources needed to build and maintain?

This could run as a new database and service, or possibly as part of Taginfo.

See also:

https://github.com/openstreetmap/iD/issues/2708#issuecomment-370112193

Any ideas for a project name?

tag babelfish

Describe this project in a single emoji:

🗿

joto commented 6 years ago

Translating the tag key/value strings just by the numbers without having humans in the loop who understand the meaning and use of the tags is probably going to create a huge amount of confusion. Often a translation of just the tag will be difficult to impossible. Keeping the tags as is and translating the descriptions seems to me to be the better approach. And we have a large head start there in the wiki already.

And there will be many other problems. For instance, chances are you will have some translations of "wrong" tags that sound better or more fitting in whatever language you are translating to than the translations of the "right" tag. ("Right" and "wrong" here in the meaning of what's commonly used and accepted in OSM.)

Taginfo's job is to pull all information together on all the tags. Its job is not to create or hold new information. So editing these translations etc. is not taginfo's job. If and when those translations are available somewhere it becomes taginfo's job to integrate this information and at that point we can talk about how to do this best.

imagico commented 6 years ago

I would like to emphasize what @joto also implied - there is no such thing as a single word translation of tags in the vast majority of cases. The tags do not always mean what the tag value means in English either (like leisure=park is not for everything called a park in English and landuse=forest is not for everything called a forest - see national parks/national forests for example).

What comes closest to what you would like to have here is the short description from the tag pages on the wiki - which is also prominently shown by taginfo. This is meant to contain a brief description of what the tag means - in any language there is a tag page on the wiki for.

For a related discussion on the practicability of single word descriptions of tags see:

https://wiki.openstreetmap.org/wiki/User_talk:Geozeisig#Please.2C_stop_removing_descriptions

bhousel commented 6 years ago

from @joto:

Translating the tag key/value strings just by the numbers without having humans in the loop who understand the meaning and use of the tags is probably going to create a huge amount of confusion.

Oh yeah, in case it wasn't clear - there would definitely be humans in the loop. There are many people who want this and are willing to do the work on a site like Transifex. (We're not considering machine translation.)

Taginfo's job is to pull all information together on all the tags. Its job is not to create or hold new information. So editing these translations etc. is not taginfo's job. If and when those translations are available somewhere it becomes taginfo's job to integrate this information and at that point we can talk about how to do this best.

Sounds good - that is totally fair. I described one possible approach (loop through iD presets and fields and query taginfo for each one) to build the source strings which we'd then push to our existing volunteers. I'd like to do a quick prototype just to determine the scope of how many strings we'd really be dealing with. It might be a ridiculous number.

from @imagico

I would like to emphasize what @joto also implied - there is no such thing as a single word translation of tags in the vast majority of cases. The tags do not always mean what the tag value means in English either (like leisure=park is not for everything called a park in English and landuse=forest is not for everything called a forest - see national parks/national forests for example).

Yep, again we'd be relying on humans to provide a best-fit translation. We know that the tags are imperfect, but at least this effort would move non-English speaking mappers tagging things like "surface" from "nearly impossible" to "possible, but just as imperfect as for English-speaking mappers".

1ec5 commented 6 years ago

It would be helpful to get a sense of how many translatable strings we’re talking about. Translation cost is a real hurdle to clear – by this point, there are about a dozen competing projects on Transifex, Translatewiki.net, and elsewhere for translating preset names and tag values, usually as part of editor translation projects.

Translation memory would be the primary benefit of using a translation platform (Transifex, Translatewiki.net, et al.) over continuing to rely solely on the wiki for translated tag descriptions. Unfortunately, translation memory doesn’t carry over easily from one translation platform to another or even between different Transifex projects. So ideally this would be the one tag translation project that supersedes the others, rather than yet another project that translators have to complete – and keep consistent.

simonpoole commented 6 years ago

Just saw this ... and have to say this is essentially a non-issue: both JOSM and Vespucci (and naturally the wiki) have had translations for the more popular values for "ages" (and translation memory would pop them automatically into iD if so wished).

Obviously there -is- an issue with automatically retrieving tag values from taginfo in such a scenario, but that can't really be resolved without essentially -not- retrieving things automatically.

bhousel commented 6 years ago

Just saw this ... and have to say this is essentially a non-issue: both JOSM and Vespucci (and naturally the wiki) have had translations for the more popular values for "ages" (and translation memory would pop them automatically into iD if so wished).

Can you share a link to whatever project it is that you think solves this problem?

You have a database somewhere that knows that surface=paved in Hindi is सतह=पक्का (my best guess)? I really think you're not understanding the scope of what this issue is about.

simonpoole commented 6 years ago

Naturally, directly translating the tag values doesn't make sense, because in many cases the EN values are slightly adventurous to start with and verbatim translations wouldn't help at all.

JOSM presets have the notion of a "display value" that can be shown in place of the actual OSM 'internal' value. Translating these makes a lot of sense (translation context will be needed now and then; that can be added in JOSM presets).
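
To make the "display value" idea concrete, here is a minimal sketch of how an editor might keep the stored OSM value and its translated label apart while letting users search by either. It is not JOSM's or iD's actual data model: the `ValueOption` class, the `search` function, and the German labels are hypothetical and only illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValueOption:
    # The internal value written to OSM; it is never translated.
    value: str
    # Hypothetical map of locale code -> translated display value.
    labels: Dict[str, str] = field(default_factory=dict)

    def display(self, locale: str) -> str:
        """Return the label for a locale, falling back to the raw value."""
        return self.labels.get(locale, self.value)


def search(options: List[ValueOption], text: str, locale: str) -> List[str]:
    """Match on the raw value or the localized label; return raw values."""
    needle = text.casefold()
    return [
        option.value
        for option in options
        if needle in option.value.casefold()
        or needle in option.display(locale).casefold()
    ]


if __name__ == "__main__":
    # Illustrative labels only, not taken from any real preset file.
    options = [
        ValueOption("gravel", {"de": "Schotter"}),
        ValueOption("asphalt", {"de": "Asphalt"}),
    ]
    print(search(options, "schot", "de"))
```

Because `search` always returns the internal value, a mapper can type in their own language while the tag saved to OSM stays in its established form.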

todrobbins commented 5 years ago

@bhousel any update on this proposal? I saw @quincylvania's newest comment about a UI for such translation work/tagging, but would love to hear more of your current thoughts.