ideditor / schema-builder

🏗🏷 Create tagging schemas for iD
ISC License

Separate synonyms from search terms #3

Closed: quincylvania closed this issue 3 years ago

quincylvania commented 3 years ago

Re: https://github.com/openstreetmap/iD/issues/6139

Essentially, the issue is that we currently put synonyms into the terms property alongside other related search words. If we knew which terms were direct synonyms, and also made them display-ready, we could improve the preset search experience.
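To make the distinction concrete, here is a minimal sketch, assuming a hypothetical preset record with a `names` list of display-ready synonyms (the first entry being the primary name) kept separate from `terms`. The `Preset` class, the two-level ranking, and the rule for choosing the displayed label are illustrative assumptions, not iD's actual search code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Preset:
    """Hypothetical preset record, not iD's real schema.

    names: display-ready synonyms; names[0] is the primary name.
    terms: other related search words that are never displayed."""
    tag: str
    names: list[str]
    terms: list[str] = field(default_factory=list)


def _prefix_match(words: list[str], query: str) -> str | None:
    """Return the first word that starts with the query (case-insensitive)."""
    q = query.casefold()
    return next((w for w in words if w.casefold().startswith(q)), None)


def search(presets: list[Preset], query: str) -> list[tuple[str, str]]:
    """Return (tag, label) pairs, synonym matches ranked above term matches.

    Because synonyms are display-ready, a matched synonym can be shown as
    the label; a plain term match falls back to the primary name."""
    ranked = []
    for p in presets:
        synonym = _prefix_match(p.names, query)
        if synonym is not None:
            ranked.append((0, p.tag, synonym))
        elif _prefix_match(p.terms, query) is not None:
            ranked.append((1, p.tag, p.names[0]))
    ranked.sort(key=lambda r: r[0])  # stable sort: input order kept within a rank
    return [(tag, label) for _, tag, label in ranked]


# Illustrative data only.
presets = [
    Preset("amenity=bank", ["Bank"], ["cash", "money"]),
    Preset("amenity=atm", ["ATM", "Cash Machine"], ["money", "withdraw"]),
]
print(search(presets, "cash"))
# → [('amenity=atm', 'Cash Machine'), ('amenity=bank', 'Bank')]
```

With everything lumped into terms, both presets would match "cash" equally and the ATM preset could only be labeled with its primary name.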

westnordost commented 3 years ago

You are so fast! I have a few concerns:

  1. Some names include commas. For example, the job title of a craft=plumber in German is Anlagenmechaniker für Sanitär-, Heizungs- und Klimatechnik. Even the abbreviated form, Gas-, Wasser- und Heizungsinstallateur (installer of gas, water and heating), still contains commas, so a comma-separated list would split these names apart.
  2. It should somehow be made clear to translators that the first name has special relevance, as it will be shown as the primary name.
  3. It should perhaps be made clear to translators what counts as a synonym that belongs in name and what does not, i.e. where to draw the line between a term and a synonym. This might sound redundant, but I think it is important to set up clear rules and a clear dividing line between the two to avoid edit wars on Transifex.
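Point 1 is easy to demonstrate. This minimal sketch assumes the names are stored as a single comma-separated string; both helper functions are hypothetical. A naive comma split breaks the abbreviated plumber title into two bogus names, while splitting on line breaks, as proposed later in this thread, keeps it whole.

```python
from __future__ import annotations


def split_on_commas(value: str) -> list[str]:
    """Naive split a comma-separated format implies (hypothetical helper)."""
    return [part.strip() for part in value.split(",") if part.strip()]


def split_on_newlines(value: str) -> list[str]:
    """Alternative: one name per line, so commas inside a name survive."""
    return [line.strip() for line in value.splitlines() if line.strip()]


title = "Gas-, Wasser- und Heizungsinstallateur"

print(split_on_commas(title))
# ['Gas-', 'Wasser- und Heizungsinstallateur']  (one name broken in two)

print(split_on_newlines("Sanitärbetrieb\n" + title))
# ['Sanitärbetrieb', 'Gas-, Wasser- und Heizungsinstallateur']
```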

An example for point 3: The primary name for amenity=hunting_stand is hunting stand (duh). Now, should tree stand (a hunting stand on or at a tree) or hunting blind be a synonym? They are basically subtypes of a hunting stand. There are three options: No, Yes, and Only if there is no more precise way to tag it in OSM. As far as I can see, option 3 is currently the criterion for whether a word is included in terms. I'm not sure whether the criterion for name should be stricter. Maybe it should?

westnordost commented 3 years ago

  4. If possible, notify all translators of this change. They will certainly want to move quite a few strings from terms to name.

westnordost commented 3 years ago

Another thing. This is probably blue-sky. Since the order of synonyms matters and should matter (a more common synonym should be shown before some odd but possible phrasing), there could be some way to resolve disagreements among translators about which word is more widely used. For https://github.com/westnordost/tagdictionary I actually wrote two Python scripts that add a comment to each translated word stating how many results a Google search returns for it, to give a rough estimate of how widely the word is used.

So maybe one step towards this kind of thing would be to not use "," to separate the synonyms but a return ("\n"). Then, using "#" as a comment marker, for example, translators could add comments to each translation.

This could be used, for example, to record the number of search results:

amenity=atm:
  - Geldautomat # found 2200k times
  - Bankomat # found 230k times

or to comment on particular words:

craft=plumber:
  - Sanitärbetrieb
  - Anlagenmechaniker für Sanitär-, Heizungs- und Klimatechnik # official but awkwardly long job title
  - Klempner # factually wrong job title (a Klempner is a tinsmith) but used in colloquial language, only in de-DE
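A parser for this newline-separated format might look like the following minimal sketch. It assumes each translated string holds one synonym per line (without the "- " list markers used in the examples above), that everything after the first "#" on a line is a comment, and that "found Nk times" is a comment convention tooling could read. `Entry`, `parse_synonyms`, `primary_name` and `out_of_order` are hypothetical names. Following point 2, the first entry is treated as the primary name.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    name: str
    comment: Optional[str] = None


def parse_synonyms(value: str) -> list[Entry]:
    """One synonym per line; anything after the first '#' is a comment.

    Assumes '#' never appears inside a name itself."""
    entries = []
    for line in value.splitlines():
        name, sep, comment = line.partition("#")
        name = name.strip()
        if not name:
            continue  # skip blank lines and comment-only lines
        entries.append(Entry(name, comment.strip() if sep else None))
    return entries


def primary_name(entries: list[Entry]) -> str:
    """The first synonym is the one displayed as the preset's name."""
    return entries[0].name


# Hypothetical convention taken from the example comments ("found 230k times").
COUNT = re.compile(r"found (\d+)k times")


def out_of_order(entries: list[Entry]) -> list[str]:
    """Names whose recorded count exceeds that of some earlier entry.

    Entries without a count comment are ignored."""
    flagged = []
    lowest_so_far = None
    for e in entries:
        m = COUNT.search(e.comment or "")
        if m is None:
            continue
        count = int(m.group(1))
        if lowest_so_far is not None and count > lowest_so_far:
            flagged.append(e.name)
        lowest_so_far = count if lowest_so_far is None else min(lowest_so_far, count)
    return flagged


atm = "Geldautomat # found 2200k times\nBankomat # found 230k times"
entries = parse_synonyms(atm)
print(primary_name(entries))        # Geldautomat
print(out_of_order(entries))        # []
print(out_of_order(entries[::-1]))  # ['Geldautomat']
```

Splitting on lines rather than commas keeps titles such as Anlagenmechaniker für Sanitär-, Heizungs- und Klimatechnik intact. A check like out_of_order would only flag a suspicious ordering for translators to discuss; it does not decide the order itself.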

quincylvania commented 3 years ago

> So maybe one step towards this kind of thing would be to not use "," to separate the synonyms but a return ("\n").

I didn't realize you could add line breaks on Transifex, but that seems like a good idea. I'll try it.