Open bhousel opened 1 year ago
For context, relying on Wikidata labels and properties would be somewhat unconventional for an OSM-related software project compared to the more common approach of soliciting project-specific translations on a system like Transifex or Translatewiki.net. But there is some prior art, such as the highway shield legend in ZeLonewolf/openstreetmap-americana#632.
For NSI, the biggest advantage to relying on Wikidata would be reducing what would otherwise be a very significant burden on volunteer translators. Besides, most of these translations would go to waste, never seen by anyone. Moreover, Wikidata items are supposed to correspond one-to-one with NSI entries, so we’re leaving a lot of valid translations on the table at the moment. (Sometimes they don’t correspond one-to-one, but that’s a bigger problem that these labels would surface, justifiably in my opinion.)
One thing to watch out for is that Wikidata has a different naming convention for labels than we do for presets. For example, Wikidata expects labels to be capitalized only when necessary, so that a data consumer can insert “smoke tree” in a sentence instead of a more jarring “Smoke tree”. By contrast, in the default American English localization, we currently prefer title case: openstreetmap/id-tagging-schema#473. (Some other languages like French and Spanish prefer sentence case.) NSI will need to recase the labels itself to keep people from seeing the wrong case and annoying the Wikidata community with “tagging for the editor” edits, as the Americana project initially did after landing its Wikidata-powered legend.
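To make the recasing step concrete, here is a minimal sketch, assuming NSI receives a sentence-case Wikidata label plus a language code. The function name `recase_label` and the `TITLE_CASE_LANGS` set are hypothetical, not part of NSI, and the title-casing is deliberately naive (it capitalizes every space-separated word, including minor words like "of").

```python
# Hypothetical set of locales whose preset names use title case.
# Everything else is treated as sentence case (e.g. French, Spanish).
TITLE_CASE_LANGS = {"en", "en-US"}


def _capitalize_first(text: str) -> str:
    """Uppercase only the first character; leave the rest untouched
    so existing capitals in proper nouns are preserved."""
    return text[:1].upper() + text[1:]


def recase_label(label: str, lang: str) -> str:
    """Recase a Wikidata label to match the preset naming convention."""
    if lang in TITLE_CASE_LANGS:
        # Naive title case: capitalize each space-separated word.
        return " ".join(_capitalize_first(word) for word in label.split(" "))
    # Sentence case: capitalize only the first word.
    return _capitalize_first(label)
```

Doing this on the NSI side means the Wikidata label can stay "smoke tree" while the editor shows "Smoke Tree", so nobody is tempted to recase the label on Wikidata itself.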
Would that mean we need to maintain a list of which languages are commonly used in which countries/regions? Would it introduce too many breaking changes?
I’m not sure why such a list would be necessary. The build script would pull in all the labels that Wikidata has for a given operator or flag’s item, then produce a separate sidecar file for each language. It would be up to the client to choose the file appropriate to the language, similar to how interface localization works today.
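As a rough sketch of that build step: the code below assumes the Wikidata entities have already been fetched and are keyed by QID in the standard entity JSON shape (`labels` → language code → `value`). The function names and the `labels.<lang>.json` file naming are hypothetical, not an existing NSI convention.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_labels_by_language(entities):
    """Turn {QID: entity JSON} into {language: {QID: label}}."""
    by_lang = defaultdict(dict)
    for qid, entity in entities.items():
        for lang, label in entity.get("labels", {}).items():
            by_lang[lang][qid] = label["value"]
    return dict(by_lang)


def write_sidecars(by_lang, out_dir):
    """Write one JSON sidecar file per language; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for lang, labels in sorted(by_lang.items()):
        # Hypothetical naming scheme: labels.<lang>.json
        path = out_dir / f"labels.{lang}.json"
        path.write_text(
            json.dumps(labels, ensure_ascii=False, indent=2, sort_keys=True),
            encoding="utf-8",
        )
        paths.append(path)
    return paths
```

With this layout no language-to-region list is needed: a client that runs in Japanese would simply load the `ja` file and fall back to the existing display name for any QID that is missing from it.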
I was chatting with @1ec5 about this issue of localizing the names that we use for presets. It's an issue that currently affects the flags in NSI, but would also affect some of the new categories we are considering adding, like Species (#8324) or Religions (#5960 et al.).
The summary is - we currently have a Display Name property for each item in NSI, and this is used for the name of the preset that gets displayed in iD or JOSM. These strings are currently only in the language that we think the user would be using. We don't offer any localization of these strings.
It would be useful to let users searching for a preset type other names for it, such as the name in another language or a common alternate name. So we'd need some other source of data for the different names an item could be known by.
Wikidata already provides this, somewhat, because labels can be entered in many different languages, and the "Also known as" aliases are available too. There are also some properties that track the common names things are known by, like P1843 (taxon common name).
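For illustration, a minimal sketch of gathering those three sources for one language, assuming the item is available in the standard Wikidata entity JSON shape (`labels`, `aliases`, and `claims` with monolingual-text values for P1843). The function name `names_for_language` is hypothetical.

```python
def names_for_language(entity, lang):
    """Collect the label, aliases, and P1843 common names for one language,
    deduplicated case-insensitively and kept in that priority order."""
    names = []

    label = entity.get("labels", {}).get(lang)
    if label:
        names.append(label["value"])

    for alias in entity.get("aliases", {}).get(lang, []):
        names.append(alias["value"])

    for claim in entity.get("claims", {}).get("P1843", []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if not datavalue:
            continue  # "no value" / "unknown value" snaks have no datavalue
        value = datavalue["value"]  # monolingual text: {"text", "language"}
        if value.get("language") == lang:
            names.append(value["text"])

    seen = set()
    unique = []
    for name in names:
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique
```

The case-insensitive dedupe matters because the same name often appears in more than one of these places with different capitalization.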
We haven't tried to tackle localization in NSI yet, but I'm wondering whether we could just gather up all these names and languages in another sidecar file and distribute it alongside the files we already gather - so that consumers that want to be more locale-aware can use this to improve their user experience.
Open Question: Would we use these gathered names as another source of alternate matchNames? I don't know, maybe?
Originally posted by @1ec5 in https://github.com/osmlab/name-suggestion-index/issues/8324#issuecomment-1615960848
Some examples:
Starbucks: https://www.wikidata.org/wiki/Q37158
Norway Maple: https://www.wikidata.org/wiki/Q26745