matkoniecz opened 2 years ago
How do you intend to do that? In the default language "en" it is already found.
Only in the language "en_GB" it is not found, because it was translated as "Car Park".
How do you intend to do that?
Add an alias, like in https://github.com/openstreetmap/id-tagging-schema/pull/335.
I guess this should probably be fixed in iD's code directly, because this is likely an issue with all presets whose name has been translated into any (non-US) dialect of English: any preset name could also automatically be included in the search terms for all English locales. This would be the only "good" solution if you ask me, because otherwise we would potentially need to include every preset's name in its respective list of terms (see the readme section for context).
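A minimal sketch of that client-side idea, assuming a simplified, hypothetical data shape (locale → preset ID → name and terms) rather than iD's real translation files or search code: when the user's locale is a non-default English dialect, the default "en" (en-US) name is also searchable. Function names and the toy data are illustrative only.

```python
# Hypothetical sketch: the nested-dict shape and function names are
# illustrative only, not iD's actual data model or search code.

def searchable_strings(preset_id, locale, translations):
    """Lowercased strings a search query may match for one preset."""
    entry = translations.get(locale, {}).get(preset_id, {})
    strings = [entry.get("name", ""), *entry.get("terms", [])]
    # Any non-default English dialect also gets the default "en" (en-US) name.
    if locale.startswith("en-"):
        base = translations.get("en", {}).get(preset_id, {})
        strings.append(base.get("name", ""))
    return [s.lower() for s in strings if s]


def matches(query, preset_id, locale, translations):
    """Substring match of the query against the preset's searchable strings."""
    q = query.strip().lower()
    return bool(q) and any(q in s for s in searchable_strings(preset_id, locale, translations))


# Toy data loosely based on the "Car Park" case; not copied from dist/translations.
translations = {
    "en": {"amenity/parking": {"name": "Parking Lot", "terms": []}},
    "en-GB": {"amenity/parking": {"name": "Car Park", "terms": []}},
    "de": {"amenity/parking": {"name": "Parkplatz", "terms": []}},
}
```

With this toy data, searching "parking lot" in en-GB finds the preset via the borrowed en name, while a non-English locale such as de gets no fallback.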
Such a systematic approach is likely superior to adding each term as it is spotted, though I am not going to promise to implement this wider scope.
Well, another idea here: instead of always also matching against the en-US default localization, match against the tag value. E.g. if you type "parking", it will also find amenity=parking regardless of the user's locale, because that's the OSM tag value of the preset.
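The tag-value idea can be sketched as a locale-independent check. This is a hypothetical illustration, not iD's implementation; treating underscores as spaces and ignoring wildcard values are assumptions made here.

```python
# Hypothetical sketch of locale-independent matching on tag values;
# not iD's actual implementation.

def tag_value_matches(query, tags):
    """True if the query equals one of the preset's tag values.

    Underscores in OSM values (e.g. bus_stop) are treated as spaces, and
    wildcard values ("*") never match.
    """
    q = query.strip().lower().replace("_", " ")
    for value in tags.values():
        if value == "*":
            continue
        if q and q == value.lower().replace("_", " "):
            return True
    return False
```

Here "parking" matches {"amenity": "parking"} in any locale, but a multi-word query such as "parking lot" does not, so this alone would not cover every dialect difference.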
Would it be possible to add these English dialect aliases to the output files?
Note that if such aliasing is done on the iD side, StreetComplete, GoMap!!, Every Door by Zverik and so on would each need to implement the same mechanism.
Implementing it in the build script instead would allow adding it once and having it apply to all other users.
For example, see https://github.com/westnordost/osmfeatures/issues/13, which appears to be a result of the same issue.
Searching for "alcohol shop" finds shop=alcohol in en-GB but fails to find it in en-US:
https://github.com/openstreetmap/id-tagging-schema/blob/main/dist/translations/en-GB.json#L3453 https://github.com/openstreetmap/id-tagging-schema/blob/main/dist/translations/en.json#L7137
The build script seems to be in https://github.com/ideditor/schema-builder/blob/main/lib/build.js, and I could try implementing this as part of improving StreetComplete (which would also improve iD, GoMap!!, Every Door and maybe other editors).
Would it be possible to add these English dialect aliases to the output files? […] and I could try implementing this
That sounds like a good idea. :+1: Your contribution would be very welcome.
@matkoniecz That does not make sense in my opinion (I am the author of the osmfeatures library):
The primary reason is that en.json is actually en-US, i.e. there is no en-US.json.
Mateusz's idea presumes that the translations are organized like this:

- pt: Portuguese translations, containing the common translations
- pt-PT: Portuguese translations (Portugal dialect)
- pt-BR: Portuguese translations (Brazil dialect)

If the translations were organized in that manner, merging pt into pt-PT for distribution would make sense. But the translations are not organized in that manner. We have:

- pt: all Portuguese translations (implicitly Portugal dialect)
- pt-BR: all Portuguese translations (Brazil dialect)

The same goes for other languages that have significant dialects in different countries, such as en. So merging pt and pt-BR together will just add the Brazilian words to the localization for Portugal. E.g. highway=bus_stop will both be named "Paragem de autocarro" (correct) and "Ponto de ônibus" (but that's Brazilian Portuguese).
In the end, whether to fall back to, or even merge (in)to, another locale should remain a client-side decision. I.e. if you decide that merging en-US into en-GB, while perhaps not the cleanest solution, nevertheless improves things (you may want to consult British users, though), then that is a decision you should make for iD, not for every user of this preset data.
My premise is that if a term is used for something in one dialect of English, then it will be a useful alias in any other dialect of English.
I am aware that en is actually en-US.
Note that it would only be an alias: not something shown as a label, and only mattering when someone types this term themselves.
Let's take a fictional example and say that in en-AU the name for parking lot is "foobar". Is it useful to show parking lot when a user searches for "foobar" while using en-GB?
How likely is it that (1) the alias would also be used in other dialects, maybe less commonly, or (2) someone would mix multiple dialects and use terms from different ones at once (especially common among people learning English as a foreign language)?
How likely is it that the alias would result in actively misleading/confusing/unwanted matches, like #237?
Summoning @Zelonewolf (I hope that is OK), as I am NOT a native speaker of English.
E.g. highway=bus_stop will both be named "Paragem de autocarro" (correct) and "Ponto de ônibus" (but that's Brazilian Portuguese).
I am not proposing that. I am proposing that it would be named "Paragem de autocarro", but would also be findable if someone typed "Ponto de ônibus" (or "Ponto" or "ônibus"), as "Ponto de ônibus" would be listed as an alias.
So, your idea is to (taking en and en-GB, en-AU, ... as an example):

- add the en + en-AU, ... name of the preset to the en-GB aliases (except duplicates)
- add the en + en-AU, ... aliases of the preset to the en-GB aliases (except duplicates)
- add the en + en-AU, ... terms of the preset to the en-GB terms (except duplicates)

And then probably the same the other way round, i.e. merge en-GB into en(-US), etc.
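The three merge rules above could look roughly like this as a build step. It is a sketch under assumptions: the per-preset dict with "name", "aliases" and "terms" is a simplification, not schema-builder's actual data model, and duplicates are compared case-insensitively. Apart from the thread's fictional "foobar", the sample strings are made up.

```python
# Hypothetical build-step sketch; the dict shape {"name", "aliases", "terms"}
# is a simplification, not schema-builder's actual data model.

def merge_dialects(target, others):
    """Fold other English dialects' strings into one dialect's preset entry.

    Names and aliases from other dialects become aliases of the target;
    their terms become terms. Case-insensitive duplicates are skipped, and
    the target's own name (the displayed label) is never changed.
    """
    merged = {
        "name": target.get("name", ""),
        "aliases": list(target.get("aliases", [])),
        "terms": list(target.get("terms", [])),
    }
    seen_aliases = {merged["name"].lower()} | {a.lower() for a in merged["aliases"]}
    seen_terms = {t.lower() for t in merged["terms"]}

    for other in others:
        # Other dialects' names and aliases -> target aliases
        for candidate in [other.get("name", ""), *other.get("aliases", [])]:
            if candidate and candidate.lower() not in seen_aliases:
                merged["aliases"].append(candidate)
                seen_aliases.add(candidate.lower())
        # Other dialects' terms -> target terms
        for term in other.get("terms", []):
            if term and term.lower() not in seen_terms:
                merged["terms"].append(term)
                seen_terms.add(term.lower())
    return merged


# Toy data: "foobar" is the thread's fictional en-AU name; the rest is made up.
en = {"name": "Parking Lot", "aliases": ["Car Park"], "terms": ["parking"]}
en_au = {"name": "foobar", "aliases": [], "terms": []}
en_gb = {"name": "Car Park", "aliases": [], "terms": ["parking"]}

print(merge_dialects(en_gb, [en, en_au]))
# {'name': 'Car Park', 'aliases': ['Parking Lot', 'foobar'], 'terms': ['parking']}
```

Keeping the target's name untouched reflects the point that borrowed strings would only be aliases, never shown as labels; running the same function with en as the target would cover the "other way round" case.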
How likely is it that the alias would result in actively misleading/confusing/unwanted matches, like https://github.com/openstreetmap/id-tagging-schema/issues/237?
I don't know, but this is the reason why I argued that it should be a client-side decision.
Also, note that aliases are not implemented in iD yet.
I would propose to only do the following:

- add the en name of the preset to the en-* terms (except duplicates)
- add the en aliases of the preset to the en-* terms (except duplicates)

I don't really have an opinion on how the language dialects are structured. There are certainly examples of words that mean one thing in en_GB and something else in en_US. For example, en_GB "chips" means en_US "french fries", and en_US "chips" means en_GB "crisps". I'm not sure how many of these cases would apply to OSM features, however...
pavement comes to mind. That's the usually paved walk for pedestrians at the side of the street in British English. And in American English, it is the main part of the street that has pavement, i.e. often everything except the sidewalk.
If that were an issue in practice, we would need to rethink how we handle preset terms in English: currently, all English dialects share the same list of search terms (see readme). Yes, this will result in a search for pavement showing both the preset for highway=sidewalk and area:highway=*[^1]. But users would always be shown the corresponding (more "precise") preset name (and preset icon) in the results list, letting them choose what they actually intended to search for.
IMHO that is fine and working as intended. But maybe I'm overlooking something that could be problematic?
[^1]: Luckily, in this example the presets only apply to different geometry types, so the theoretically possible situation that the search shows both results does not actually happen here.
Hm, well, I just want to point out the other possible solution that would solve the use case as described as well: https://github.com/openstreetmap/id-tagging-schema/issues/461#issuecomment-1134045651. This would be a feature in iD, of course, not in this schema.
match against the tag value.
iD already does support this (see https://github.com/openstreetmap/iD/issues/8869#issuecomment-1004306198).
While this helps in some cases (e.g. when searching for parking), in this particular case (searching for parking lot) it doesn't do the trick.
I agree with using overlapping terms/aliases from all other English dialects when you have one locale selected. Not everyone I know even speaks the same dialect of English, so sometimes I forget which terms might be locale-specific.
The tag names themselves are also already a mix of dialects.
pavement comes to mind. That's the usually paved walk for pedestrians at the side of the street in British English. And in American English, it is the main part of the street that has pavement, i.e. often everything except the sidewalk.
In en_US, pavement refers to any paved area. The street has pavement, the sidewalk has pavement, etc.
Parking lots are pavement too. 😉
If either the build scripts or individual clients like iD mix terms from different dialects, these additional terms should be weighted much less than terms from the current dialect.
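Down-weighting borrowed terms could be sketched as a scoring rule, with borrowing limited to English dialects. The weights below are arbitrary illustrative values, the data shape (preset key → locale → strings) is hypothetical, and exact matching is used only to keep the example short. The toy data echoes the pavement discussion and is not taken from the real translation files.

```python
# Hypothetical ranking sketch; the weights are arbitrary illustrative values.
OWN_DIALECT_WEIGHT = 1.0
OTHER_DIALECT_WEIGHT = 0.2


def is_english(locale):
    return locale == "en" or locale.startswith("en-")


def score(query, strings_by_locale, user_locale):
    """Best weight of any exact match; 0.0 means the preset does not match."""
    q = query.strip().lower()
    best = 0.0
    for locale, strings in strings_by_locale.items():
        if locale != user_locale and not (is_english(locale) and is_english(user_locale)):
            continue  # only borrow from other dialects of English
        weight = OWN_DIALECT_WEIGHT if locale == user_locale else OTHER_DIALECT_WEIGHT
        if any(q == s.lower() for s in strings):
            best = max(best, weight)
    return best


def rank(query, presets, user_locale):
    """Matching preset keys, best score first (ties keep input order)."""
    scored = [(score(query, strings, user_locale), key) for key, strings in presets.items()]
    return [key for s, key in sorted(scored, key=lambda pair: -pair[0]) if s > 0]


# Toy data echoing the pavement example; not from dist/translations.
presets = {
    "highway=sidewalk": {"en-GB": ["pavement"], "en": ["sidewalk"]},
    "area:highway=*": {"en": ["pavement"]},
}
```

With this data, "pavement" ranks highway=sidewalk first for an en-GB user and area:highway=* first for an en user; both still appear, but the current dialect's meaning wins.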
Also, there would already be use cases for language-specific tweaks to the preset search algorithm: https://github.com/openstreetmap/iD/issues/8242#issuecomment-742220871. Maybe any change related to this request could be scoped to just English for now, where the impact would be better understood.
Currently, all English dialects share the same list of search terms (see readme).
If this is the case, are the terms in the non-American English localizations ignored when using those locales? Or do you mean that the non-American English localizations don't have many terms translated yet?
https://github.com/openstreetmap/id-tagging-schema/blob/1cbef5186feb17f599bd5df04aa99d581b0cb5b1/dist/translations/en-AU.json#L138-L141 https://github.com/openstreetmap/id-tagging-schema/blob/1cbef5186feb17f599bd5df04aa99d581b0cb5b1/dist/translations/en-GB.json#L1863-L1865 https://github.com/openstreetmap/id-tagging-schema/blob/1cbef5186feb17f599bd5df04aa99d581b0cb5b1/dist/translations/en-NZ.json#L275-L278
(I may implement it after https://github.com/openstreetmap/id-tagging-schema/pull/337 has been reviewed/rejected/merged.)