westnordost / osmfeatures

A dictionary of OSM map features, accessible by terms and by tags, for Java and Android.
Apache License 2.0
17 stars 9 forks source link

Fallback for untranslated feature names #26

Closed logan12358 closed 3 months ago

logan12358 commented 3 months ago

Some feature translations don't include a name (e.g. en-NZ). In loadLocalizedFeatures these would completely override features from previous languages, with names as an empty array, causing an exception in the Feature.name getter. This can crash the StreetComplete app in getNameAndLocationLabel.

This change uses fallback names from previously loaded languages (i.e. null, "en", before "en-NZ").

As a separate change it might be reasonable to fall back for untranslated terms too, but I'm not confident about ramifications of that.

I'm pretty new to OSM and Kotlin - let me know if I can change anything here or if I'd be better to make an issue and discuss first.

logan12358 commented 3 months ago

I just tried updating the tests and it raised the question of whether language should also be changed when performing this name fallback - perhaps it should, but I don't know the exact uses of language downstream...

westnordost commented 3 months ago

Hm, curious... I didn't know that was possible.

I think more changes are necessary if we cannot assume that any given preset is either fully translated (names + aliases + terms) or not translated. The expected behavior, I guess, would be to merge all of the mentioned with the previous language component (i.e. merge en into en-NZ).

Your implementation looks slick and should work but tests should be added to test the expected behavior described above. For example, if the en-NZ translation has a name but no terms, but the en translation has terms, then the terms of the en translation should be used. Same with aliases.

I just tried updating the tests and it raised the question of whether language should also be changed when performing this name fallback - perhaps it should, but I don't know the exact uses of language downstream...

Hm no, I don't think so. After all, this language-merging is only done for dialects of the same language. If a translation is only partly translated, it's language is then still specific to en-NZ.

westnordost commented 3 months ago

And also, thank you for finding and reporting this issue and also for supplying an implementation that already looks quite good, to me this seems to be the exact correct spot to handle this case.

logan12358 commented 3 months ago

Thanks for reviewing :)

I've added the fallback logic for terms too.

It seems like aliases are translated as multiple lines along with the name, so there shouldn't be any instances of aliases without name also set. This is true experimentally in the assets of StreetComplete. In the iD editor schema builder, see this message added to the name translation field and this logic when fetching translations.

For tests, I've updated the existing case merging de-AT and de to include "name but no terms" and "terms but no name". This test no longer covers the case of using a whole feature from de where it is not present in de-AT, but that is exercised elsewhere (e.g. "getting features through fallback chain") so hopefully acceptable. I also don't know any nice German terms, so feel free to update my placeholders.

westnordost commented 3 months ago

It seems like aliases are translated as multiple lines along with the name, so there shouldn't be any instances of aliases without name also set.

This is not entirely correct. In the translation files, they are separate, e.g. the wilderness hut looks like this:

"tourism/wilderness_hut": {
  "name": "Wilderness Hut",
  "aliases": "Backcountry Hut\nBackcountry Shelter\nBothy",
  "terms": "alpine hut,bivouac,bivouac box,biwakschachtel,cabin,lodge,lodging,overnight accommodations,refuge,sleeping shelter"
},

It is only in the data model, i.e. the Feature where name and aliases are conflated to names. This is done here:

https://github.com/westnordost/osmfeatures/blob/b51f189d01b58d63ad0b4f9363ffe20e792100bd/src/commonMain/kotlin/de/westnordost/osmfeatures/IDPresetsTranslationJsonParser.kt#L57-L61

Past that point, in IDLocalizedFeatureCollection it is indeed not really possible anymore to tell them apart. Hm, maybe at least internally, aliases should persist as an internal field?

Alternatively, there is one thing we could do without changing the structure of the data classes: if the names only has one entry, add the aliases from the fallback feature. So, if for en-NZ would have just e.g.

"tourism/wilderness_hut": {
  "name": "Kiwi Hut",
},

it would add "Backcountry Hut" etc. from the en translation.

logan12358 commented 3 months ago

Hmm, tricky :thinking: In a translation file it should be impossible to define aliases without having defined name. But like you say it does seem like aliases can fall back when only names is translated. I found an example of this in shop/convenience between en and en-NZ and confirmed the behaviour (aliases does fall back) in the editor. This behaviour doesn't seem especially intentional - some languages might have names without good aliases, but not want to fall back to the base presets.json?

I've pushed a change which uses the strategy you suggest, pulling aliases out of names. I think it's correct, but it feels rather messy. The LocalizedFeature structure seems like the next thing to look at - at the moment it can exist in a bad state (with empty names, where getting name fails). But I see there's a bit of logic handling "placeholder" names too.

`shop/convenience` example, for reference From `en.json`: ``` "shop/convenience": { "name": "Convenience Store", "aliases": "Convenience Shop\nCorner Store\nCorner Shop\nSuperette\nBodega", "terms": "mini-mart,mini-market" }, ``` From `en-NZ.json`: ``` "shop/convenience": { "name": "Convenience Store / Dairy", "terms": "dairy,superette,convenience store,convenience shop,corner store,corner shop" }, ``` In iD editor, I can search for this by typing "bodega" (fallback aliases) and "dairy" (localized terms), but not by typing "mini-mart" (not in localized terms).
westnordost commented 3 months ago

Hmm, tricky 🤔 In a translation file it should be impossible to define aliases without having defined name

Oh? I didn't look at how the translation is presented in Weblate. You mean, in Weblate, name and aliases are not separated? Hm, then I suppose the previous behavior would be just as fair. But now you changed it, and that's fine, too.

westnordost commented 3 months ago

Hmm, what about if all terms and names are merged into a dialect from the base language (maybe removing duplicates)? Would that make sense? - that could maybe actually be a nice feature...

westnordost commented 3 months ago

Anyway, maybe I will change it like that, but first, I will merge your PR

westnordost commented 3 months ago

I contemplated it a bit, and then implemented the above. I think the more terms and aliases can be used to find a preset, the better. The example you showed for example where it differed - this simply resulted from that the en-NZ was not updated after aliases were introduced. So, when we just merge en into en-NZ, users will still profit from up-to-date en translations even if en-NZ is a outdated or does not contain all (maybe recently) added terms.

As a result, the code is also more straightforward.

logan12358 commented 3 months ago

Thanks for merging, and your following change looks simpler :)

In general it seems reasonable to merge aliases and terms from all system/preference locales. I see that StreetComplete has this behaviour to always include null (presets) - these might not make sense to merge for non-English speakers when a translation is available? But it doesn't seem like terms or aliases are used much in StreetComplete so it might not be a practical issue there.

westnordost commented 3 months ago

Unlocalized features are presets from the NSI, i.e. brands.

logan12358 commented 3 months ago

Ohh right, makes sense! Thanks for all your time with this