Closed orangejulius closed 3 years ago
I think this can serve as a replacement for #41, with a bit of refinement. It more closely follows our existing logic of using `region_a` if available, then falling back to `region`. It also has the logic for skipping the region name (or abbreviation) when it would duplicate the locality name.
A couple of caveats to this approach, just putting them out there in case they are easy fixes:
- `.startsWith()` in some form, to at least catch some easier version of this, such as removing any where the leftmost name is a prefix string of any other names.
- `lowercase`, `trim`, `remove-punctuation` would help a lot, `ascii_folding` would also be a cherry on top.

Happy to punt those topics, just need to add some Issues for them.
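For illustration only, here is a minimal sketch of what those two caveats could look like in code. The helper names `normalizeForComparison` and `dropPrefixDuplicates` are hypothetical and not part of this PR, the accent stripping is only a rough stand-in for real ASCII folding, and the prefix rule is just one possible reading of the `.startsWith()` idea:

```javascript
// Hypothetical helper: normalize a name for comparison only, never for display.
function normalizeForComparison(name) {
  return String(name)
    .normalize('NFD')                      // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '')       // drop combining marks (rough stand-in for ascii folding)
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, ' ')     // replace punctuation with a space
    .replace(/\s+/g, ' ')                  // collapse runs of whitespace
    .trim();
}

// Hypothetical helper: keep the leftmost name and drop any later name that
// starts with it once both are normalized.
function dropPrefixDuplicates(parts) {
  if (parts.length === 0) return [];
  const [first, ...rest] = parts;
  const prefix = normalizeForComparison(first);
  if (prefix === '') return parts.slice(); // nothing meaningful to compare against
  return [first, ...rest.filter((p) => !normalizeForComparison(p).startsWith(prefix))];
}

console.log(normalizeForComparison('  São Paulo '));                  // "sao paulo"
console.log(dropPrefixDuplicates(['São Paulo', 'Sao Paulo', 'Brazil'])); // [ 'São Paulo', 'Brazil' ]
```

Letters such as `ß` or `ø` do not decompose under NFD, so a real folding step would need a proper mapping table.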
Ok, after some testing we definitely needed some basic normalization. I borrowed some code from the API that looks perfect for the task:
Some quick examples of how this looks now: here the well-known São Paulo does not have a region in the label, but the others do.
Likewise for Berlin. Some other cities with "Berlin" in the name in Berlin Brandenburg have an abbreviation. I'm not sure this is 100% great; we should check what the most common convention in Germany is. Looking at Nominatim's list, the region is not included.
The classic test case for Germany is that Frankfurt am Main and Frankfurt an der Oder produce different labels.
I think this is a much better default to have. There may be some improvements to be made on a per-country level, but this raises the bar for anything using the generic label generator :+1:
This adds the region to the default labels, but only if the region name is different from the city name (defined as locality or localadmin name).
The intent is to handle major world cities like Berlin, São Paulo, Paris, etc. that are contained within an administrative region of the same name and are so well known that they do not require any additional specifiers.
In the more common case where the region and city names are different, the region abbreviation is preferred, with the region name being returned only if the abbreviation is not available.
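As a rough sketch of the rule described above (not the actual code in this PR), the region part of a label could be chosen like this. The function names `sameName` and `regionLabelPart`, the flat record shape with plain string fields, and the simple trim-and-lowercase comparison are all assumptions made for the example:

```javascript
// Hypothetical comparison: case-insensitive, ignoring surrounding whitespace.
function sameName(a, b) {
  if (!a || !b) return false;
  return a.trim().toLowerCase() === b.trim().toLowerCase();
}

// Hypothetical record shape: { locality, localadmin, region, region_a }, all optional strings.
// Returns the region string to add to the label, or null if none should be added.
function regionLabelPart(record) {
  // "City" is taken to be the locality name, falling back to the localadmin name.
  const city = record.locality || record.localadmin;
  const { region, region_a: regionAbbr } = record;

  // Skip the region when its name or abbreviation would duplicate the city name.
  if (sameName(city, region) || sameName(city, regionAbbr)) return null;

  // Otherwise prefer the abbreviation, falling back to the full region name.
  return regionAbbr || region || null;
}

console.log(regionLabelPart({ locality: 'Berlin', region: 'Berlin', region_a: 'BE' })); // null
console.log(regionLabelPart({ locality: 'Campinas', region: 'São Paulo' }));            // "São Paulo"
```

The `BE` abbreviation in the usage lines is illustrative sample data only.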