thomersch / openstreetmap-calendar

osmcal, a Collaborative Calendar for OpenStreetMap-related Events
https://osmcal.org
Apache License 2.0

Duplicate Countries in Filter #25

Open thomersch opened 4 years ago

thomersch commented 4 years ago

In the location filter dropdown, the US is listed twice: as "United States" and "United States of America":

[Screenshot 2020-03-16 at 20:37:25]

We just display all the available countries in the database in alphabetical order. Those values come from Nominatim and therefore depend heavily on consistent tagging in OSM. We should consider whether to implement a synonym list or to clean up the values somehow.
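A minimal sketch of the synonym-list option, not osmcal's actual code: the names `COUNTRY_SYNONYMS`, `canonical_country` and `filter_options` are hypothetical, and the choice of "United States" as the canonical label is arbitrary.

```python
# Hypothetical synonym table: variant spelling -> canonical label.
# Only the duplicate reported above is listed; more entries would be
# added as they are discovered.
COUNTRY_SYNONYMS = {
    "United States of America": "United States",
}


def canonical_country(name: str) -> str:
    """Return the canonical label for a country name, or the name itself."""
    cleaned = name.strip()
    return COUNTRY_SYNONYMS.get(cleaned, cleaned)


def filter_options(names):
    """Deduplicate raw country names and sort them alphabetically."""
    return sorted({canonical_country(n) for n in names})
```

The obvious drawback is that the table has to be maintained by hand whenever the upstream tagging changes.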

thomersch commented 3 years ago

I cleaned up the database entry, but the issue might arise in the future again. Since the filter now supports ISO codes (e.g. DE, US etc), we could use some fixed list.
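One way the fixed-list idea could look, as a sketch only. It assumes each event location carries a two-letter country code; `COUNTRY_LABELS` and `dropdown_options` are hypothetical names, and the two entries are placeholders for a full list.

```python
# Hypothetical fixed list: ISO 3166-1 alpha-2 code -> display label.
COUNTRY_LABELS = {
    "DE": "Germany",
    "US": "United States",
}


def dropdown_options(country_codes):
    """Build sorted (label, code) pairs, one per distinct country code.

    Codes missing from the fixed list fall back to the code itself, so
    that no location silently disappears from the filter.
    """
    distinct = {code.upper() for code in country_codes if code}
    return sorted((COUNTRY_LABELS.get(code, code), code) for code in distinct)
```

Because the options are keyed by code, two different spellings of the same country in the source data collapse into one entry.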

danieldegroot2 commented 2 years ago

@thomersch The issue is present again. I have also noticed these duplicates:

"Czech Republic", "Czechia"

"DR Congo", "Democratic Republic of the Congo"

"TW", "Taiwan"

thomersch commented 2 years ago

Yeah, this will pop up again and again, since we're just using OSM data.

I looked into whether we could use ISO 3166 lists (e.g. https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv), but I haven't found a consistent one. This one, for example, says "Moldova, Republic of" and "United Kingdom of Great Britain and Northern Ireland", while most countries are listed by their short name, e.g. "Germany".

I'd also rather not display just two-letter ISO codes, but proper labels.

Maybe we could get a different data source, like Wikidata or Natural Earth. We don't even need geometries, just sane labels would suffice. If someone could come up with e.g. a Wikidata SPARQL query, this would make me very happy (or at least, less sad).
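A possible starting point for such a query, wrapped in Python so it can be run from a script. It selects every Wikidata item that has an ISO 3166-1 alpha-2 code (property P297) together with its English label. The function names and the User-Agent string are made up for illustration. The result would still need reviewing, since items other than current sovereign states may carry a P297 value.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Items with an ISO 3166-1 alpha-2 code (P297) and their English labels.
COUNTRY_QUERY = """
SELECT ?code ?itemLabel WHERE {
  ?item wdt:P297 ?code .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?code
"""


def parse_bindings(payload: dict) -> dict:
    """Turn a SPARQL JSON result into a {code: label} mapping."""
    labels = {}
    for row in payload["results"]["bindings"]:
        # Keep the first label seen if a code appears on more than one item.
        labels.setdefault(row["code"]["value"], row["itemLabel"]["value"])
    return labels


def fetch_country_labels() -> dict:
    """Run the query against the public endpoint (requires network access)."""
    url = WIKIDATA_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": COUNTRY_QUERY, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "osmcal-country-labels-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Running `fetch_country_labels()` once and committing the output as a static file would avoid a runtime dependency on the query service.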

danieldegroot2 commented 2 years ago

Question

What is being used? (an OSM key?)

Example

https://www.openstreetmap.org/relation/148838

United States

United States of America

Other examples

I have not checked other instances with similar names (e.g. islands/islets, boundaries).

Czechia

https://www.openstreetmap.org/relation/51684 int_name, name:en

DR Congo

https://www.openstreetmap.org/relation/192795 short_name:en

TW

https://www.openstreetmap.org/relation/449220 short_name, short_name:en

Solution(s)

thomersch commented 2 years ago

Currently, osmcal uses the country attribute from Nominatim. (Geocoding code, Display code)

The information about the SQL file from Nominatim seems to be outdated. They have since moved to a different system, which looks quite reasonable: a list of all countries, even with translations: https://github.com/osm-search/Nominatim/blob/master/settings/country_settings.yaml

Now we just need a system to get this into osmcal.
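A rough sketch of what that import step might look like. It works on the file after it has already been parsed into a Python dict (for example with a YAML library), and it assumes a shape that should be verified against the real file: top-level keys are lowercase country codes, each with a nested `names` -> `name` mapping from language code (or `default`) to label. The function name is hypothetical.

```python
def labels_from_country_settings(settings: dict, lang: str = "en") -> dict:
    """Extract {UPPERCASE_CODE: label} from parsed country settings.

    ASSUMED input shape (not verified against the real file):
        {"de": {"names": {"name": {"default": "Deutschland", "en": "Germany"}}}}
    """
    labels = {}
    for code, entry in settings.items():
        names = entry.get("names") if isinstance(entry, dict) else None
        variants = names.get("name") if isinstance(names, dict) else None
        if not isinstance(variants, dict):
            continue  # entry does not match the assumed shape; skip it
        # Prefer the requested language, then fall back to the default name.
        label = variants.get(lang) or variants.get("default")
        if label:
            labels[code.upper()] = label
    return labels
```

Entries that do not match the assumed shape are skipped rather than raising, so a format change upstream would show up as missing labels instead of a crash.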

grischard commented 1 year ago

There are also countries that have no ISO code, like Kosovo, which uses the temporary code XK.
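If a fixed list is used, such cases could be handled with a small overlay on top of it. A sketch, with hypothetical names and a two-entry stand-in for the full ISO list:

```python
from collections import ChainMap

# Stand-in for a complete ISO 3166-1 list (hypothetical, truncated).
ISO_LABELS = {"DE": "Germany", "US": "United States"}

# Codes in use that are not officially assigned by ISO 3166-1.
EXTRA_LABELS = {"XK": "Kosovo"}

# Lookups check EXTRA_LABELS first, then fall through to ISO_LABELS.
ALL_LABELS = ChainMap(EXTRA_LABELS, ISO_LABELS)


def label_for(code: str):
    """Return the display label for a country code, or None if unknown."""
    return ALL_LABELS.get(code.upper())
```

Keeping the extras in their own mapping means the ISO list can be regenerated from its source without losing the manual additions.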