readchina / ReadActor

A repository containing scripts to verify the authenticity of named entities in ReadAct
MIT License

Connection between OpenStreetMap and Wikidata #66

Closed: whalekeykeeper closed this issue 2 years ago

whalekeykeeper commented 2 years ago

OSM ids are said to be unstable, which makes them poor references. This is also why adding them to Wikidata is discouraged. Therefore, a query on Wikidata will not give us OSM ids.

OSM does have a wikidata tag, and our query will receive it if it is available. The problem is how to get it.

In practice, none of our places has this wikidata tag.
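A minimal sketch of how the wikidata tag can be read from a Nominatim response, assuming the request is sent with Nominatim's `extratags=1` parameter, which adds an `extratags` object holding additional OSM tags (such as `wikidata`) when the object has them. The helper names `build_search_url` and `get_wikidata_id` are hypothetical and not part of ReadActor.

```python
from typing import Optional
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(query: str) -> str:
    """Build a Nominatim search URL that also asks for extra OSM tags."""
    params = {
        "q": query,
        "format": "json",
        "addressdetails": 1,  # include the structured 'address' object
        "extratags": 1,       # include extra tags such as 'wikidata', if present
    }
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def get_wikidata_id(result: dict) -> Optional[str]:
    """Return the Wikidata ID tagged on an OSM object, or None if it has none."""
    # 'extratags' may be missing or null, so fall back to an empty dict.
    extratags = result.get("extratags") or {}
    return extratags.get("wikidata")
```

When the OSM object carries no wikidata tag, `get_wikidata_id` simply returns `None`, which is the situation we see for all of our places.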

For example, when we use OSM Nominatim to search by coordinates, one of our queries returns the following data:

```python
{
    'place_id': 120583045,
    'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
    'osm_type': 'way',
    'osm_id': 59181748,
    'lat': '39.9044201260838',
    'lon': '116.4073850983882',
    'display_name': 'Dongcheng District, Beijing, 100010, China',
    'address': {
        'city': 'Dongcheng District',
        'state': 'Beijing',
        'ISO3166-2-lvl4': 'CN-BJ',
        'postcode': '100010',
        'country': 'China',
        'country_code': 'cn'
    },
    'boundingbox': ['39.9043182', '39.9045426', '116.405327', '116.4098581']
}
```

which has a different osm_type ("way") from the result of querying Beijing on OSM directly, where Beijing is a "boundary" (an administrative boundary stored as an OSM relation).
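To check whether one specific OSM object (such as the way above) carries a wikidata tag, one option is Nominatim's `lookup` endpoint, which takes IDs written as a type prefix (`N`, `W` or `R` for node, way or relation) followed by the numeric OSM id, and also accepts `extratags=1`. A minimal sketch; the helper names `to_lookup_id` and `build_lookup_url` are hypothetical:

```python
from typing import Iterable
from urllib.parse import urlencode

NOMINATIM_LOOKUP = "https://nominatim.openstreetmap.org/lookup"

# Nominatim's lookup endpoint expects N/W/R prefixes for node/way/relation.
OSM_TYPE_PREFIX = {"node": "N", "way": "W", "relation": "R"}


def to_lookup_id(result: dict) -> str:
    """Turn a Nominatim result into a lookup ID such as 'W59181748'."""
    prefix = OSM_TYPE_PREFIX[result["osm_type"]]
    return f"{prefix}{result['osm_id']}"


def build_lookup_url(results: Iterable[dict]) -> str:
    """Build one lookup request for several OSM objects, asking for extra tags."""
    params = {
        "osm_ids": ",".join(to_lookup_id(r) for r in results),
        "format": "json",
        "extratags": 1,
    }
    return f"{NOMINATIM_LOOKUP}?{urlencode(params)}"


result = {"osm_type": "way", "osm_id": 59181748}
print(to_lookup_id(result))  # W59181748
```

This only tells us whether the exact object we matched is tagged; if the way has no wikidata tag, the lookup will not supply one either.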

Our current OSM query strategy is: if the space_name string can be found in the display_name field of the OSM result, we treat it as a match. This carries a risk of false matches: for example, a Nanjing Road located in the city of Shanghai would have a display_name that matches both "Nanjing" and "Shanghai". (I only just noticed this risk and am considering how to fix it.)
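The risk can be made concrete with a small sketch. `naive_match` mirrors the substring check described above; `component_match` is one possible stricter alternative (an assumption, not the current ReadActor code) that compares the name against whole comma-separated components of display_name. The Nanjing Road display_name is a made-up illustration, not real query output.

```python
def naive_match(space_name: str, result: dict) -> bool:
    """Current strategy: plain substring search in display_name."""
    return space_name in result["display_name"]


def component_match(space_name: str, result: dict) -> bool:
    """Stricter: the name must equal one whole component of display_name."""
    components = [part.strip() for part in result["display_name"].split(",")]
    return space_name in components


# Hypothetical result for a street called Nanjing Road in Shanghai.
nanjing_road = {"display_name": "Nanjing Road, Huangpu District, Shanghai, China"}

print(naive_match("Nanjing", nanjing_road))       # True  (false positive)
print(component_match("Nanjing", nanjing_road))   # False
print(component_match("Shanghai", nanjing_road))  # True
```

The component check removes the partial-name hit on "Nanjing", but it still accepts "Shanghai" for a street inside Shanghai, so it would probably need to be combined with a check on the kind of object returned (for example its osm_type) to tell a city apart from a street in that city.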

However, whether we match the display_name string or modify the script to match other tags such as state or country, we can't link this OSM data to a Wikidata id if there is no wikidata tag, because we can't tell whether it should be linked to the Wikidata id of Beijing or of China. (And semantically, Dongcheng District is not a city, even though it appears under the city key.)

I found a tool on GitHub, but using it would go against our aim of minimizing dependencies, right?