opentraveldata / opentraveldata

Collection of open data related to (at least) travel, transport, tourism
https://opentraveldata.github.io/opentraveldata/

Report wrong coordinates #149

Open MrMey opened 4 years ago

MrMey commented 4 years ago

Hello Denis, I wrote a small script that queries OpenStreetMap content through OSMPythonTools and the OpenStreetMap Overpass API, and compares the returned coordinates with those of the airports in optd_por_public.csv.

I got 117 cases where the OPTD position did not match the OSM position (more than 50 km apart). Of these 117, 57 were genuinely misplaced locations; the rest are false positives (either wrong coordinates in OSM or a wrong match for my query). Because of the false positives, a manual check is required afterwards.
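The 50 km filter can be illustrated without neobase. The sketch below uses a plain haversine distance; `haversine_km`, `is_suspect` and the Earth radius constant are names and values assumed for this example, and `neo.distance_between_locations` may compute the distance differently. The sample coordinates are the BXD row from the data attached to this issue.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch
THRESHOLD_KM = 50.0       # same cut-off as the one used in this issue


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_suspect(optd_loc, osm_loc, threshold_km=THRESHOLD_KM):
    """True when the two positions are further apart than the threshold."""
    return haversine_km(optd_loc, osm_loc) > threshold_km


# BXD, as listed in the attached data (OPTD position vs OSM position)
bxd_optd = (-7.70525, 139.5622)
bxd_osm = (-7.1764607, 139.5831213)
print(is_suspect(bxd_optd, bxd_osm))
```

A flagged pair is only a candidate: it still needs the manual check, since the OSM position can be the wrong one.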

I manually checked all the results on Google Maps with the satellite view (I attached only the cases where I think the OPTD location was misplaced). In three cases I searched for the coordinates by hand because neither source was correct (see the "manually found" column for those coordinates).

A large number of the cases concern minor airports in remote countries of Oceania and Africa. For some of them, the coordinates may not be the only thing that is wrong (for example MEZ, which is the city of Messina according to IATA).

I applied it only to airports because the IATA code makes the matching easy; it could also be tried on railway stations, for example. Maybe this kind of pre-filtering to detect errors could be used for your quality checks. The Overpass query I used is: 'way["iata"="%s"]; out center;' % iata_code

Because of the low rate limit on the Overpass API, I unfortunately had to spread the requests over several hours. I think there are other Overpass endpoints with a higher rate limit, or the query could be optimized to reduce the number of requests.
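One possible way to reduce the number of Overpass calls is to ask for several IATA codes in a single query, using a regular-expression tag filter instead of an exact match. This is only a sketch: `batched`, `build_batch_query` and `centers_by_iata` are hypothetical helpers, the batch size is arbitrary, and the parser assumes the usual Overpass JSON shape for ways returned with `out center` (a `tags` dict and a `center` dict with `lat` and `lon`).

```python
import re


def batched(codes, size):
    """Yield successive slices of at most `size` codes."""
    for start in range(0, len(codes), size):
        yield codes[start:start + size]


def build_batch_query(codes):
    """Build one Overpass statement matching any of the given IATA codes."""
    for code in codes:
        # Guard against anything that could break the regular expression.
        if not re.fullmatch(r"[A-Z0-9]{3}", code):
            raise ValueError("unexpected IATA code: %r" % code)
    pattern = "^(%s)$" % "|".join(codes)
    return 'way["iata"~"%s"]; out center;' % pattern


def centers_by_iata(elements):
    """Group (lat, lon) centers by the `iata` tag of each returned element."""
    centers = {}
    for element in elements:
        code = element.get("tags", {}).get("iata")
        center = element.get("center")
        if code and center:
            centers.setdefault(code, []).append((center["lat"], center["lon"]))
    return centers


# Two codes taken from the attached data, sent as a single query
for batch in batched(["BTO", "BXD"], 50):
    print(build_batch_query(batch))
```

Because one response then covers several airports, each result has to be mapped back to its code through the `iata` tag, which is what `centers_by_iata` does. Larger batches mean fewer calls but heavier queries, so the batch size would need tuning against the endpoint in use.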

prototype python code

```
import time

from neobase import NeoBase
from OSMPythonTools.overpass import Overpass

overpass = Overpass()
neo = NeoBase()

# OVERPASS shoot config
MAX_RETRY = 4
SLEEP_TIME = 5  # really low pace to comply with the api

for iata_code in neo:
    if iata_code.count('@'):  # ignores duplicates
        continue
    if 'A' not in neo.get(iata_code, 'location_type'):  # only airports
        continue

    results = None
    for trial in range(MAX_RETRY + 1):
        try:
            time.sleep(SLEEP_TIME)
            results = overpass.query('way["iata"="%s"]; out center;' % iata_code)
            break
        except Exception as exc:
            print(exc)
            continue

    if results is None:
        continue

    for result in results.elements():
        optd_loc = [float(neo.get(iata_code, x)) for x in ('lat', 'lng')]
        osm_loc = [result._json.get('center').get(x) for x in ('lat', 'lon')]
        dist = neo.distance_between_locations(optd_loc, osm_loc)
        if dist and dist > 50:
            print('Found very large dist %f for %s : %s || %s' % (dist, iata_code, optd_loc, osm_loc))
```
data

| distance (km) | airport | optd_coord | osm_coord | update_optd | manually found |
|-----------------|---------|----------------------------|-------------------------------|--------------|-----------------------------|
| 18010.562796 | BTO | "4.21802,-55.44645" | "-6.2886864,106.5685817" | yes | |
| 58.843918 | BXD | "-7.70525,139.5622" | "-7.1764607,139.5831213" | yes | |
| 279.24155 | BXM | "-2.283,139.6" | "-4.442606,140.8839377" | yes | |
| 212.641929 | CGG | "14.58,121" | "16.1936474,122.0644348" | yes | |
| 1145.669537 | CSU | "-22.93235,-43.719092" | "-29.6839511,-52.4115908" | yes | |
| 58.398572 | CTR | "-17.58,131" | "-17.6079283,131.5501843" | yes | |
| 504.841267 | DRA | "40.97,-117.7" | "36.6196418,-116.0321632" | yes | |
| 1247.38296 | DUA | "44.63,-91.97" | "33.9426415,-96.3946912" | yes | |
| 15960.156961 | ELO | "-26.4,-54.63" | "-1.5603659,149.630109" | yes | "-26.392993,-54.575262" |
| 79.23196 | ELR | "-3.817,140.1" | "-3.7818401,139.3867502" | yes | |
| 135.12711 | GKT | "35.85,-82.03" | "35.8584483,-83.5293153" | yes | |
| 491.591718 | IAQ | "27.212678,54.318592" | "29.8407323,50.2714218" | yes | |
| 7938.085113 | ILM | "34.270615,-77.902569" | "-34.7069477,-58.2454661" | yes | |
| 1090.857556 | KBJ | "-14.48,132.3" | "-24.2605755,131.4895144" | yes | |
| 54.558319 | KGB | "-6.133,147.7" | "-6.2242461,147.2150875" | yes | |
| 681.920492 | KMR | "-6.5,144.8" | "-6.185144,138.6376667" | yes | "-6.493373,144.825586" |
| 321.909875 | KPA | "-5.583,145.4" | "-5.388981,142.4982081" | yes | |
| 637.309909 | KRJ | "-8.75,147.5" | "-4.5967016,143.5224804" | yes | |
| 203.870515 | LII | "-3.167,136.2" | "-3.7014263,137.9569957" | yes | |
| 403.682291 | LWA | "10.3,123.9" | "6.6729527,124.0577908" | yes | |
| 13705.983321 | MBA | "-4.034833,39.59425" | "58.9319745,-158.9030987" | yes | |
| 255.011745 | MBF | "-35.28,149.1" | "-36.7179674,146.8915131" | yes | |
| 16167.023312 | MDN | "38.73,-85.37" | "-7.6152516,111.4454493" | yes | "33.612325,-83.461808" |
| 486.200886 | MEZ | "-25.7045,26.909" | "-22.3561955,29.9886738" | yes | |
| 1504.780704 | MIO | "33.4,-110.9" | "36.9090872,-94.8907038" | yes | |
| 92.594728 | MPT | "-8.167,125" | "-8.9722579,125.2145043" | yes | |
| 2065.776454 | MQZ | "-18.62,126.9" | "-33.9307137,115.0996865" | yes | |
| 134.269449 | MRL | "-20.07,145.6" | "-18.9964228,145.013493" | yes | |
| 1789.162988 | MXA | "40.98,-109.7" | "35.8945285,-90.1555198" | yes | |
| 1331.42393 | MZE | "27.64272,-82.52164" | "17.2788442,-89.0246437" | yes | |
| 155.860006 | MZJ | "32.83,-109.7" | "32.5150086,-111.322591" | yes | |
| 94.206496 | NBL | "9.45361,-78.97867" | "9.273124,-78.1397202" | yes | |
| 604.731309 | NCR | "8.483,-79.95" | "11.135788,-84.7685574" | yes | |
| 152.103781 | OHA | "-41.5,174.8" | "-40.2057206,175.3853408" | yes | |
| 82.87878 | OXD | "39.5,-85.75" | "39.5026635,-84.7840382" | yes | |
| 447.715044 | PCH | "15,-89" | "15.9548829,-84.941163" | yes | |
| 14977.867949 | PTW | "40.23,-75.65" | "-4.2838544,134.9741972" | yes | |
| 86.935996 | PYS | "39.71,-120.6" | "39.7109416,-121.6163204" | yes | |
| 135.402936 | QMM | "42.77,10.23" | "43.9861021,10.1439847" | yes | |
| 73.497497 | RBE | "13.07,107" | "13.7307797,106.9833206" | yes | |
| 76.685091 | RHA | "65.33,-20.58" | "65.4517076,-22.210163" | yes | |
| 864.088474 | RUG | "32.38,129.7" | "32.2571332,120.5029109" | yes | |
| 12738.851653 | SKO | "12.916322,5.207189" | "-2.2717037,119.8983344" | yes | |
| 61.900445 | SQR | "-2.567,120.8" | "-2.5296437,121.3559791" | yes | |
| 3873.07486 | SSD | "1.9162,-67.07793" | "-32.7459666,-70.704639" | yes | |
| 84.667071 | SWG | "-5.883,148" | "-6.1394854,147.2791046" | yes | |
| 822.677545 | TBQ | "-2.833,152" | "-6.4661655,145.5325815" | yes | |
| 56.258871 | TGS | "-24.62,32.42" | "-24.5240405,32.9662325" | yes | |
| 58.030314 | THB | "-29.87,28.17" | "-29.5218835,28.617595" | yes | |
| 2586.756327 | TQE | "41.76332,-96.17807" | "18.6489619,-99.2617133" | yes | |
| 189.323432 | UAL | "-10.78,20.5" | "-10.7152087,22.2317741" | yes | |
| 494.963527 | WKI | "-17.83,31.17" | "-18.3625295,26.520536" | yes | |
| 175.279871 | XEN | "40.73,118.6" | "40.5003798,120.654482" | yes | |
| 76.83774 | YGC | "53.23,-119" | "53.9169292,-118.8735793" | yes | |
| 86.141528 | ZBE | "49.88,16.87" | "49.9290591,18.0704146" | yes | |
| 1188.63443 | ZLR | "-46.55,-71.7" | "-35.8609629,-71.5475871" | yes | |

Best, Romain

da115115 commented 4 years ago

Excellent, love it, thanks!