Open nvkelso opened 6 years ago
tagging myself ... @stepps00
we can merge the data ( ne.ne_10m_populated_places - wof ) via
wof_id
or wikidataid
imho: first - we need a little data cleaning in the curent - ne_10m_populated_places. wof_id
| ne_id = 1159151703 | wof_id = 101915059 ( this record has been superseded ) | wikidataid = Q37100
the Q37100 is pointing to -> wof_locality | 101914257
whosonfirst=#
whosonfirst=# select metatable, id, parent_id, placetype_id, wd_id
whosonfirst-# from wf.wof
whosonfirst-# where wd_id='Q37100'
whosonfirst-# ;
+--------------+-----------+-----------+--------------+--------+
| metatable | id | parent_id | placetype_id | wd_id |
+--------------+-----------+-----------+--------------+--------+
| wof_locality | 101914257 | 102079403 | 102312317 | Q37100 |
+--------------+-----------+-----------+--------------+--------+
whosonfirst=#
whosonfirst=# SELECT wof.metatable
whosonfirst-# ,count(*) as N
whosonfirst-# FROM ne.ne_10m_populated_places AS ne_pp
whosonfirst-# LEFT JOIN wf.wof AS wof ON ne_pp.wof_id=wof.id
whosonfirst-# GROUP BY wof.metatable
whosonfirst-# ;
+-------------------+------+
| metatable | n |
+-------------------+------+
| | 59 | -- superseded , depricated, ...
| wof_neighbourhood | 29 |
| wof_region | 3 | -- need check!
| wof_county | 14 | -- need check!
| wof_locality | 7212 |
| wof_localadmin | 26 |
+-------------------+------+
whosonfirst=# with wof as ( select id, metatable from wf.wof)
whosonfirst-# select ne_pp.ne_id
whosonfirst-# ,ne_pp.wof_id
whosonfirst-# ,ne_pp.wikidataid
whosonfirst-# ,ne_pp.name_en
whosonfirst-# ,wof.metatable
whosonfirst-# from ne.ne_10m_populated_places as ne_pp
whosonfirst-# left join wf.wof as wof on ne_pp.wof_id=wof.id
whosonfirst-# where
whosonfirst-# wof.metatable not in ('wof_neighbourhood','wof_locality','wof_localadmin')
whosonfirst-# or wof.metatable is NULL
whosonfirst-# ;
+------------+------------+------------+------------------+------------+
| ne_id | wof_id | wikidataid | name_en | metatable |
+------------+------------+------------+------------------+------------+
| 1159119601 | 421167693 | Q984798 | Dalaba | wof_county |
| 1159133939 | 421174007 | Q138986 | Zabīd | wof_county |
| 1159113959 | 421174009 | Q500107 | Fort Portal | wof_county |
| 1159138767 | 421174885 | Q3508985 | Orlu | wof_county |
| 1159125815 | 421175071 | Q1897488 | Anaco | wof_county |
| 1159119699 | 421175263 | Q3077142 | Forécariah | wof_county |
| 1159120029 | 421179955 | Q193226 | Sarandë | wof_county |
| 1159119683 | 421180037 | Q7862201 | Télimélé | wof_county |
| 1159119589 | 421182471 | Q1152701 | Pita | wof_county |
| 1159119755 | 421198015 | Q1006013 | Dinguiraye | wof_county |
| 1159143063 | 421198017 | Q1100303 | Siguiri | wof_county |
| 1159133189 | 421198089 | Q995695 | Puerto Cabello | wof_county |
| 1159113985 | 421202527 | Q2789444 | Kibaale | wof_county |
| 1159127497 | 890461715 | Q1012862 | Tatvan | wof_county |
| 1159151347 | 85670445 | Q1563 | Havana | wof_region |
| 1159150443 | 85674971 | Q3929 | Port Louis | wof_region |
| 1159151263 | 85687595 | Q19660 | Bucharest | wof_region |
| 1159147851 | 421193873 | Q994514 | Rangpur | |
| 1159143137 | 421180709 | Q2080938 | Danané | |
| 1159121599 | 421180035 | Q929255 | Zorgho | |
| 1159151705 | 101915015 | Q1015681 | Lower Hutt | |
| 1159121409 | 421176179 | Q1301716 | Nouna | |
| 1159151669 | 101916559 | Q208948 | Rotorua | |
| 1159151723 | 101916029 | Q597409 | Gore | |
| 1159121471 | 421171817 | Q558346 | Tougan | |
| 1159151707 | 101916211 | Q1115332 | Paraparaumu | |
| 1159151651 | 101915017 | Q212984 | New Plymouth | |
| 1159151735 | 101918409 | Q1362879 | Picton | |
| 1159149463 | 101913495 | Q1780 | Bratislava | |
| 1159151683 | 101915439 | Q613602 | Queenstown | |
| 1159141555 | 421185359 | Q164481 | La Libertad | |
| 1159151671 | 101916537 | Q2397257 | Taupo | |
| 1159151291 | 102008119 | Q656 | Saint Petersburg | |
| 1159151729 | 101916551 | Q112685 | Tokoroa | |
| 1159151117 | 890444545 | Q19689 | Tirana | |
| 1159151647 | 101918663 | Q724923 | Ashburton | |
| 1159151673 | 101915111 | Q207756 | Tauranga | |
| 1159121487 | 421198927 | Q1642022 | Kombissiri | |
| 1159151675 | 101917083 | Q1016574 | Timaru | |
| 1159151635 | 101917147 | Q1223916 | Masterton | |
| 1159151685 | 101916011 | Q31805 | Invercargill | |
| 1159151701 | 101918131 | Q79990 | Christchurch | |
| 1159130393 | 421174657 | Q1192713 | Livingston | |
| 1159151633 | 101916201 | Q166002 | Upper Hutt | |
| 1159151663 | 101918159 | Q1015672 | Whanganui | |
| 1159128545 | 102002493 | Q154738 | Kolpino | |
| 1159151637 | 101915053 | Q1821910 | Levin | |
| 1159151745 | 101915071 | Q1028261 | Cambridge | |
| 1159151681 | 101915045 | Q744239 | Whangarei | |
| 1159151749 | 101918369 | Q1026386 | Turangi | |
| 1159151703 | 101915059 | Q37100 | Auckland | |
| 1159151645 | 101915151 | Q974074 | Whakatane | |
| 1159151687 | 101918391 | Q203380 | Napier | |
| 1159151667 | 101918451 | Q233467 | Gisborne | |
| 1159151717 | 101918177 | Q998452 | Hawera | |
| 1159151677 | 101918267 | Q206687 | Nelson | |
| 1159151657 | 101917067 | Q4782 | Oamaru | |
| 1159121675 | 421183129 | Q578567 | Diébougou | |
| 1159121521 | 421168979 | Q940973 | Yako | |
| 1159151649 | 101917175 | Q1013339 | Kaiapoi | |
| 1159121591 | 421175947 | Q895110 | Boulsa | |
| 1159151731 | 101916125 | Q930344 | Wanaka | |
| 1159151653 | 101918595 | Q1349217 | Westport | |
| 1159150413 | 1141909327 | Q140075 | Gold Coast | |
| 1159121629 | 421203219 | Q1756613 | Pô | |
| 1159151747 | 101915085 | Q1185949 | Kerikeri | |
| 1159151665 | 101918361 | Q1015690 | Hastings | |
| 1159151695 | 101916053 | Q133073 | Dunedin | |
| 1159149299 | 85809317 | Q5092 | Baltimore | |
| 1159121755 | 421178225 | Q254514 | Diapaga | |
| 1159151693 | 101916003 | Q883831 | Blenheim | |
| 1159151059 | 85784763 | Q1930 | Ottawa | |
| 1159143145 | 421173821 | Q582922 | Grand-Bassam | |
| 1159151631 | 101918235 | Q934790 | Greymouth | |
| 1159151709 | 101918073 | Q1015773 | Porirua | |
| 1159119863 | 421203187 | Q521322 | Aboisso | |
+------------+------------+------------+------------------+------------+
(76 rows)
merging via wikidataid
is also not so easy - there are some little problems
example - fixing some region problems from the previous list:
+------------+------------+------------+------------------+------------+
| ne_id | wof_id | wikidataid | name_en | metatable |
+------------+------------+------------+------------------+------------+
| 1159151347 | 85670445 | Q1563 | Havana | wof_region |
| 1159150443 | 85674971 | Q3929 | Port Louis | wof_region |
| 1159151263 | 85687595 | Q19660 | Bucharest | wof_region |
select metatable, id, parent_id, placetype_id, wd_id
from wf.wof
where wd_id in ('Q1563','Q3929','Q19660')
order by wd_id;
+--------------+------------+------------+--------------+--------+
| metatable | id | parent_id | placetype_id | wd_id |
+--------------+------------+------------+--------------+--------+
| wof_locality | 1125939059 | 1091907125 | 102312317 | Q1563 | -- n=2
| wof_locality | 1141909419 | 1091905535 | 102312317 | Q1563 |
| wof_locality | 1125857071 | 85687595 | 102312317 | Q19660 | -- n=2
| wof_locality | 1141909417 | 85687595 | 102312317 | Q19660 |
| wof_locality | 1125952633 | 85674971 | 102312317 | Q3929 | -- it is easy !!
summary : probably need more data research ...
We have
wof_id
s for all of these – and WOF sometimes has more name translations than Wikidata.Output would be a CSV style file with format like:
We'd need to make any name changes upstream in Wikidata to then pull down back into NE. /cc @ImreSamu.
It's possible WOF might not have a name from NE since Wikidata feeds are different vintages, but minority case.