nvkelso / natural-earth-vector

A global, public domain map dataset available at three scales and featuring tightly integrated vector and raster data.
https://www.naturalearthdata.com/
Other
1.75k stars 368 forks source link

Compare name translations for populated places with Who's On First #262

Open nvkelso opened 6 years ago

nvkelso commented 6 years ago

We have wof_ids for all of these – and WOF sometimes has more name translations than Wikidata.

Output would be a CSV style file with format like:

ne_id wikidata_id wof_id country_code name_en name_en_new name_zh name_zh_new
1 1 1 US Old New Old New

We'd need to make any name changes upstream in Wikidata to then pull down back into NE. /cc @ImreSamu.

It's possible WOF might not have a name from NE since Wikidata feeds are different vintages, but minority case.

stepps00 commented 6 years ago

tagging myself ... @stepps00

ImreSamu commented 6 years ago

we can merge the data ( ne.ne_10m_populated_places - wof ) via

imho: first - we need a little data cleaning in the curent - ne_10m_populated_places. wof_id

example = Auckland

| ne_id = 1159151703 | wof_id = 101915059 ( this record has been superseded ) | wikidataid = Q37100

the Q37100 is pointing to -> wof_locality | 101914257

whosonfirst=# 
whosonfirst=# select  metatable, id, parent_id, placetype_id, wd_id  
whosonfirst-# from wf.wof 
whosonfirst-# where wd_id='Q37100'
whosonfirst-# ;
+--------------+-----------+-----------+--------------+--------+
|  metatable   |    id     | parent_id | placetype_id | wd_id  |
+--------------+-----------+-----------+--------------+--------+
| wof_locality | 101914257 | 102079403 |    102312317 | Q37100 |
+--------------+-----------+-----------+--------------+--------+

some quick&dirty stat ( based on ~ 2-3 weeks old WOF data )

whosonfirst=# 
whosonfirst=# SELECT wof.metatable
whosonfirst-#       ,count(*) as N
whosonfirst-# FROM       ne.ne_10m_populated_places AS ne_pp
whosonfirst-# LEFT JOIN  wf.wof  AS wof ON ne_pp.wof_id=wof.id 
whosonfirst-# GROUP BY wof.metatable    
whosonfirst-# ; 
+-------------------+------+
|     metatable     |  n   |
+-------------------+------+
|                   |   59 |     -- superseded , depricated, ...
| wof_neighbourhood |   29 |
| wof_region        |    3 |   -- need check! 
| wof_county        |   14 |  -- need check! 
| wof_locality      | 7212 |
| wof_localadmin    |   26 |
+-------------------+------+
whosonfirst=# with wof as ( select id, metatable from wf.wof)
whosonfirst-# select ne_pp.ne_id
whosonfirst-#       ,ne_pp.wof_id
whosonfirst-#       ,ne_pp.wikidataid
whosonfirst-#       ,ne_pp.name_en
whosonfirst-#       ,wof.metatable
whosonfirst-# from      ne.ne_10m_populated_places as ne_pp
whosonfirst-# left join wf.wof as wof on ne_pp.wof_id=wof.id 
whosonfirst-# where 
whosonfirst-#          wof.metatable not in ('wof_neighbourhood','wof_locality','wof_localadmin')
whosonfirst-#      or  wof.metatable is NULL
whosonfirst-# ; 
+------------+------------+------------+------------------+------------+
|   ne_id    |   wof_id   | wikidataid |     name_en      | metatable  |
+------------+------------+------------+------------------+------------+
| 1159119601 |  421167693 | Q984798    | Dalaba           | wof_county |
| 1159133939 |  421174007 | Q138986    | Zabīd           | wof_county |
| 1159113959 |  421174009 | Q500107    | Fort Portal      | wof_county |
| 1159138767 |  421174885 | Q3508985   | Orlu             | wof_county |
| 1159125815 |  421175071 | Q1897488   | Anaco            | wof_county |
| 1159119699 |  421175263 | Q3077142   | Forécariah      | wof_county |
| 1159120029 |  421179955 | Q193226    | Sarandë         | wof_county |
| 1159119683 |  421180037 | Q7862201   | Télimélé      | wof_county |
| 1159119589 |  421182471 | Q1152701   | Pita             | wof_county |
| 1159119755 |  421198015 | Q1006013   | Dinguiraye       | wof_county |
| 1159143063 |  421198017 | Q1100303   | Siguiri          | wof_county |
| 1159133189 |  421198089 | Q995695    | Puerto Cabello   | wof_county |
| 1159113985 |  421202527 | Q2789444   | Kibaale          | wof_county |
| 1159127497 |  890461715 | Q1012862   | Tatvan           | wof_county |
| 1159151347 |   85670445 | Q1563      | Havana           | wof_region |
| 1159150443 |   85674971 | Q3929      | Port Louis       | wof_region |
| 1159151263 |   85687595 | Q19660     | Bucharest        | wof_region |
| 1159147851 |  421193873 | Q994514    | Rangpur          |            |
| 1159143137 |  421180709 | Q2080938   | Danané          |            |
| 1159121599 |  421180035 | Q929255    | Zorgho           |            |
| 1159151705 |  101915015 | Q1015681   | Lower Hutt       |            |
| 1159121409 |  421176179 | Q1301716   | Nouna            |            |
| 1159151669 |  101916559 | Q208948    | Rotorua          |            |
| 1159151723 |  101916029 | Q597409    | Gore             |            |
| 1159121471 |  421171817 | Q558346    | Tougan           |            |
| 1159151707 |  101916211 | Q1115332   | Paraparaumu      |            |
| 1159151651 |  101915017 | Q212984    | New Plymouth     |            |
| 1159151735 |  101918409 | Q1362879   | Picton           |            |
| 1159149463 |  101913495 | Q1780      | Bratislava       |            |
| 1159151683 |  101915439 | Q613602    | Queenstown       |            |
| 1159141555 |  421185359 | Q164481    | La Libertad      |            |
| 1159151671 |  101916537 | Q2397257   | Taupo            |            |
| 1159151291 |  102008119 | Q656       | Saint Petersburg |            |
| 1159151729 |  101916551 | Q112685    | Tokoroa          |            |
| 1159151117 |  890444545 | Q19689     | Tirana           |            |
| 1159151647 |  101918663 | Q724923    | Ashburton        |            |
| 1159151673 |  101915111 | Q207756    | Tauranga         |            |
| 1159121487 |  421198927 | Q1642022   | Kombissiri       |            |
| 1159151675 |  101917083 | Q1016574   | Timaru           |            |
| 1159151635 |  101917147 | Q1223916   | Masterton        |            |
| 1159151685 |  101916011 | Q31805     | Invercargill     |            |
| 1159151701 |  101918131 | Q79990     | Christchurch     |            |
| 1159130393 |  421174657 | Q1192713   | Livingston       |            |
| 1159151633 |  101916201 | Q166002    | Upper Hutt       |            |
| 1159151663 |  101918159 | Q1015672   | Whanganui        |            |
| 1159128545 |  102002493 | Q154738    | Kolpino          |            |
| 1159151637 |  101915053 | Q1821910   | Levin            |            |
| 1159151745 |  101915071 | Q1028261   | Cambridge        |            |
| 1159151681 |  101915045 | Q744239    | Whangarei        |            |
| 1159151749 |  101918369 | Q1026386   | Turangi          |            |
| 1159151703 |  101915059 | Q37100     | Auckland         |            |
| 1159151645 |  101915151 | Q974074    | Whakatane        |            |
| 1159151687 |  101918391 | Q203380    | Napier           |            |
| 1159151667 |  101918451 | Q233467    | Gisborne         |            |
| 1159151717 |  101918177 | Q998452    | Hawera           |            |
| 1159151677 |  101918267 | Q206687    | Nelson           |            |
| 1159151657 |  101917067 | Q4782      | Oamaru           |            |
| 1159121675 |  421183129 | Q578567    | Diébougou       |            |
| 1159121521 |  421168979 | Q940973    | Yako             |            |
| 1159151649 |  101917175 | Q1013339   | Kaiapoi          |            |
| 1159121591 |  421175947 | Q895110    | Boulsa           |            |
| 1159151731 |  101916125 | Q930344    | Wanaka           |            |
| 1159151653 |  101918595 | Q1349217   | Westport         |            |
| 1159150413 | 1141909327 | Q140075    | Gold Coast       |            |
| 1159121629 |  421203219 | Q1756613   | Pô              |            |
| 1159151747 |  101915085 | Q1185949   | Kerikeri         |            |
| 1159151665 |  101918361 | Q1015690   | Hastings         |            |
| 1159151695 |  101916053 | Q133073    | Dunedin          |            |
| 1159149299 |   85809317 | Q5092      | Baltimore        |            |
| 1159121755 |  421178225 | Q254514    | Diapaga          |            |
| 1159151693 |  101916003 | Q883831    | Blenheim         |            |
| 1159151059 |   85784763 | Q1930      | Ottawa           |            |
| 1159143145 |  421173821 | Q582922    | Grand-Bassam     |            |
| 1159151631 |  101918235 | Q934790    | Greymouth        |            |
| 1159151709 |  101918073 | Q1015773   | Porirua          |            |
| 1159119863 |  421203187 | Q521322    | Aboisso          |            |
+------------+------------+------------+------------------+------------+
(76 rows)
ImreSamu commented 6 years ago

merging via wikidataid is also not so easy - there are some little problems

example - fixing some region problems from the previous list:

+------------+------------+------------+------------------+------------+
|   ne_id    |   wof_id   | wikidataid |     name_en      | metatable  |
+------------+------------+------------+------------------+------------+
| 1159151347 |   85670445 | Q1563      | Havana           | wof_region |
| 1159150443 |   85674971 | Q3929      | Port Louis       | wof_region |
| 1159151263 |   85687595 | Q19660     | Bucharest        | wof_region |
select  metatable, id, parent_id, placetype_id, wd_id  
from wf.wof 
where wd_id in ('Q1563','Q3929','Q19660')
order by wd_id;
+--------------+------------+------------+--------------+--------+
|  metatable   |     id     | parent_id  | placetype_id | wd_id  |
+--------------+------------+------------+--------------+--------+
| wof_locality | 1125939059 | 1091907125 |    102312317 | Q1563  |  -- n=2
| wof_locality | 1141909419 | 1091905535 |    102312317 | Q1563  |
| wof_locality | 1125857071 |   85687595 |    102312317 | Q19660 |   -- n=2
| wof_locality | 1141909417 |   85687595 |    102312317 | Q19660 |
| wof_locality | 1125952633 |   85674971 |    102312317 | Q3929  |   -- it is easy  !!

summary : probably need more data research ...