thisisaaronland closed this issue 8 years ago
Or not, because Geonames doesn't even have multi-part postal codes for Canada, just the first three characters...
less allCountries.txt | grep -e '^CA' | grep 'Montreal'
CA H1B Montreal East Quebec QC 45.632 -73.5075 4
CA H1G Montreal North North Quebec QC 45.6109 -73.6211 1
CA H1H Montreal North South Quebec QC 45.5899 -73.6389 1
CA H2Y Old Montreal Quebec QC 45.5057 -73.555
CA H2Z Downtown Montreal Northeast Quebec QC 45.5052 -73.5622
CA H3A Downtown Montreal North Quebec QC 45.504 -73.5747 1
CA H3B Downtown Montreal East Quebec QC 45.5005 -73.5684 1
CA H3G Downtown Montreal Southeast Quebec QC 45.4987 -73.5793 1
CA H3H Downtown Montreal South & West Quebec QC 45.5009 -73.5877 1
CA H4X Montreal West Quebec QC
Basically, start with GeoPlanet and then append Geonames coordinate data where we know it's not insane (probably the US).
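A minimal sketch of that merge, assuming both sources have already been parsed. The record keys (`country`, `code`, `lat`, `lon`), the `TRUSTED_COUNTRIES` set and the function name are hypothetical and only illustrate the idea; they are not taken from the actual import code.

```python
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[str, str]        # (ISO country code, postal code)
Coord = Tuple[float, float]  # (latitude, longitude)

# Assumption: only these countries have Geonames coordinates worth trusting.
TRUSTED_COUNTRIES: Set[str] = {"US"}


def append_coordinates(
    geoplanet: Iterable[dict],
    geonames_coords: Dict[Key, Coord],
    trusted: Set[str] = TRUSTED_COUNTRIES,
) -> List[dict]:
    """Copy each GeoPlanet record, adding Geonames lat/lon only for trusted countries."""
    merged = []
    for place in geoplanet:
        record = dict(place)  # GeoPlanet stays the base record
        key = (place["country"], place["code"])
        if place["country"] in trusted and key in geonames_coords:
            record["lat"], record["lon"] = geonames_coords[key]
        merged.append(record)
    return merged
```

Records from countries outside the trusted set pass through unchanged, so widening the set later would be a one-line change.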
Total number of unique postal codes in Geonames:
cat allCountries.txt | awk '{ print $2 }' | sort | uniq | wc -l
482795
Which is a bit of a misnomer since postal codes are not unique between countries.
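To see how much that matters, one could count codes both ways. This is only a sketch: it assumes the first two tab-separated columns are the country code and the postal code (which matches the listing above), and `unique_postal_codes` is a made-up name. Note that it splits on tabs, whereas `awk` splits on any whitespace by default, so a code containing a space would be kept whole here.

```python
from typing import Iterable, Set, Tuple


def unique_postal_codes(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (unique codes ignoring country, unique (country, code) pairs)."""
    codes: Set[str] = set()
    pairs: Set[Tuple[str, str]] = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2 or not cols[1]:
            continue  # skip malformed rows and rows without a postal code
        country, code = cols[0], cols[1]
        codes.add(code)
        pairs.add((country, code))
    return len(codes), len(pairs)
```

Something like `unique_postal_codes(open("allCountries.txt", encoding="utf-8"))` would give both numbers in one pass; the gap between them is the number of codes reused across countries.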
grep Zip geoplanet_places_7.10.0.tsv | awk '{ print $3 }' | sort | uniq | wc -l
505502
grep Zip geoplanet_places_7.10.0.tsv | awk '{ print $3 }' | wc -l
3457144
GeoPlanet has explicit parent IDs, so the first step should be to see what the counts are for unique parent IDs and WOF concordances.
Hrmph. As in concordances between WOF and the (WOE/GP) parent ID for a postal code...
python ./parents.py ./zip.tsv
found 22028 missing 182021
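For reference, a check like this can be sketched as follows. This is not the actual `parents.py`, whose contents aren't shown here; it assumes the parent IDs have already been pulled out of the TSV and that a WOE-to-WOF lookup table (`woe_to_wof`, a hypothetical name) is available in memory.

```python
from typing import Dict, Iterable, Tuple


def count_parent_concordances(
    parent_ids: Iterable[str],
    woe_to_wof: Dict[str, int],
) -> Tuple[int, int]:
    """Count unique parent IDs with and without a WOF concordance."""
    unique_parents = set(parent_ids)  # each parent is counted once
    found = sum(1 for woe_id in unique_parents if woe_id in woe_to_wof)
    missing = len(unique_parents) - found
    return found, missing
```

Printing the result as `"found %d missing %d" % count_parent_concordances(ids, lookup)` would produce a line in the same shape as the output above.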
Non-optimized imports are averaging about 1M/24 hours so another day or so, unless something blows its brains out...
find ./data -name '*.geojson' -print | wc -l
2072271
find ./data -name '*.geojson' -print | wc -l
3176709
Complete. Waiting for issue #3 to complete.
Once that's done, will migrate all the data per issue #6.
De-dupe against existing Geonames import.