pelias / wof-admin-lookup

Who's on First Admin Lookup for the Pelias Geocoder
https://pelias.io
MIT License
9 stars 24 forks source link

Supports for multiple databases #289

Closed Joxit closed 4 years ago

Joxit commented 4 years ago

Background

WOF database will be hosted on geocode.earth, per country will be available, so the code should support per country databases too.

What's new ?

Now, wof admin lookup will combine all databases like whosonfirst-data-[a-z0-9-]+\.db that means both global and per country databases.

related: pelias/whosonfirst#460 pelias/whosonfirst#469 pelias/whosonfirst#477

missinglink commented 4 years ago

Okay, so I am (hopefully) finished for now generating the databases/bundles, here's what we have available: https://geocode.earth/data

orangejulius commented 4 years ago

Nice! This will definitely be needed.

I tried testing this out on our Portland Metro docker project:

I changed pelias.json so that sqlite: true was set, and importPlace was unset. Then I downloaded whosonfirst-admin-us-latest.db manually, and tried an import.

Strangely, there was no USA country record:

$ pelias import oa
info: [wof-pip-service:master] starting with layers neighbourhood,borough,locality,localadmin,county,macrocounty,macroregion,region,dependency,country,empire,
continent
info: [openaddresses] Importing 8 files.
info: [openaddresses] Creating read stream for: /data/openaddresses/us/or/portland_metro.csv
info: [wof-pip-service:master] macrocounty worker loaded 0 features in 1.494 seconds
info: [wof-pip-service:master] dependency worker loaded 0 features in 1.51 seconds
info: [wof-pip-service:master] country worker loaded 0 features in 1.539 seconds
info: [wof-pip-service:master] macroregion worker loaded 0 features in 1.547 seconds
info: [wof-pip-service:master] region worker loaded 36 features in 1.566 seconds
info: [wof-pip-service:master] borough worker loaded 4 features in 1.564 seconds
info: [wof-pip-service:master] continent worker loaded 3 features in 1.598 seconds
info: [wof-pip-service:master] empire worker loaded 11 features in 1.867 seconds

The record is in there, so I wonder if there's something else preventing it from being used?

sqlite> select * from spr where id = '85633793';
85633793|-1||unknown|XX|whosonfirst-data-admin-us|45.1593097445001|0.318715854000203|18.9061171430001|-179.143503384|71.4125023460001|179.780935092|1|0|0|0|0|
||-1

Could this be an issue with the downloads from geocode.earth? Or is it more likely that something in our code to read from a SQLite DB is incompatible with the per-country DBs?

missinglink commented 4 years ago

This looks to me to be a bug in the bundling code.

I recently added some code to include alt geoms in the SQLite files but it looks like while the geojson table allows multiple entries with the same WOFID, the spr table does not.

I think in this case it imported an alt geometry first and then INSERT OR IGNORE meant that row stuck when the primary feature for that WOFID was imported.

You can see in the SQL output that there is no parent, no name and no placetype, which seems to confirm this.

I'll have a look, unfortunately there may be more of these bugs but we can work through them and cover them with unit tests and they'll go away ;)

FYI I always run .headers on in the sqlite3 CLI, it makes debugging a bit easier!

missinglink commented 4 years ago

Yep, so that's exactly what happened:

sqlite> select rowid, id, is_alt from geojson where id = '85633793';
rowid|id|is_alt
265744|85633793|0
265743|85633793|1

I've updated the code, published a new npm version and invalidated the build cache, a full data rebuild is scheduled to kick off in 30 mins.

orangejulius commented 4 years ago

Okay great. So I think it's safe to say this code works well :)

If a new bundle is coming soon, I'll wait to test that out just in case, but that's as much for my own enjoyment of seeing it work as for any actual confidence in this code.

missinglink commented 4 years ago
$ cat tmp/sqlite/whosonfirst-data-admin-us-latest.db.bz2.json
{
  "name_compressed": "whosonfirst-data-admin-us-latest.db.bz2",
  "size_compressed": 706543390,
  "sha256_compressed": "bea05ca8377b04aa966b3ef97fb341873cc76f8c23410389cd418f8dbe177a8c",
  "last_modified": "2020-03-10T13:23:49.225Z",
  "name": "whosonfirst-data-admin-us-latest.db",
  "size": 3387883520,
  "sha256": "b231a5cf8c73f9929f3eddcc10b910c38c24f9b5135548eadf41f12cc9b05718",
  "repo": "whosonfirst-data-admin-us",
  "vintage": "latest",
  "commit": "9d2abe154a3d66524ec53a5a44ee15ca986effa6",
  "last_updated": "2020-03-05T22:14:00.000Z"
}
sqlite> select * from spr where id = '85633793';
id|parent_id|name|placetype|country|repo|latitude|longitude|min_latitude|min_longitude|max_latitude|max_longitude|is_current|is_deprecated|is_ceased|is_superseded|is_superseding|superseded_by|supersedes|lastmodified
85633793|102191575|United States|country|US|whosonfirst-data-admin-us|45.1532595|0.314297500000009|18.86546|-179.231086|71.441059|179.859681|1|0|0|0|0|||1583217946

@orangejulius the file is uploaded but the GE data page is still caching the old inventory somehow, you should be able to downlod the new version from here (just subtract the .json): https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-us-latest.db.bz2.json

orangejulius commented 4 years ago

Works great now. Thanks @Joxit!

Joxit commented 4 years ago

Yeah! You are welcome :wink: