pelias / whosonfirst

Importer for Who's on First gazetteer
MIT License
27 stars 43 forks source link

Using WOF SQLite database ? #416

Closed Joxit closed 5 years ago

Joxit commented 5 years ago

It seems that the WOF team is updating SQLite databases more often than bundles https://github.com/whosonfirst/go-whosonfirst-dist/issues/9.

That's why I thought it would be interesting to supports SQLite for import and wof-admin-lookup.

This will prevent errors like

That's an idea, if you think it's interesting, I'd be happy to implement it.

missinglink commented 5 years ago

Hi @Joxit, happy new year!

We were actually thinking the same thing, @orangejulius and I have discussed this in the past and we would be interested :+1:.

Unfortunately, the sqlite databases came around late in the WOF project, so we originally just supported a git repo clone, then added support for bundles and finally added support for sqlite.

I personally much prefer them because I am a big fan of sqlite and it means we don't have to scan the file system for files and extract large archives to disk (both of these things take ages).

I did modify the pelias/whosonfirst repo to support downloading and extracting sqlite databases to disk, the advantage of that approach is that all the other tools 'just work' but the disadvantage is using more disk space and having to read/write the filesystem.

Are you thinking to continue down that route, where we still expand the database to disk, or to access the data directly from the sqlite databases?

It would also be really great to have all the WOF logic into one place, probably pelias/whosonfirst (or a new module if the dependency cycles are an issue), just so we can keep all that logic in one place.

It'd be ideal if all the modules which used WOF data just depended on one module, which had all the scripts to handle WOF downloading and data access.

What are your thoughts?

Which repo would you like to tackle first? Do you envision supporting bundles & sqlite, or only sqlite?

missinglink commented 5 years ago

Also worth noting we also have code to build WOF sqlite bundles for geocode.earth deployments, so we're actually building independently of the WOF team now as a scheduled task, meaning we can use either host in the case of a failure, and choose how often we want to do updates.

Currently those bundles are in a private bucket although there is no reason we couldn't open them up, just haven't have time & demand for them.

orangejulius commented 5 years ago

I'd definitely support using the SQLite DBs more heavily as well. As Aaron mentions in the linked issue, the recent Geonames import brought in a lot of data, but massively increased the pain of dealing with the filesystem WOF layout.

Joxit commented 5 years ago

Hi @missinglink @orangejulius, thank you, happy new year for you two :smile:

Are you thinking to continue down that route, where we still expand the database to disk, or to access the data directly from the sqlite databases?

I think the best choice is to access the data directly from the sqlite. I do not know if SQLite will cause any performance issue.

It would also be really great to have all the WOF logic into one place, probably pelias/whosonfirst (or a new module if the dependency cycles are an issue), just so we can keep all that logic in one place. Which repo would you like to tackle first?

I will start with pelias/whosonfirst. We should not forget pelias/placeholder which needs the old WOF bundle to build its database, there will be a dependency here.

Do you envision supporting bundles & sqlite, or only sqlite?

Yes, we should be backward compatible as much as possible and use the pelias config to choose which type of data we want.

Also worth noting we also have code to build WOF sqlite bundles for geocode.earth deployments, so we're actually building independently of the WOF team now as a scheduled task, meaning we can use either host in the case of a failure, and choose how often we want to do updates.

Oh... I am in the same case, I have to rebuild the bundle and it is not so simple :disappointed:.

I'd definitely support using the SQLite DBs more heavily as well. As Aaron mentions in the linked issue, the recent Geonames import brought in a lot of data, but massively increased the pain of dealing with the filesystem WOF layout.

Yeah... A 250Go ext4 SSD does not have enough inodes for WOF... I try with btrfs now :sweat_smile: