pelias / whosonfirst

Importer for Who's on First gazetteer
MIT License

Download sqlite database without storing temporary archive #402

Closed: orangejulius closed this issue 4 years ago

orangejulius commented 5 years ago

The SQLite download currently saves the bz2 archive to a temporary file and then extracts the database from that local file. This is not ideal for two reasons:

It appears this was done because the archive's timestamp is recorded after the download completes and is later used for comparison, so that an identical archive is not downloaded again.
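As a rough illustration of that skip-if-unchanged check, here is a minimal Python sketch. It assumes that the stored timestamp is compared against the remote HTTP `Last-Modified` header. The helper names `should_download` and `_as_utc` are hypothetical and are not taken from the pelias code.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def _as_utc(dt: datetime) -> datetime:
    # Treat naive datetimes as UTC so aware and naive values can be compared.
    return dt.replace(tzinfo=timezone.utc) if dt.tzinfo is None else dt


def should_download(last_modified: Optional[str], stored: Optional[datetime]) -> bool:
    """Hypothetical helper: decide whether to fetch the archive again.

    last_modified: the remote Last-Modified header (HTTP-date string), if any.
    stored: the timestamp recorded after the previous successful download.
    """
    if stored is None or not last_modified:
        # No basis for comparison, so download to be safe.
        return True
    remote = _as_utc(parsedate_to_datetime(last_modified))
    # Only re-download when the remote copy is strictly newer.
    return remote > _as_utc(stored)
```

Requiring the remote copy to be strictly newer means that an unchanged file, whose timestamps are equal, is skipped.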

We could probably streamline this by using curl to get the remote last-modified time via a HEAD request, and then immediately downloading the archive and extracting it as it streams in, without a temporary file.
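A minimal sketch of that proposal follows. It uses Python's standard library in place of curl, which is an assumption made for illustration only, since the importer itself is JavaScript. A HEAD request reads `Last-Modified`, and then the response body is decompressed as it arrives, so no archive is ever written to disk. The names `remote_last_modified`, `stream_decompress`, `download_and_extract` and `CHUNK_SIZE` are hypothetical.

```python
import bz2
import shutil
import urllib.request
from typing import BinaryIO, Optional

# Read size for copying decompressed data; 1 MiB is an arbitrary choice.
CHUNK_SIZE = 1 << 20


def remote_last_modified(url: str) -> Optional[str]:
    """Send a HEAD request and return the Last-Modified header, if present."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Last-Modified")


def stream_decompress(src: BinaryIO, dest: BinaryIO) -> None:
    """Decompress a bz2 stream from src into dest with no intermediate archive file.

    bz2.open accepts an existing file object and reads it sequentially;
    it also handles concatenated (multi-stream) bz2 data.
    """
    with bz2.open(src, "rb") as archive:
        shutil.copyfileobj(archive, dest, CHUNK_SIZE)


def download_and_extract(url: str, dest_path: str) -> None:
    """Stream the remote archive straight into the decompressor and onto disk."""
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as dest:
        stream_decompress(resp, dest)
```

In use, you would call `remote_last_modified(url)`, compare the result with the stored timestamp, and only then call `download_and_extract(url, path)`. One caveat of this two-request design is that the remote file could change between the HEAD and the GET. Reading `Last-Modified` from the GET response itself closes that window.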

missinglink commented 5 years ago

We have had issues piping curl into bunzip2 in the past. The temporary file isn't ideal, but it has proven to be stable.

Joxit commented 4 years ago

Fixed as of https://github.com/pelias/whosonfirst/pull/417#issuecomment-458850981.

https://github.com/pelias/whosonfirst/blob/a0bc28d2139dcea2599970ccc9466c68fa3ac54c/utils/download_sqlite_all.js#L96