pelias / openaddresses

Pelias import pipeline for OpenAddresses.
MIT License

Add support to *.gz data at import #519

Closed Joxit closed 5 months ago

Joxit commented 5 months ago

Hi there, I added a new feature to the OpenAddresses importer.

OA now uses the GeoJSON format, which is very verbose and therefore much heavier than CSV. Managing disk space is becoming increasingly difficult, and this is something that can be optimized. So, for space efficiency, I thought it would be nice to store only the gzipped versions on disk and import them directly. This may use more CPU, but that trade-off is the user's choice.

With this PR we are now able to import both raw and gzipped CSV and GeoJSON.
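
As an illustration of the idea only (this is not the code from this PR, and the importer itself is a Node.js project), here is a minimal Python sketch of reading a file the same way whether it is raw or gzipped. The helper names `open_maybe_gzip` and `iter_geojson_features` are hypothetical, and the sketch assumes the GeoJSON is line-delimited (one feature per line).

```python
import gzip
import json


def open_maybe_gzip(path):
    """Open `path` as UTF-8 text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        # "rt" makes gzip return decoded text lines instead of bytes.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_geojson_features(path):
    """Yield one parsed feature per non-empty line.

    Assumption: the file is line-delimited GeoJSON (one feature per line).
    """
    with open_maybe_gzip(path) as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because decompression is streamed line by line, the raw file never has to exist on disk; the cost is the extra CPU time spent inflating the data during import.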

Example for French countrywide addresses, from the latest build and from the 2020 CSV build:

| File | Number of entries | Raw size | Gzip size |
| --- | --- | --- | --- |
| fr/countrywide-addresses-country.geojson | 26M | 6.6G | 688M |
| fr/countrywide.csv | 25M | 2.4G | 584M |

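
To put those sizes in perspective, here is a small Python sketch that computes the compression ratios. It assumes 1G = 1024M, which the figures above do not specify, and the helper names are hypothetical.

```python
def to_megabytes(size):
    """Convert a size such as '6.6G' or '688M' to megabytes.

    Assumption: 1G = 1024M (the original figures do not say which base is used).
    """
    units = {"M": 1, "G": 1024}
    return float(size[:-1]) * units[size[-1]]


def compression_ratio(raw, gzipped):
    """Return how many times smaller the gzipped file is than the raw one."""
    return to_megabytes(raw) / to_megabytes(gzipped)


# Sizes taken from the figures above.
files = [
    ("fr/countrywide-addresses-country.geojson", "6.6G", "688M"),
    ("fr/countrywide.csv", "2.4G", "584M"),
]

for name, raw, gzipped in files:
    print(f"{name}: {compression_ratio(raw, gzipped):.1f}x smaller when gzipped")
```

Under that assumption, the GeoJSON shrinks by roughly 10x and the CSV by roughly 4x, so once gzipped the GeoJSON is only about 18% larger than the CSV (688M vs 584M).
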
missinglink commented 5 months ago

Looks good, thanks!