pelias / openaddresses

Pelias import pipeline for OpenAddresses.
MIT License

Allow download from s3 #475

Closed bboure closed 3 years ago

bboure commented 3 years ago

:wave: I did some awesome work for the Pelias project and would love for everyone to have a look at it and provide feedback.


Here's the reason for this change :rocket:

Downloading from https://data.openaddresses.io is extremely slow because all downloads are throttled. I could not find a mirror, and the only fast option I found was S3 with the requester-pays option.


Here's what actually got changed :clap:

I added an option to download the OA files from S3, with the ability to pass extra options such as --request-payer. This speeds up the download greatly; the only drawback is that you have to pay for the data transfer.


Here's how others can test the changes :eyes:

Use the following options in pelias.json:

"openaddresses": {
  "dataHost": "s3://data.openaddresses.io",
  "s3Options": "--request-payer",
  "datapath": "/data/openaddresses",
  "files": []
}
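
To illustrate how these two settings could fit together, here is a minimal Python sketch. It is not the code from this PR (the importer itself is a Node.js project). It only assumes that an `s3://` `dataHost` is fetched with `aws s3 cp` and that `s3Options` is passed through as extra command-line arguments. The function name `build_download_command`, the `curl` fallback, the default host, and the `example/file.zip` key are all hypothetical.

```python
import shlex


def build_download_command(config, file_key, dest_path):
    """Return an argv list for fetching one file (illustrative only)."""
    # Assumed default: the public host mentioned above.
    data_host = config.get("dataHost", "https://data.openaddresses.io")
    url = f"{data_host.rstrip('/')}/{file_key.lstrip('/')}"

    if data_host.startswith("s3://"):
        # s3Options is a free-form string such as "--request-payer";
        # shlex.split turns it into separate arguments.
        extra = shlex.split(config.get("s3Options", ""))
        # Extra options are placed after the source and destination paths.
        return ["aws", "s3", "cp", url, dest_path, *extra]

    # Hypothetical fallback for plain HTTP(S) hosts.
    return ["curl", "--fail", "--location", "--output", dest_path, url]


config = {
    "dataHost": "s3://data.openaddresses.io",
    "s3Options": "--request-payer",
}
# "example/file.zip" is a made-up key, not a real OpenAddresses path.
print(build_download_command(config, "example/file.zip", "/tmp/file.zip"))
```

Returning an argv list rather than a single shell string keeps the option string from being re-interpreted by a shell.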

Note: Authentication is required. One way to provide credentials is to add access keys to the docker-compose.yml file:

  openaddresses:
    image: pelias/openaddresses:latest
    container_name: pelias_openaddresses
    user: "${DOCKER_USER}"
    environment: 
      - "AWS_ACCESS_KEY_ID=XYZ......."
      - "AWS_SECRET_ACCESS_KEY=123..........."
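
If you go the environment-variable route, a quick pre-flight check can catch a missing key before a long download starts. The sketch below is a hypothetical helper, not part of this PR. It only covers the two variables shown above; the AWS CLI can also pick up credentials from other sources (for example a shared credentials file), which this check ignores.

```python
import os

# The two variables set in the docker-compose.yml example above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        print("Missing AWS credentials:", ", ".join(missing))
    else:
        print("AWS credential variables are set.")
```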

orangejulius commented 3 years ago

Awesome, thanks! S3 is definitely the fastest way to download data for OA as well as the most friendly to the OA project itself.

I tested this out and it appears to work well.