mediagis / nominatim-docker

100% working container for Nominatim
Creative Commons Zero v1.0 Universal

v4.2 Wikipedia import curl 403 #447

Closed: Y0ngg4n closed this issue 10 months ago

Y0ngg4n commented 1 year ago

Describe the bug: I get a 403 when enabling the Wikipedia import. The USER_AGENT variable doesn't seem to change anything.

To reproduce:

  1. Get the 4.2 docker-compose file and enable the Wikipedia import.
  2. See the error.

Expected behavior: the curl download works.


mtmail commented 1 year ago

Can you list the user agent value you use? Then it’s easier to check on the server.

Y0ngg4n commented 1 year ago

@mtmail these are the logs:

nominatim exited with code 0
nominatim  | + tailpid=0
nominatim  | + replicationpid=0
nominatim  | + trap stopServices SIGTERM TERM INT
nominatim  | + /app/config.sh
nominatim  | + id nominatim
nominatim  | + useradd -m -p very_secure_password nominatim
nominatim  | + IMPORT_FINISHED=/var/lib/postgresql/14/main/import-finished
nominatim  | + '[' '!' -f /var/lib/postgresql/14/main/import-finished ']'
nominatim  | + /app/init.sh
nominatim  | + OSMFILE=/nominatim/data.osm.pbf
nominatim  | + CURL=("curl" "-L" "-A" "${USER_AGENT}" "--fail-with-body")
nominatim  | + '[' true = true ']'
nominatim  | + echo 'Downloading Wikipedia importance dump'
nominatim  | + curl -L -A mediagis/nominatim-docker:4.2.3 --fail-with-body https://nominatim.org/data/wikimedia-importance.sql.gz -o /nominatim/wikimedia-importance.sql.gz
nominatim  | Downloading Wikipedia importance dump
nominatim  |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
nominatim  |                                  Dload  Upload   Total   Spent    Left  Speed
100   153  100   153    0     0    426      0 --:--:-- --:--:-- --:--:--   427
nominatim  | curl: (22) The requested URL returned error: 403
nominatim  | + tailpid=0
nominatim  | + replicationpid=0
nominatim  | + trap stopServices SIGTERM TERM INT
nominatim  | + /app/config.sh
nominatim  | + id nominatim
nominatim  | + echo 'user nominatim already exists'
nominatim  | + IMPORT_FINISHED=/var/lib/postgresql/14/main/import-finished
nominatim  | + '[' '!' -f /var/lib/postgresql/14/main/import-finished ']'
nominatim  | user nominatim already exists
nominatim  | + /app/init.sh
nominatim  | + OSMFILE=/nominatim/data.osm.pbf
nominatim  | + CURL=("curl" "-L" "-A" "${USER_AGENT}" "--fail-with-body")
nominatim  | Downloading Wikipedia importance dump
nominatim  | + '[' true = true ']'
nominatim  | + echo 'Downloading Wikipedia importance dump'
nominatim  | + curl -L -A mediagis/nominatim-docker:4.2.3 --fail-with-body https://nominatim.org/data/wikimedia-importance.sql.gz -o /nominatim/wikimedia-importance.sql.gz
nominatim  |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
nominatim  |                                  Dload  Upload   Total   Spent    Left  Speed
100   153  100   153    0     0    825      0 --:--:-- --:--:-- --:--:--   827
nominatim  | curl: (22) The requested URL returned error: 403
nominatim  | + tailpid=0
nominatim  | + replicationpid=0
nominatim  | + trap stopServices SIGTERM TERM INT
nominatim  | + /app/config.sh
nominatim  | + id nominatim
nominatim  | user nominatim already exists
nominatim  | + echo 'user nominatim already exists'
nominatim  | + IMPORT_FINISHED=/var/lib/postgresql/14/main/import-finished
nominatim  | + '[' '!' -f /var/lib/postgresql/14/main/import-finished ']'
nominatim  | + /app/init.sh
nominatim  | + OSMFILE=/nominatim/data.osm.pbf
nominatim  | + CURL=("curl" "-L" "-A" "${USER_AGENT}" "--fail-with-body")
nominatim  | + '[' true = true ']'
nominatim  | + echo 'Downloading Wikipedia importance dump'
nominatim  | Downloading Wikipedia importance dump
nominatim  | + curl -L -A mediagis/nominatim-docker:4.2.3 --fail-with-body https://nominatim.org/data/wikimedia-importance.sql.gz -o /nominatim/wikimedia-importance.sql.gz
nominatim  |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
nominatim  |                                  Dload  Upload   Total   Spent    Left  Speed
100   153  100   153    0     0   1017      0 --:--:-- --:--:-- --:--:--  1020
nominatim  | curl: (22) The requested URL returned error: 403
nominatim  | + tailpid=0
nominatim  | + replicationpid=0
nominatim  | + trap stopServices SIGTERM TERM INT
nominatim  | + /app/config.sh
nominatim  | + id nominatim
nominatim  | + echo 'user nominatim already exists'
nominatim  | + IMPORT_FINISHED=/var/lib/postgresql/14/main/import-finished
nominatim  | + '[' '!' -f /var/lib/postgresql/14/main/import-finished ']'
nominatim  | user nominatim already exists
nominatim  | + /app/init.sh
nominatim  | + OSMFILE=/nominatim/data.osm.pbf
nominatim  | + CURL=("curl" "-L" "-A" "${USER_AGENT}" "--fail-with-body")
nominatim  | + '[' true = true ']'
nominatim  | + echo 'Downloading Wikipedia importance dump'
nominatim  | Downloading Wikipedia importance dump
nominatim  | + curl -L -A mediagis/nominatim-docker:4.2.3 --fail-with-body https://nominatim.org/data/wikimedia-importance.sql.gz -o /nominatim/wikimedia-importance.sql.gz
nominatim  |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
nominatim  |                                  Dload  Upload   Total   Spent    Left  Speed
100   153  100   153    0     0    617      0 --:--:-- --:--:-- --:--:--   619
nominatim  | curl: (22) The requested URL returned error: 403

mtmail commented 1 year ago

Can you list the user agent value you use and how you're setting it?

mtmail commented 1 year ago

https://github.com/mediagis/nominatim-docker/issues/420 has an example of how to set it. (Don't set it to something generic like Firefox or wget.)

Y0ngg4n commented 1 year ago

@mtmail I already tried that, but it did not work either. It seems like the environment variable has no effect on the curl command.

leonardehrenfried commented 1 year ago

Can you paste the complete command that you execute?

Y0ngg4n commented 1 year ago

I am using the docker-compose file from the repo.

leonardehrenfried commented 1 year ago

That doesn't set a USER_AGENT. Can you paste where you set it?
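
Whether a value set in the compose file actually reaches curl can be checked by reproducing the expansion seen in the trace. The following is a minimal sketch, not the contents of init.sh: it assumes that the agent string visible in the logs acts as a fallback when USER_AGENT is unset or empty, and the helper names effective_user_agent and show_curl_command are hypothetical.

```shell
# Sketch only: mimics how "${USER_AGENT}" would expand inside the container.
# The fallback string is the value visible in the trace; how the image
# actually defines it is an assumption.
DEFAULT_AGENT="mediagis/nominatim-docker:4.2.3"

# Hypothetical helper: print the agent string that curl -A would receive.
effective_user_agent() {
  printf '%s\n' "${USER_AGENT:-$DEFAULT_AGENT}"
}

# Hypothetical helper: print the curl invocation for a URL without running it.
show_curl_command() {
  printf 'curl -L -A "%s" --fail-with-body %s\n' "$(effective_user_agent)" "$1"
}
```

If the container trace still shows the default string after USER_AGENT has been set, the variable is probably not being passed into the container at all, which is a separate problem from the server rejecting a particular value.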

hemna commented 1 year ago

I am running into this as well with the example docker-compose file here: https://github.com/mediagis/nominatim-docker/blob/master/4.2/contrib/docker-compose-planet.yml

nominatim  | + id nominatim
nominatim  | + echo 'user nominatim already exists'
nominatim  | + IMPORT_FINISHED=/var/lib/postgresql/14/main/import-finished
nominatim  | + '[' '!' -f /var/lib/postgresql/14/main/import-finished ']'
nominatim  | + /app/init.sh
nominatim  | user nominatim already exists
nominatim  | + OSMFILE=/nominatim/data.osm.pbf
nominatim  | + CURL=("curl" "-L" "-A" "${USER_AGENT}" "--fail-with-body")
nominatim  | + '[' true = true ']'
nominatim  | + echo 'Downloading Wikipedia importance dump'
nominatim  | + curl -L -A mediagis/nominatim-docker:4.2.3 --fail-with-body https://nominatim.org/data/wikimedia-importance.sql.gz -o /nominatim/wikimedia-importance.sql.gz
nominatim  | Downloading Wikipedia importance dump
nominatim  |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
nominatim  |                                  Dload  Upload   Total   Spent    Left  Speed
100   153  100   153    0     0    307      0 --:--:-- --:--:-- --:--:--   307
nominatim  | curl: (22) The requested URL returned error: 403
nominatim exited with code 22

I can manually wget the file just fine.

└─> wget https://nominatim.org/data/wikimedia-importance.sql.gz
--2023-06-14 19:19:05--  https://nominatim.org/data/wikimedia-importance.sql.gz
Resolving nominatim.org (nominatim.org)... 138.201.190.130
Connecting to nominatim.org (nominatim.org)|138.201.190.130|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 393574858 (375M) [application/octet-stream]
Saving to: ‘wikimedia-importance.sql.gz.1’

wikimedia-importance.sql.gz 100%[========================================>] 375.34M  7.33MB/s    in 51s

2023-06-14 19:19:57 (7.30 MB/s) - ‘wikimedia-importance.sql.gz.1’ saved [393574858/393574858]

leonardehrenfried commented 1 year ago

Please read #420
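
The logs in this thread show the same URL answering 403 to curl with the image's agent string and 200 to a manual wget, so comparing agent strings directly may help narrow things down. A minimal sketch, assuming curl is installed on the host; the helper name http_status_for_agent is hypothetical, and it sends a HEAD request only to avoid downloading the 375M file (the server may treat HEAD differently from GET).

```shell
# Hypothetical helper: print the HTTP status code the server returns for a
# given User-Agent. Sends a HEAD request (-I), follows redirects (-L),
# discards the headers (-o /dev/null) and prints only the final status
# code (-w '%{http_code}').
http_status_for_agent() {
  url=$1
  agent=$2
  curl -s -L -I -o /dev/null -w '%{http_code}\n' -A "$agent" "$url"
}
```

For example, the result of http_status_for_agent https://nominatim.org/data/wikimedia-importance.sql.gz 'mediagis/nominatim-docker:4.2.3' can be compared against a call that passes a descriptive, project-specific agent string.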