mediagis / nominatim-docker

100% working container for Nominatim
Creative Commons Zero v1.0 Universal

Downloading wikipedia importance failing - 403 Forbidden #483

Closed by dbt-lucka 10 months ago

dbt-lucka commented 10 months ago
2023-10-06 16:46:56 + echo 'Downloading Wikipedia importance dump'
2023-10-06 16:46:56 + curl https://nominatim.org/data/wikimedia-importance.sql.gz -L -o /nominatim/wikimedia-importance.sql.gz
2023-10-06 16:46:56   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2023-10-06 16:46:56                                  Dload  Upload   Total   Spent    Left  Speed
100   153  100   153    0     0    841      0 --:--:-- --:--:-- --:--:--   845

.....
.....

2023-10-06 16:51:51 2023-10-06 14:51:51: Importing wikipedia importance data
2023-10-06 16:51:52 Traceback (most recent call last):
2023-10-06 16:51:52   File "/usr/local/bin/nominatim", line 14, in <module>
2023-10-06 16:51:52     exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module',
2023-10-06 16:51:52   File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 264, in nominatim
2023-10-06 16:51:52     return parser.run(**kwargs)
2023-10-06 16:51:52   File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 126, in run
2023-10-06 16:51:52     return args.command.run(args)
2023-10-06 16:51:52   File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/setup.py", line 101, in run
2023-10-06 16:51:52     if refresh.import_wikipedia_articles(args.config.get_libpq_dsn(),
2023-10-06 16:51:52   File "/usr/local/lib/nominatim/lib-python/nominatim/tools/refresh.py", line 144, in import_wikipedia_articles
2023-10-06 16:51:52     execute_file(dsn, datafile, ignore_errors=ignore_errors,
2023-10-06 16:51:52   File "/usr/local/lib/nominatim/lib-python/nominatim/db/utils.py", line 62, in execute_file
2023-10-06 16:51:52     remain = _pipe_to_proc(proc, fdesc)
2023-10-06 16:51:52   File "/usr/local/lib/nominatim/lib-python/nominatim/db/utils.py", line 25, in _pipe_to_proc
2023-10-06 16:51:52     chunk = fdesc.read(2048)
2023-10-06 16:51:52   File "/usr/lib/python3.10/gzip.py", line 301, in read
2023-10-06 16:51:52     return self._buffer.read(size)
2023-10-06 16:51:52   File "/usr/lib/python3.10/_compression.py", line 68, in readinto
2023-10-06 16:51:52     data = self.read(len(byte_view))
2023-10-06 16:51:52   File "/usr/lib/python3.10/gzip.py", line 488, in read
2023-10-06 16:51:52     if not self._read_gzip_header():
2023-10-06 16:51:52   File "/usr/lib/python3.10/gzip.py", line 436, in _read_gzip_header
2023-10-06 16:51:52     raise BadGzipFile('Not a gzipped file (%r)' % magic)
2023-10-06 16:51:52 gzip.BadGzipFile: Not a gzipped file (b'<h')

It downloaded something, but it's not the real file:

# ls -ltr
total 412348
-rw-r--r-- 1 nominatim nominatim 422235070 Oct  6 14:26 data.osm.pbf
-rw-r--r-- 1 nominatim nominatim       153 Oct  6 14:51 wikimedia-importance.sql.gz
# gunzip wikimedia-importance.sql.gz

gzip: wikimedia-importance.sql.gz: not in gzip format

Root cause:

# cat wikimedia-importance.sql.gz
<html>
<head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>
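
A check like the following could catch this before the import step. It is only a minimal sketch and not part of the image's scripts: the function names are made up for illustration. It relies on the fact that every gzip stream starts with the two bytes 0x1f 0x8b, whereas the file above starts with b'<h'.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def is_gzip_magic(first_bytes):
    """Return True if the given bytes start with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper: it reads only the first two bytes, so it detects
    an HTML error page saved under a .gz name, but it does not verify
    that the rest of the archive is intact.
    """
    with open(path, "rb") as fh:
        return is_gzip_magic(fh.read(2))
```

For the 153-byte wikimedia-importance.sql.gz shown above, looks_like_gzip() would return False, because the file begins with "<html>".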

leonardehrenfried commented 10 months ago

You can read about this whole saga here: https://github.com/mediagis/nominatim-docker/issues/416

The solution is to use the newest version of the image.

dbt-lucka commented 10 months ago

OK, cool. I think it would make sense to mention this somewhere; I could create a section in the README. Or mark 4.2 as "unsupported"?

leonardehrenfried commented 10 months ago

4.2:latest has the fix. Which specific version are you using?

dbt-lucka commented 10 months ago

4.2, now 4.3

leonardehrenfried commented 10 months ago

No, what is the digest of the image? When was it created? If it's from June or later it has the fix.

dbt-lucka commented 10 months ago

Ah, I see. You re-released 4.2 with a different image. My image was 10 months old. It works with 4.3 now. Thanks.

leonardehrenfried commented 10 months ago

We tag the very latest build with both 4.2 and 4.2-${git-commit-sha}: https://hub.docker.com/r/mediagis/nominatim/tags

So yes, the image that the 4.2 tag points to changes with every new build.
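
Since the tag moves, the creation date of the local image is what tells you whether it predates the fix. As a rough sketch, the output of `docker image inspect <image>` (a JSON array whose entries carry a "Created" timestamp) can be compared against a cutoff date. The function names here are hypothetical, and the cutoff of 1 June 2023 is an assumption based on the "June or later" remark above; the year is not stated in this thread.

```python
import json
from datetime import date


def image_created_on(inspect_json):
    """Return the creation date from `docker image inspect` JSON output."""
    info = json.loads(inspect_json)[0]  # inspect prints a JSON array
    # "Created" looks like 2023-06-15T08:30:00.123456789Z; the first
    # ten characters are the calendar date.
    return date.fromisoformat(info["Created"][:10])


def built_on_or_after(inspect_json, cutoff):
    """Return True if the inspected image was created on or after `cutoff`."""
    return image_created_on(inspect_json) >= cutoff
```

Usage, assuming the inspect output was saved with `docker image inspect mediagis/nominatim:4.2 > inspect.json`:

```python
with open("inspect.json") as fh:
    # Assumed cutoff: "June or later", taken to mean June 2023.
    print(built_on_or_after(fh.read(), date(2023, 6, 1)))
```

This is only a heuristic; pulling the tag again is the simpler way to get the current build.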