robjuz / helm-charts

https://robjuz.github.io/helm-charts/index.yaml

nominatim: Wikipedia importance import step fails #45

Closed dekzz closed 1 year ago

dekzz commented 1 year ago

Hello,

When the Wikipedia import is enabled (with the default URL, wikipediaUrl: https://nominatim.org/data/wikimedia-importance.sql.gz), it fails with the following error:

  Importing wikipedia importance data
  Traceback (most recent call last):
    File "/usr/local/bin/nominatim", line 14, in <module>
      exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module',
    File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 264, in nominatim
      return parser.run(**kwargs)
    File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 126, in run
      return args.command.run(args)
    File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/setup.py", line 101, in run
      if refresh.import_wikipedia_articles(args.config.get_libpq_dsn(),
    File "/usr/local/lib/nominatim/lib-python/nominatim/tools/refresh.py", line 144, in import_wikipedia_articles
      execute_file(dsn, datafile, ignore_errors=ignore_errors,
    File "/usr/local/lib/nominatim/lib-python/nominatim/db/utils.py", line 62, in execute_file
      remain = _pipe_to_proc(proc, fdesc)
    File "/usr/local/lib/nominatim/lib-python/nominatim/db/utils.py", line 25, in _pipe_to_proc
      chunk = fdesc.read(2048)
    File "/usr/lib/python3.10/gzip.py", line 301, in read
      return self._buffer.read(size)
    File "/usr/lib/python3.10/_compression.py", line 68, in readinto
      data = self.read(len(byte_view))
    File "/usr/lib/python3.10/gzip.py", line 488, in read
      if not self._read_gzip_header():
    File "/usr/lib/python3.10/gzip.py", line 436, in _read_gzip_header
      raise BadGzipFile('Not a gzipped file (%r)' % magic)
  gzip.BadGzipFile: Not a gzipped file (b'<h')

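After some research I found out it was due to curl being blocked as described here, in combination with the (failed) curl response being stored as the wiki file. The b'<h' in the error message fits that: the stored file appears to begin with HTML rather than gzip data.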
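To confirm this kind of failure before running the import, the downloaded file can be checked for the gzip magic number. The following is a minimal sketch, not part of the chart or of Nominatim; the helper name `looks_like_gzip` is hypothetical.

```python
import gzip
import os
import tempfile

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A real gzip file passes the check.
        good = os.path.join(tmp, "good.sql.gz")
        with gzip.open(good, "wb") as fh:
            fh.write(b"SELECT 1;\n")

        # An HTML error page saved under a .gz name does not.
        bad = os.path.join(tmp, "bad.sql.gz")
        with open(bad, "wb") as fh:
            fh.write(b"<html><body>Access denied</body></html>")

        print(looks_like_gzip(good), looks_like_gzip(bad))
```

A file that fails this check and starts with `<h` is most likely an HTML page saved under the `.gz` name.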

I'll create a PR for this that sets a user agent on curl so that the data can be fetched from the Nominatim server.
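The chart itself uses curl, so the snippet below is only a Python analogue of the same idea: send an explicit User-Agent header, and do not write a failed response to disk. The user agent string and the function names are assumptions made for illustration, not values taken from the PR.

```python
import shutil
import urllib.request

# Hypothetical identifier; the value actually used in the PR is not shown here.
USER_AGENT = "nominatim-helm-chart-example/0.1"


def build_request(url, user_agent=USER_AGENT):
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest, user_agent=USER_AGENT):
    """Download `url` to `dest`.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses before the
    destination file is opened, so an error page is never stored as data.
    """
    with urllib.request.urlopen(build_request(url, user_agent)) as resp, \
            open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Calling `download("https://nominatim.org/data/wikimedia-importance.sql.gz", "wikimedia-importance.sql.gz")` would then either produce the file or raise an error, rather than silently saving an HTML page.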

dekzz commented 1 year ago

Merged in commit bb88549.