pelias / whosonfirst

Importer for Who's on First gazetteer
MIT License

Issue with pelias download wof: Corrupted SQLite during whosonfirst full planet Data Download #545

Closed taminoelgert closed 4 months ago

taminoelgert commented 4 months ago

Describe the bug

When attempting a full planet build using Kubernetes, the pelias download wof command consistently throws the following error after downloading the whosonfirst data:

error: [whosonfirst] error downloading whosonfirst-data-admin-latest.db.bz2
Error: Command failed: curl -sA 'pelias-whosonfirst/0.0.0-development' https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-latest.db.bz2 | lbunzip2 > /data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db
lbunzip2: stdin: compressed data error: bad block header magic

Steps to Reproduce

Expected behavior

The pelias download wof command should download the whosonfirst data without encountering any errors.

Environment (please complete the following information):

Pastebin/Screenshots

pelias config:

{
  "logger": {
    "level": "info",
    "timestamp": true
  },
  "esclient": {
    "apiVersion": "7.x",
    "hosts": [
      {
        "protocol": "https",
        "host": "geocoder-es-http"
      }
    ]
  },
  "acceptance-tests": {
    "endpoints": {
      "docker": "http://pelias-api:4000/v1/"
    }
  },
  "api": {
    "services": {
      "placeholder": {
        "url": "http://pelias-placeholder:4100"
      },
      "interpolation": {
        "url": "http://pelias-interpolation:4300"
      },
      "libpostal": {
        "url": "http://pelias-libpostal:4400"
      }
    }
  },
  "imports": {
    "adminLookup": {
      "enabled": true
    },
    "geonames": {
      "datapath": "/data/geonames",
      "countryCode": "ALL"
    },
    "openstreetmap": {
      "download": [
        {
          "sourceURL": "https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf"
        }
      ],
      "leveldbpath": "/tmp",
      "datapath": "/data/openstreetmap",
      "import": [
        {
          "filename": "planet-latest.osm.pbf"
        }
      ]
    },
    "openaddresses": {
      "datapath": "/data/openaddresses",
      "files": []
    },
    "polyline": {
      "datapath": "/data/polylines",
      "files": [
        "extract.0sv"
      ]
    },
    "whosonfirst": {
      "datapath": "/data/whosonfirst",
      "importPostalcodes": true
    },
    "interpolation": {
      "download": {
        "tiger": {
          "datapath": "/data/tiger"
        }
      }
    }
  }
}

Additional context

The issue can also be reproduced locally in a Docker environment by following the same steps up to the pelias download all command. Subsequent steps, such as placeholder prepare, fail because "the SQLite is corrupted."
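
One quick way to tell whether a downloaded .db file is usable before running later steps such as placeholder prepare is to look at its header. The sketch below is only an illustration: the function names looks_like_sqlite and check_sqlite are hypothetical, and it assumes a head that supports -c. The header test only catches files that are not SQLite databases at all; if the sqlite3 command-line tool happens to be installed, the sketch also runs PRAGMA quick_check, which is more thorough but can take a long time on a multi-gigabyte database.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of pelias.

# looks_like_sqlite FILE
# Succeeds only if FILE starts with the 15 printable bytes of the
# SQLite 3 header string ("SQLite format 3", followed by a NUL byte).
looks_like_sqlite() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# check_sqlite FILE
# Runs the header test first. If the sqlite3 CLI is available, also runs
# PRAGMA quick_check and succeeds only if it reports "ok".
check_sqlite() {
  local db="$1"
  if ! looks_like_sqlite "$db"; then
    echo "not a SQLite database: $db" >&2
    return 1
  fi
  if command -v sqlite3 >/dev/null 2>&1; then
    [ "$(sqlite3 "$db" 'PRAGMA quick_check;' 2>/dev/null)" = "ok" ]
  fi
}
```

With the functions loaded, check_sqlite /data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db returns a non-zero exit status for a file that fails either test.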

References

Thank you for your assessment

missinglink commented 4 months ago

Hi @taminoelgert, I wasn't able to reproduce this issue.

It might have been an intermittent connection issue with our CDN provider (https://bunny.net/). Could you please confirm whether the issue has resolved itself?

aria2c https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-latest.db.bz2

03/11 15:29:45 [NOTICE] Downloading 1 item(s)
 *** Download Progress Summary as of Mon Mar 11 15:30:47 2024 ***
=============================================================================
[#b8d2ec 6.2GiB/8.0GiB(78%) CN:1 DL:92MiB ETA:19s]
FILE: /tmp/whosonfirst-data-admin-latest.db.bz2
-----------------------------------------------------------------------------

[#b8d2ec 7.9GiB/8.0GiB(98%) CN:1 DL:108MiB]
03/11 15:31:06 [NOTICE] Download complete: /tmp/whosonfirst-data-admin-latest.db.bz2

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
b8d2ec|OK  |   105MiB/s|/tmp/whosonfirst-data-admin-latest.db.bz2

Status Legend:
(OK):download completed.

lbunzip2 -t whosonfirst-data-admin-latest.db.bz2

echo $?
0
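
The manual check above can be scripted. The sketch below is one possible approach, and it differs from the streamed curl | lbunzip2 pipeline shown in the error message: it saves the archive to disk with curl, tests it with lbunzip2 -t, and only then decompresses it, so a damaged download is caught before a .db file is written. The function names (fetch_archive, verify_archive, fetch_and_extract) and the BUNZIP2 override variable are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: names below are hypothetical, not part of pelias.
# BUNZIP2 lets another bzip2-compatible decompressor be substituted.
BUNZIP2="${BUNZIP2:-lbunzip2}"

# fetch_archive URL DEST
# Downloads URL to DEST. --fail makes curl exit non-zero on HTTP errors
# instead of saving the error page; --retry retries transient failures.
fetch_archive() {
  local url="$1" dest="$2"
  curl --fail --silent --show-error --location --retry 3 \
    --output "$dest" "$url"
}

# verify_archive FILE
# Exit status is 0 only if the whole bzip2 stream decodes cleanly.
verify_archive() {
  "$BUNZIP2" -t "$1"
}

# fetch_and_extract URL OUT
# Writes OUT only after the downloaded archive passes verification.
fetch_and_extract() {
  local url="$1" out="$2" archive
  archive="${out}.bz2"
  fetch_archive "$url" "$archive" || return 1
  if ! verify_archive "$archive"; then
    echo "archive failed integrity check: $archive" >&2
    rm -f "$archive"
    return 1
  fi
  # Decompress to a temporary name so a failed run never leaves a
  # half-written file at the final path.
  if ! "$BUNZIP2" -c "$archive" > "${out}.partial"; then
    rm -f "${out}.partial"
    return 1
  fi
  mv "${out}.partial" "$out" && rm -f "$archive"
}
```

For example: fetch_and_extract https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-latest.db.bz2 /data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db. Note that this approach keeps the compressed archive (8.0GiB in the log above) on disk next to the decompressed output until extraction finishes.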

taminoelgert commented 4 months ago

Thanks for the reply. I have just tried again, and now it seems to be working without any problems. Thanks for the help; I'll close the ticket.