wiktorn / Overpass-API

Overpass API docker image
MIT License

Updating stuck at XML parsing error #69

Closed: tuukka closed this issue 2 years ago

tuukka commented 3 years ago

Similar to #65, the update process on one of two equivalent Overpass instances got stuck. Restarting the container does not help and the following lines repeat over and over:

/db/diffs/changes.osm exists. Trying to apply again.
XML parsing error at line 3, column 0: no element found
# cat /db/diffs/changes.osm 
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="libosmium/2.15.4">
#

The content of /db/replicate_id is 4461455 and there is no /db/replicate_id.backup.

This is what an API response says about the version and timestamp:

{
  "version": 0.6,
  "generator": "Overpass API 0.7.56.3 eb200aeb",
  "osm3s": {
    "timestamp_osm_base": "2021-03-19T01:58:28Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [
  ]
}

Both containers were created a couple of weeks ago with the following command:

docker run \
  -e OVERPASS_META=yes \
  -e OVERPASS_MODE=init \
  -e OVERPASS_PLANET_URL=https://download.geofabrik.de/europe/finland-latest.osm.pbf \
  -e OVERPASS_DIFF_URL=http://download.openstreetmap.fr/replication/europe/finland/minute/ \
  -e OVERPASS_RULES_LOAD=10 \
  -e OVERPASS_COMPRESSION=gz \
  -e OVERPASS_UPDATE_SLEEP=60 \
  -e OVERPASS_PLANET_PREPROCESS='mv /db/planet.osm.bz2 /db/planet.osm.pbf && osmium cat -o /db/planet.osm.bz2 /db/planet.osm.pbf && rm /db/planet.osm.pbf' \
  -v /opt/docker/overpass_db/:/db -p 12347:80 -i -t --name overpass_finland wiktorn/overpass-api
wiktorn commented 3 years ago

@tuukka: Can you provide the image hash that you use? E.g. by:

$ docker image ls | grep overpass

I have a hunch that you're not running the most recent version of the image. You can also check that by running, inside the container:

cat /app/bin/update_overpass.sh  | grep fileinfo

I suspect your line will differ from this one:

VERSION=$(osmium fileinfo -e -g data.timestamp.last "${DIFF_FILE}" || (cp -f /db/replicate_id.backup /db/replicate_id && echo "Broken file" && cat "${DIFF_FILE}" && rm -f "${DIFF_FILE}" && exit 1 ))

(current version)
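The fixed line is a plain shell fallback pattern: if `osmium` can't extract a timestamp from the diff, restore the backed-up replication ID, report the broken file, and delete it. Below is a runnable sketch of that pattern with stand-in commands and made-up `/tmp` paths, so it works without osmium or an Overpass database; in the real script the left-hand command is `osmium fileinfo -e -g data.timestamp.last`.

```shell
# Demo of the fallback pattern from the fixed update_overpass.sh, using
# stand-ins so it runs anywhere. Paths under /tmp are for the demo only.
DIFF_FILE=/tmp/demo_changes.osm
printf '<?xml version="1.0"?>\n<osm>\n' > "$DIFF_FILE"   # a truncated diff
echo 4461000 > /tmp/demo_replicate_id.backup             # last known-good ID

extract_timestamp() { false; }   # stand-in that fails, like osmium on a broken file

VERSION=$(extract_timestamp "$DIFF_FILE" || \
  (cp -f /tmp/demo_replicate_id.backup /tmp/demo_replicate_id \
   && echo "Broken file" && rm -f "$DIFF_FILE"))

echo "$VERSION"   # prints: Broken file
```

The key point is the `||` group: on failure it rolls `replicate_id` back and removes the unusable diff, so the next update cycle re-downloads it instead of looping forever on the same truncated file.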

tuukka commented 3 years ago

Indeed, the latest tag points to an old image! Same on Docker Hub as well.

$ docker image ls | grep overpass
wiktorn/overpass-api                      latest                   a675dd8f1a01        6 months ago        253MB
# cat /app/bin/update_overpass.sh  | grep fileinfo
            VERSION=$(osmium fileinfo -e -g data.timestamp.last "${DIFF_FILE}")
wiktorn commented 3 years ago

Thanks for pointing this out. Docker automated builds are getting really troublesome. I've just pushed a manually updated version of the image. I'll probably need to move the build infrastructure somewhere else.

tuukka commented 3 years ago

Thanks! Do you think the existing instance can be unstuck simply by updating the container, and can the update be done simply by removing the existing container and starting a new one from the updated image with the same settings?

wiktorn commented 3 years ago

If you go this way it will get unstuck, but you may lose some data.

If I were in your place, I'd:

Regardless of the version of the container, this should un-stick the updates and resume them from the point where they previously got stuck.

If you update the container version first, you may lose some of the updates (as you might skip some of the diff files).

The other approach is to use an ID from the past that you are sure was OK, as reapplying the same changes to the Overpass database is not a problem.
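That second approach can be sketched as follows. The file layout matches the `-v /opt/docker/overpass_db/:/db` mount from the issue, but a scratch directory stands in for the real DB here so the steps are safe to dry-run, and `4461000` is a made-up example ID — substitute one you actually trust.

```shell
# Roll replicate_id back to a known-good sequence number. On the host from
# this thread, DB_DIR would be /opt/docker/overpass_db; a scratch dir is used
# here so this is safe to run anywhere. 4461000 is a made-up example ID.
DB_DIR=/tmp/overpass_db_demo
mkdir -p "$DB_DIR/diffs"
echo 4461455 > "$DB_DIR/replicate_id"        # the stuck sequence number from the issue
: > "$DB_DIR/diffs/changes.osm"              # the truncated diff

echo 4461000 > "$DB_DIR/replicate_id"        # any past ID you trust; re-applying diffs is safe
rm -f "$DB_DIR/diffs/changes.osm"            # drop the broken file
# then restart the container, e.g.: docker restart overpass_finland
```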

mgd722 commented 3 years ago

I ran into this same problem and the steps listed in @wiktorn's comment above solved it. To update the container version, do I just do a regular docker container update <container_name>?

mgd722 commented 3 years ago

Also, as update_database is catching back up, I'm getting a bunch of lines like this in the console:

Way 12310172 has changed at timestamp 2021-03-24T08:05:17Z in two different diffs.

Is this a normal thing?

EDIT: @mmd-osm on the OSM Slack group says:

As long as you don't see something like: compute_idx_and_geometry: Node 8556363556 used in way 4429331 not found or Exception occurred: Bad geometry for way 298733619, this should be ok. It's only an INFO kind of message. ...if you see those messages I mentioned, it's a sure sign your DB is corrupted, and you need to start from scratch.
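If you want to check for those fatal patterns mechanically, a grep along these lines works. The log file below is a stand-in so the snippet runs without a container — against a real instance you would pipe `docker logs <container>` into the same grep. The message texts are the ones quoted above.

```shell
# Scan update output for the fatal messages @mmd-osm lists; the benign
# "two different diffs" INFO line should not match. Stand-in log file
# so this runs without a container.
cat > /tmp/overpass_demo.log <<'EOF'
Way 12310172 has changed at timestamp 2021-03-24T08:05:17Z in two different diffs.
compute_idx_and_geometry: Node 8556363556 used in way 4429331 not found
EOF
FATAL=$(grep -cE 'compute_idx_and_geometry: .* not found|Bad geometry for way' /tmp/overpass_demo.log)
echo "$FATAL"   # prints: 1  (only the corruption line matches)
```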

wiktorn commented 3 years ago

> I ran into this same problem and the steps listed in @wiktorn's comment above solved it. To update the container version do I just do a regular docker container update <container_name>?

AFAIR, docker container update will not change the base image reference. What you need is to remove the container and create a new one. If you keep the database folder mount and the same config, you will simply start running the new version of the app.
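Those remove-and-recreate steps can be sketched like this. The image, container name, volume path, and port come from the issue; the remaining `-e OVERPASS_*` flags are elided and must be copied from the original run command. The `DOCKER=echo docker` default makes this a dry run that only prints the commands — set `DOCKER=docker` on a real host to execute them.

```shell
# Dry-run sketch of "remove the container, recreate from the updated image".
# By default this just prints each docker command; set DOCKER=docker to run.
DOCKER="${DOCKER:-echo docker}"
$DOCKER pull wiktorn/overpass-api        # fetch the freshly pushed image
$DOCKER rm -f overpass_finland           # the DB survives in /opt/docker/overpass_db
$DOCKER run -v /opt/docker/overpass_db/:/db -p 12347:80 -i -t \
  --name overpass_finland wiktorn/overpass-api
# (re-add the same -e OVERPASS_* flags as in the original run command)
```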

> Also as update_database is catching back up I'm getting a bunch of lines like this in the console:
>
> Way 12310172 has changed at timestamp 2021-03-24T08:05:17Z in two different diffs.
>
> Is this a normal thing?
>
> EDIT: @mmd-osm on the OSM Slack group says:
>
> As long as you don't see something like: compute_idx_and_geometry: Node 8556363556 used in way 4429331 not found or Exception occurred: Bad geometry for way 298733619, this should be ok. It's only an INFO kind of message. ...if you see those messages I mentioned, it's a sure sign your DB is corrupted, and you need to start from scratch.

Yes, it looks like it. I'd only mention that messages like compute_geometry: Way 290449146 used in relation 60199 not found are normal if you're working with an extract and not a full planet dump.

mmd-osm commented 3 years ago

> Yes, it looks like it. I'd only mention that messages like compute_geometry: Way 290449146 used in relation 60199 not found are normal if you're working with an extract and not a full planet dump.

Yes, that's correct, those messages would also occur on extracts. On a planet db, they're usually fatal.