wiktorn / Overpass-API

Overpass API docker image
MIT License
133 stars · 47 forks

File error caught: ... Block_Backend: an item's size exceeds block size. #83

Closed: nickodell closed this issue 9 months ago

nickodell commented 2 years ago

Problem description

I'm running into an issue on a 3-month-old Overpass server where the updates have stopped applying. Based on the logs, it seems to be caused by this error:

127.0.0.1 - - [14/Jan/2022:17:13:28 +0000] "GET /api/interpreter?data=[out:json];node(1);out; HTTP/1.1" 200 663 "-" "curl/7.64.0"
/db/diffs/changes.osm exists. Trying to apply again.
127.0.0.1 - - [14/Jan/2022:17:13:58 +0000] "GET /api/interpreter?data=[out:json];node(1);out; HTTP/1.1" 200 663 "-" "curl/7.64.0"
/app/bin/update_from_dir --osc-dir=/db/diffs --version=2022-01-14T09:06:39Z --flush-size=16 --meta
Reading XML file ... finished reading nodes. Flushing to database ...... done.
Reading XML file ... finished reading ways. Flushing to database .....127.0.0.1 - - [14/Jan/2022:17:14:29 +0000] "GET /api/interpreter?data=[out:json];node(1);out; HTTP/1.1" 200 663 "-" "curl/7.64.0"
. done.
Reading XML file ... finished reading relations. Flushing to database ...File error caught: /db/db/relations.bin Block_Backend: an item's size exceeds block size.

This error first occurred for me this morning, around 14/Jan/2022 09:07 UTC. I used this command to find the first occurrence in my logs: docker logs overpass_world | grep -C 5 "exceeds block size" | head

After this first occurrence, the error happens again every minute.

Is anyone else running into this issue? If it's a problem with the data, I'd expect it to happen to everyone at the same time.

Previous reports of issue

While searching this issue, I found these related reports:

Hello all

I was trying to set up a local Overpass API server. When executing the init_osm3s.sh script, the following message showed up:

...File error caught: 0 Success ..ways.bin Block_Backend: an item's size exceeds block size.

https://github.com/drolbr/Overpass-API/issues/111

The problem in that case seems to have been related to the data being imported - but it's hard to believe that the planet minutely updates would break overpass. In particular, the public overpass mirrors I checked are still up-to-date.

Version information & configuration

Command used:

docker run \
   -e OVERPASS_META=yes \
   -e OVERPASS_MODE=clone \
   -e OVERPASS_DIFF_URL=https://planet.openstreetmap.org/replication/minute/ \
   -e OVERPASS_STOP_AFTER_INIT=false \
   -v /overpass/:/db \
   -p 80:80 \
   -i -t -d \
   --name overpass_world \
   wiktorn/overpass-api:latest

Container environment variables:

OVERPASS_STOP_AFTER_INIT=false
OVERPASS_META=yes
OVERPASS_MODE=clone
OVERPASS_DIFF_URL=https://planet.openstreetmap.org/replication/minute/
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
NGINX_VERSION=1.18.0
NJS_VERSION=0.4.4
PKG_RELEASE=2~buster
OVERPASS_RULES_LOAD=1
OVERPASS_USE_AREAS=true

Image ID: e5981e56de8b

vshcherb commented 2 years ago

Same issue; reimporting the database doesn't help:

Flushing to database .......File error caught: /mnt/home-ssdd/overpass/db/relations_attic.bin Block_Backend: an item's size exceeds block size.

Waiting to download a new database tomorrow night.

nickodell commented 2 years ago

@vshcherb Can you run docker logs <container name> | grep -C 5 "exceeds block size" | head ? I'm wondering if it happened at the same time.

vshcherb commented 2 years ago

Well, I don't have Docker; we're running in Jenkins. We are running minutely updates of Overpass and it happened somewhere between 09:02 and 09:10 UTC: https://builder.osmand.net:8080/view/OSM%20Live/job/OsmLive_FetchAndUpdateOverpass/592751/console

More specifically, one of these changesets: 004887746 004887747 004887748 004887749

wiktorn commented 2 years ago

Thank you @vshcherb for the list of possible changesets. I took a peek around them and it looks like I found the offender in changeset 004887749.

Within this changeset, the relation for Antillas Menores / Lesser Antilles has 220k members, most of which are nodes.
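A quick way to check a diff for such an offender yourself is to scan it for relations with an unusually large member count. This is a minimal standard-library sketch; the threshold below is an arbitrary assumption for illustration, not the actual Overpass block-size limit:

```python
# Hypothetical helper: scan an OSM change file (.osc / changes.osm) for
# relations whose member count looks suspiciously large. Standard
# library only; the threshold is an illustrative guess.
import xml.etree.ElementTree as ET

def oversized_relations(osc_path, threshold=32000):
    """Yield (relation id, member count) for relations above threshold."""
    tree = ET.parse(osc_path)
    for rel in tree.getroot().iter("relation"):
        members = len(rel.findall("member"))
        if members > threshold:
            yield rel.get("id"), members

# Example (path as used inside this container):
# for rid, n in oversized_relations("/db/diffs/changes.osm"):
#     print(rid, n)
```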

This was obviously a mistake and the object has since been removed. To move forward with updates on your Overpass instance, I recommend editing the /db/diffs/changes.osm file within the container and removing the offending relation from the changeset file. This way the changeset will be processed (apart from creating this relation in the index), and you can expect a warning when a later changeset removes the object (though I'm not sure whether that will happen at all).

If you're creating your instance from scratch, newer daily extracts (from the evening of 2022-01-14 onward) should no longer contain this object, so this error should not occur.
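The edit described above can be scripted. A minimal sketch using only the standard library; the relation id in the usage comment is a placeholder, not the actual offender, and you should back up changes.osm before rewriting it in place:

```python
# Hypothetical sketch: strip one relation (by id) from a stalled
# OSM change file so the rest of the diff can still be applied.
import xml.etree.ElementTree as ET

def drop_relation(osc_path, relation_id, out_path=None):
    """Remove every <relation> with the given id from an osmChange file.

    Returns the number of elements removed.
    """
    tree = ET.parse(osc_path)
    root = tree.getroot()
    removed = 0
    # osmChange groups elements under <create>/<modify>/<delete> blocks
    for action in list(root):
        for elem in list(action):
            if elem.tag == "relation" and elem.get("id") == relation_id:
                action.remove(elem)
                removed += 1
    tree.write(out_path or osc_path, encoding="UTF-8", xml_declaration=True)
    return removed

# drop_relation("/db/diffs/changes.osm", "1234567")  # placeholder id
```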

mgd722 commented 2 years ago

I foolishly skipped the offending diff entirely in a rush to get my server back online, and now I'm getting problems because future diffs reference data from the diff I skipped. I tried resetting my replicate_id to just before the offending diff, hoping I could pick it up and then return to pulling current diffs, but now I keep getting:

2022-01-17 17:46:05 DEBUG: Downloaded change 4888021. (-87 kB available in download buffer)
/app/bin/update_database --version=2022-01-14T13:42:42Z --compression-method=gz --map-compression-method=gz --flush-size=16 --keep-attic --db-dir=/db/db
Context error: File /db/db//osm3s_v0.7.55_osm_base present, which indicates a running dispatcher. Delete file if no dispatcher is running.
cat: write error: Broken pipe
/db/diffs/changes.osm exists. Trying to apply again.

I'm not sure what to do at this point, and I'm trying to avoid scrapping the whole container and starting over. Any thoughts?

wiktorn commented 2 years ago

@mgd722: stop the container and verify that /db/db//osm3s_v0.7.55_osm_base does not exist inside the container. Then start the container again. Also double-check that you don't have two containers running with the same /db folder.

Rewinding replicate_id back to old value, downloading the diff, then removing offending change from it and applying this + future changes looks like a good strategy.
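The rewind step can be sketched as a small script to run with the container stopped. This is a hedged sketch under assumptions: that the replication sequence lives in a replicate_id file and the pending diff in diffs/changes.osm under the mounted /db volume (verify both paths in your own setup before running anything like this):

```python
# Hypothetical sketch of the rewind procedure: with the container
# stopped, write an older sequence number so the updater re-downloads
# the offending diff (which can then be edited as described above).
# Paths are assumptions about this image's /db volume layout.
import os

def rewind_replication(db_root, sequence):
    """Reset the stored replication sequence and drop any half-applied diff."""
    with open(os.path.join(db_root, "replicate_id"), "w") as f:
        f.write(str(sequence) + "\n")
    stale = os.path.join(db_root, "diffs", "changes.osm")
    if os.path.exists(stale):
        os.remove(stale)  # force a fresh download on next start

# rewind_replication("/overpass", 4887748)  # a sequence before the bad diff
```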

mmd-osm commented 2 years ago

I'd highly recommend reporting such issues on the Overpass API repo; otherwise there's zero visibility into what's going on.

Issue https://github.com/drolbr/Overpass-API/issues/111 is too old and very likely not relevant anymore.

> Rewinding replicate_id back to old value, downloading the diff, then removing offending change from it and applying this + future changes looks like a good strategy.

Most likely you will encounter further issues down the road. In particular, this will not work with attic databases. Starting fresh is usually the better approach.