osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

Nominatim updates are stuck after "Processed relations" #3445

Closed ArvyRogerio closed 2 weeks ago

ArvyRogerio commented 2 weeks ago

Describe the bug: No Docker (self-hosted). For the past two weeks, "nominatim replication --once" has not been completing. I tried a full reinstall, with the same result. The first server had been up for over a year, updating every day at 2:00 AM. Postgres appears to be stuck in a loop (COPY), running for more than 24 hours. Only one country (Brazil) is imported.

2024-06-16 16:57:45: Using project directory: /dados/nominatim/nominatim-planet
2024-06-16 16:58:20 osm2pgsql version 1.11.0
2024-06-16 16:58:20 Database version: 15.6 (Debian 15.6-0+deb12u1)
2024-06-16 16:58:20 PostGIS version: 3.3
2024-06-16 16:58:20 Loading properties from table '"public"."osm2pgsql_properties"'.
2024-06-16 16:58:20 Not using flat node file (same as on import).
2024-06-16 16:58:20 Using prefix 'planet_osm' (same as on import).
2024-06-16 16:58:20 Using style file '/usr/local/etc/nominatim/import-extratags.lua' (same as on import).
2024-06-16 17:02:45 Reading input files done in 265s (4m 25s).
2024-06-16 17:02:45 Processed 201681 nodes in 56s - 4k/s
2024-06-16 17:02:45 Processed 40896 ways in 188s (3m 8s) - 218/s
2024-06-16 17:02:45 Processed 4733 relations in 21s - 225/s
(Control+C) KeyboardInterrupt

Postgres 15:

3351778 postgres 87.6 6.7 218:32.43 postgres: 15/main: nominatim nominatim 127.0.0.1(54044) COPY (for more than 24 Hours - needed to restart postgres)

To Reproduce: nominatim replication --once

Software Environment (please complete the following information):

Hardware Configuration (please complete the following information):

ArvyRogerio commented 2 weeks ago

Script:

#!/bin/bash

NOMINATIM_REPLICATION_URL="http://download.geofabrik.de/south-america/brazil-updates"
NOMINATIM_REPLICATION_UPDATE_INTERVAL=86400
NOMINATIM_REPLICATION_RECHECK_INTERVAL=900

cd /dados/nominatim/nominatim-planet
nominatim replication --once

mtmail commented 2 weeks ago

Can you post relevant lines from the postgresql server log? -> we're looking for errors, warnings, out-of-memory or such.

What does the query SELECT * FROM pg_stat_activity print? (pg_top is also a nice tool but shows less information). -> We're looking for possible queries that block updates.

What is the output of nominatim replication --check-for-updates? -> To check how many days the database is behind.
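A minimal sketch of the `pg_stat_activity` check requested above, narrowed to non-idle sessions so that a long-running COPY stands out. The database name `nominatim`, the `DB_NAME` variable, and the helper name `show_active_queries` are assumptions for illustration; adjust them to the local setup.

```shell
#!/bin/bash
# Assumption: the Nominatim database is named "nominatim" and the current
# user may connect to it with psql.
DB_NAME="${DB_NAME:-nominatim}"

# List every non-idle session, oldest query first, with its runtime and the
# first 80 characters of the query text.
ACTIVITY_SQL="SELECT pid, state, wait_event_type,
       now() - query_start AS runtime,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()
ORDER BY query_start;"

# Hypothetical helper: run the query above against the database.
show_active_queries() {
    psql -d "$DB_NAME" -c "$ACTIVITY_SQL"
}
```

Calling `show_active_queries` while an update is hanging should show whether a COPY into "place" has been running for hours.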

ArvyRogerio commented 2 weeks ago

PostgreSQL log: nothing special, only entries from when I stopped the script and restarted it.

2024-06-16 17:49:26.154 -03 [3275581] LOG: checkpoint complete: wrote 168 buffers (0.1%); 0 WAL file(s) added, 1 removed, 19 recycled; write=212.440 s, sync=6.113 s, total=224.596 s; sync files=35, longest=0.877 s, average=0.175 s; distance=336736 kB, estimate=437388 kB
2024-06-16 17:49:57.452 -03 [3275579] LOG: received fast shutdown request
2024-06-16 17:49:57.701 -03 [3275579] LOG: aborting any active transactions
2024-06-16 17:49:57.709 -03 [3351778] nominatim@nominatim FATAL: terminating connection due to administrator command
2024-06-16 17:49:57.709 -03 [3351778] nominatim@nominatim CONTEXT: SQL statement "update placex set indexed_status = 2 where indexed_status = 0 and rank_search > NEW.rank_search and ST_DWithin(placex.geometry, NEW.geometry, diameter)"
PL/pgSQL function placex_insert() line 99 at SQL statement
SQL statement "INSERT INTO placex (osm_type, osm_id, class, type, name, admin_level, address, extratags, geometry) VALUES (NEW.osm_type, NEW.osm_id, NEW.class, NEW.type, NEW.name, NEW.admin_level, NEW.address, NEW.extratags, NEW.geometry)"
PL/pgSQL function place_insert() line 188 at SQL statement
COPY place, line 319: "W 6176613 highway residential 15 "name"=>"Woodhouse Cliff" \N \N 0102000020E610000005000000753348669..."
2024-06-16 17:49:57.709 -03 [3351778] nominatim@nominatim STATEMENT: COPY "public"."place" ("osm_type","osm_id","class","type","admin_level","name","address","extratags","geometry") FROM STDIN
2024-06-16 17:49:57.709 -03 [3348231] nominatim@nominatim FATAL: terminating connection due to administrator command
2024-06-16 17:49:57.709 -03 [3348234] nominatim@nominatim FATAL: terminating connection due to administrator command
2024-06-16 17:49:57.926 -03 [3275579] LOG: background worker "logical replication launcher" (PID 3275586) exited with exit code 1
2024-06-16 17:49:58.115 -03 [3275581] LOG: shutting down
2024-06-16 17:49:58.180 -03 [3275581] LOG: checkpoint starting: shutdown immediate
2024-06-16 17:50:04.192 -03 [3275581] LOG: checkpoint complete: wrote 405 buffers (0.2%); 0 WAL file(s) added, 3 removed, 7 recycled; write=1.865 s, sync=3.610 s, total=6.077 s; sync files=33, longest=0.675 s, average=0.110 s; distance=165306 kB, estimate=410180 kB
2024-06-16 17:50:04.706 -03 [3275579] LOG: database system is shut down

ArvyRogerio commented 2 weeks ago

nominatim@server:~/nominatim-planet$ nominatim replication --check-for-updates
2024-06-16 21:06:11: Using project directory: /dados/nominatim/nominatim-planet
2024-06-16 21:06:12: New data available (6133423 => 6136455).

ArvyRogerio commented 2 weeks ago

SELECT * FROM pg_stat_activity: no records.

lonvia commented 2 weeks ago

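This is an issue with the recent vandalism on OpenStreetMap. To work around this: stop updates, kill the COPY query (or simply restart postgres), then run a catch-up with a very large replication size.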

Further note that the vandalism issues are only present in the minutely and hourly diffs of planet replication. The daily diffs from the planet and Geofabrik should not be affected. Your setup is configured wrongly and consumes planet-wide diffs. In particular, in your bash script you need to set export NOMINATIM_REPLICATION_URL="http://download.geofabrik.de/south-america/brazil-updates". Without the 'export', the setting will not be picked up by the nominatim executable.
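To illustrate the point about 'export': a plain assignment in a bash script stays local to that shell, and only exported variables are passed on to programs the script starts. The sketch below uses two made-up variable names (`DEMO_UNEXPORTED`, `DEMO_EXPORTED`) and a child `sh` process standing in for the nominatim executable.

```shell
#!/bin/bash
# Plain assignment: visible in this script only, not in child processes.
DEMO_UNEXPORTED="set without export"

# Exported assignment: copied into the environment of every child process.
export DEMO_EXPORTED="set with export"

# The child shell stands in for any program started by the script.
sh -c 'echo "unexported=[${DEMO_UNEXPORTED:-}] exported=[${DEMO_EXPORTED:-}]"'
# prints: unexported=[] exported=[set with export]
```

The same applies to NOMINATIM_REPLICATION_URL in the update script: without 'export', the child process never sees the assigned value.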

ArvyRogerio commented 2 weeks ago

Hi @lonvia, thanks, I'll retry today and get back to you.

BTW, I found the "old server" script log (that was running +1 year fine). That was a PG 13.

2024-06-14 02:31:01: Using project directory: /srv/nominatim
2024-06-14 02:32:47 osm2pgsql version 1.5.1
2024-06-14 02:32:47 Database version: 13.14 (Debian 13.14-0+deb11u1)
2024-06-14 02:32:47 PostGIS version: 3.1
2024-06-14 02:32:47 Parsing gazetteer style file '/usr/local/etc/nominatim/import-extratags.style'.
Processing: Node(150k 150.0k/s) Way(0k 0.00k/s) Relation(0 0.0/s)
Processing: Node(228k 9.5k/s) Way(1k 0.25k/s) Relation(0 0.0/s)
Processing: Node(228k 9.5k/s) Way(2k 0.40k/s) Relation(0 0.0/s)
Processing: Node(228k 9.5k/s) Way(4k 0.67k/s) Relation(0 0.0/s)
Processing: Node(228k 9.5k/s) Way(6k 0.86k/s) Relation(0 0.0/s)
Processing: Node(228k 9.5k/s) Way(8k 1.00k/s) Relation(0 0.0/s)
Processing: Node(228k 9.5k/s) Way(11k 1.22k/s) Relation(0 0.0/s)
Processing: Node(228k 9.5k/s) Way(18k 1.80k/s) Relation(0 0.0/s)
Processing: Node(228k 9.5k/s) Way(30k 1.15k/s) Relation(10 10.0/s)
Processing: Node(228k 9.5k/s) Way(30k 1.15k/s) Relation(60 60.0/s)
Processing: Node(228k 9.5k/s) Way(30k 1.15k/s) Relation(1000 500.0/s)
2024-06-14 02:33:50 Reading input files done in 63s (1m 3s).
2024-06-14 02:33:50 Processed 228543 nodes in 24s - 10k/s
2024-06-14 02:33:50 Processed 30306 ways in 26s - 1k/s
2024-06-14 02:33:50 Processed 1862 relations in 13s - 143/s
2024-06-15 01:58:49 ERROR: DB copy thread failed: Ending COPY mode for 'place' failed: FATAL: terminating connection due to administrator command
CONTEXT: SQL statement "update placex set indexed_status = 2 where indexed_status = 0 and rank_search > NEW.rank_search and ST_DWithin(placex.geometry, NEW.geometry, diameter)"
PL/pgSQL function placex_insert() line 96 at SQL statement
SQL statement "insert into placex (osm_type, osm_id, class, type, name, admin_level, address, extratags, geometry) values (NEW.osm_type, NEW.osm_id, NEW.class, NEW.type, NEW.name, NEW.admin_level, NEW.address, NEW.extratags, NEW.geometry)"
PL/pgSQL function place_insert() line 166 at SQL statement
COPY place, line 243: "4430214 W highway residential "name"=>"Agar Street" 15 \N "motor_vehicle"=>"destination","surface"=>..." .
2024-06-15 01:58:49 Done postprocessing on table 'planet_osm_nodes' in 0s
2024-06-15 01:58:49 Done postprocessing on table 'planet_osm_ways' in 0s
2024-06-15 01:58:49 Done postprocessing on table 'planet_osm_rels' in 0s
Traceback (most recent call last):
  File "/usr/local/bin/nominatim", line 11, in <module>
    exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module',
  File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 235, in nominatim
    return parser.run(**kwargs)
  File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 96, in run
    return args.command.run(args)
  File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/replication.py", line 187, in run
    UpdateReplication._update(args)
  File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/replication.py", line 144, in _update
    state = replication.update(conn, params)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tools/replication.py", line 119, in update
    run_osm2pgsql(options)
  File "/usr/local/lib/nominatim/lib-python/nominatim/tools/exec_utils.py", line 139, in run_osm2pgsql
    subprocess.run(cmd, cwd=options.get('cwd', '.'),
  File "/usr/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/local/lib/nominatim/osm2pgsql', '--hstore', '--latlon', '--slim', '--with-forward-dependencies', 'false', '--log-progress', 'true', '--number-processes', '1', '--cache', '2000', '--output', 'gazetteer', '--style', '/usr/local/etc/nominatim/import-extratags.style', '--append', '/srv/nominatim/osmosischange.osc']' returned non-zero exit status 2.

driade commented 2 weeks ago

Hi, hello.

Same behaviour here with the Spain and Germany maps since yesterday's update at 4 AM. The Portugal update also had some problems, but the machines returned to a good state after a few hours.

I'm using https://github.com/mediagis/nominatim-docker

lonvia commented 2 weeks ago

Same remedy if the geofabrik daily diffs cause issues. Stop updates, kill copy (or simply restart postgres), run catch-up with a very large replication size.
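The remedy can be sketched as a small script, under some assumptions: the database is named `nominatim`, the project directory is the one from the original report, and the scheduled update job has already been disabled by hand. The helper names `kill_stuck_copy` and `catch_up` are hypothetical, and the value 10000 for NOMINATIM_REPLICATION_MAX_DIFF is just one choice of "very large".

```shell
#!/bin/bash
# Step 1 (manual): stop updates first, e.g. disable the cron job that runs
# "nominatim replication --once", so nothing restarts the import meanwhile.

# Assumptions: database name and project directory; adjust to your setup.
DB_NAME="${DB_NAME:-nominatim}"
PROJECT_DIR="${PROJECT_DIR:-/dados/nominatim/nominatim-planet}"

# Step 2: terminate backends in this database whose query is a COPY.
KILL_COPY_SQL="SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = current_database()
  AND query LIKE 'COPY %'
  AND pid <> pg_backend_pid();"

kill_stuck_copy() {
    psql -d "$DB_NAME" -c "$KILL_COPY_SQL"
}

# Step 3: catch up with a very large maximum diff size. The subshell keeps
# the caller's working directory unchanged.
catch_up() {
    (
        cd "$PROJECT_DIR" || exit 1
        NOMINATIM_REPLICATION_MAX_DIFF=10000 nominatim replication --catch-up
    )
}
```

Run `kill_stuck_copy` and then `catch_up`; restarting postgres instead of the first call is the simpler alternative mentioned above.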

ArvyRogerio commented 2 weeks ago

@lonvia question: do I need to run "NOMINATIM_REPLICATION_MAX_DIFF=10000 nominatim replication --catch-up" only once and, after it completes, run my daily script normally (adding "export" to the 3 variables)?

Is "NOMINATIM_REPLICATION_MAX_DIFF=10000 nominatim replication --catch-up" just a one-time fix, or do I need to run it repeatedly?

I need to update once per day.

lonvia commented 2 weeks ago

The "catch-up" ensures that you jump over the problematic edits in the updates. It only needs to be done once and then updates can continue as usual. (Until the next time it gets stuck, that is. The vandalism is sadly still not completely contained.)

As for your use of planet updates: I suggest that you reimport the Brazil extract and start again from there. The year of applying planet updates will have added a lot of garbage to your database.

ArvyRogerio commented 2 weeks ago

@lonvia thanks, I didn't know about the "export" in the daily script... so, basically I can drop the database, redo the:

nominatim import --osm-file <path>/brazil-latest.osm.pbf

and then just run daily:

#!/bin/bash

export NOMINATIM_REPLICATION_URL="http://download.geofabrik.de/south-america/brazil-updates"
export NOMINATIM_REPLICATION_UPDATE_INTERVAL=86400
export NOMINATIM_REPLICATION_RECHECK_INTERVAL=3600

cd /dados/nominatim/nominatim-planet
nominatim replication --once

Is that OK then? I will only use Brazil data.

mtmail commented 2 weeks ago

@ArvyRogerio That looks good.

ArvyRogerio commented 2 weeks ago

Done. Re-imported and ran the update script. For now, looks fine. BTW, no more errors. Thanks a lot for your help. I'll monitor for some days.

nominatim@server:~$ ./nominatim-updater
2024-06-18 15:05:57: Using project directory: /dados/nominatim/nominatim-planet
2024-06-18 15:06:03: Update completed. Import: 0:00:00. Total: 0:00:00. Remaining backlog: 2 days, 0:46:31.