mediagis / nominatim-docker

100% working container for Nominatim

psycopg2.OperationalError: server closed the connection unexpectedly #500

Open echo292 opened 9 months ago

echo292 commented 9 months ago

Describe the bug
Initial container creation from the Docker image returns an error after running for a while. Something might be timing out. The North America PBF is a larger file at 12 GB.

2023-11-08 01:07:43 psycopg2.OperationalError: server closed the connection unexpectedly

To Reproduce
Steps to reproduce the behavior:

  1. Ran the following command:

docker run -it --shm-size=60g \
  -e PBF_URL=https://download.geofabrik.de/north-america-latest.osm.pbf \
  -e IMPORT_WIKIPEDIA=true \
  -e IMPORT_US_POSTCODES=true \
  -e IMPORT_TIGER_ADDRESSES=false \
  -p 8080:8080 \
  --name nominatimNorthAmerica mediagis/nominatim:4.3

  2. See the error:

2023-11-08 01:07:30 2023-11-08 09:07:30: Done 27322100 in 3651 @ 7481.536 per second - rank 30 ETA (seconds): 2204.75
2023-11-08 01:07:31 2023-11-08 09:07:31: Done 27329580 in 3652 @ 7481.704 per second - rank 30 ETA (seconds): 2203.70
2023-11-08 01:07:32 2023-11-08 09:07:32: Done 27337060 in 3653 @ 7481.461 per second - rank 30 ETA (seconds): 2202.78
2023-11-08 01:07:33 2023-11-08 09:07:33: Done 27344540 in 3655 @ 7481.357 per second - rank 30 ETA (seconds): 2201.81
2023-11-08 01:07:35 2023-11-08 09:07:35: Done 27352020 in 3656 @ 7481.101 per second - rank 30 ETA (seconds): 2200.88
2023-11-08 01:07:35 2023-11-08 09:07:35: Done 27359500 in 3656 @ 7482.438 per second - rank 30 ETA (seconds): 2199.49
2023-11-08 01:07:35 2023-11-08 09:07:35: Done 27366980 in 3656 @ 7483.752 per second - rank 30 ETA (seconds): 2198.10
2023-11-08 01:07:36 2023-11-08 09:07:36: Done 27374460 in 3657 @ 7484.661 per second - rank 30 ETA (seconds): 2196.84
2023-11-08 01:07:36 2023-11-08 09:07:36: Done 27381960 in 3658 @ 7485.282 per second - rank 30 ETA (seconds): 2195.65
2023-11-08 01:07:38 2023-11-08 09:07:38: Done 27389440 in 3659 @ 7485.196 per second - rank 30 ETA (seconds): 2194.68
2023-11-08 01:07:39 2023-11-08 09:07:39: Done 27396920 in 3660 @ 7484.945 per second - rank 30 ETA (seconds): 2193.75
2023-11-08 01:07:40 2023-11-08 09:07:40: Done 27404400 in 3661 @ 7484.936 per second - rank 30 ETA (seconds): 2192.76
2023-11-08 01:07:41 2023-11-08 09:07:41: Done 27411880 in 3662 @ 7484.948 per second - rank 30 ETA (seconds): 2191.75
2023-11-08 01:07:42 2023-11-08 09:07:42: Done 27419380 in 3663 @ 7484.888 per second - rank 30 ETA (seconds): 2190.77
2023-11-08 01:07:43 2023-11-08 09:07:43: Done 27426860 in 3664 @ 7485.145 per second - rank 30 ETA (seconds): 2189.69
2023-11-08 01:07:43 2023-11-08 09:07:43: Done 27434340 in 3664 @ 7486.064 per second - rank 30 ETA (seconds): 2188.43
2023-11-08 01:07:43 Traceback (most recent call last):
2023-11-08 01:07:43   File "/usr/local/lib/nominatim/lib-python/nominatim/indexer/indexer.py", line 234, in _index
2023-11-08 01:07:43     runner.index_places(pool.next_free_worker(), part)
2023-11-08 01:07:43   File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 201, in next_free_worker
2023-11-08 01:07:43     return next(self.free_workers)
2023-11-08 01:07:43   File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 209, in _yield_free_worker
2023-11-08 01:07:43     if thread.is_done():
2023-11-08 01:07:43   File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 159, in is_done
2023-11-08 01:07:43     if self.conn.poll() == psycopg2.extensions.POLL_OK:
2023-11-08 01:07:43 psycopg2.OperationalError: server closed the connection unexpectedly
2023-11-08 01:07:43 This probably means the server terminated abnormally
2023-11-08 01:07:43     before or while processing the request.

Expected behavior
I expected the Nominatim server to complete this command step.

Screenshots & Logs
If applicable, add screenshots & logs to help explain your problem.

Desktop / Server (please complete the following information):

OS & Version: Windows 11 Pro 23H2
Docker Version: Docker Desktop 4.25.0 (126437)
Nominatim Version: 4.3.1

Additional context
Here are the workstation specs:

i9 14900k
192 GB of memory
4 TB M.2 SSD

I found this info about PostgreSQL connections timing out. I can't find these connection parameters in any of the basic Nominatim server settings.

https://www.tomatolist.com/show_blog_page.html?no=9b5e7549-e3d3-484e-9259-7b53432f9f26

conn = psycopg2.connect(
    database="DBname",
    host="Your_IP_address",
    user="username",
    password="Your_password",
    port=5432,
    keepalives=1,
    keepalives_idle=130,
    keepalives_interval=10,
    keepalives_count=15,
)
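
From the PostgreSQL docs, the same keepalives can apparently also be enforced server-side; tcp_keepalives_idle, tcp_keepalives_interval, and tcp_keepalives_count are stock PostgreSQL parameters, not Nominatim settings. A minimal sketch, assuming superuser access to the database inside the container, with values mirroring the client-side example above:

ALTER SYSTEM SET tcp_keepalives_idle = 130;      -- seconds of idle before the first probe
ALTER SYSTEM SET tcp_keepalives_interval = 10;   -- seconds between probes
ALTER SYSTEM SET tcp_keepalives_count = 15;      -- lost probes before the connection is dropped
SELECT pg_reload_conf();                         -- apply without restarting the server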

Thank you,

-E

echo292 commented 9 months ago

Here are the Postgresql logs.

Part 1

2023-11-08 06:42:33.270 UTC [55] LOG:  checkpoints are occurring too frequently (8 seconds apart)
2023-11-08 06:42:33.270 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 06:42:38.309 UTC [55] LOG:  checkpoints are occurring too frequently (5 seconds apart)
2023-11-08 06:42:38.309 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 06:42:41.471 UTC [208] ERROR:  canceling autovacuum task
2023-11-08 06:42:41.471 UTC [208] CONTEXT:  while scanning block 9152001 of relation "public.planet_osm_nodes"
    automatic vacuum of table "nominatim.public.planet_osm_nodes"
2023-11-08 06:42:43.330 UTC [443] ERROR:  canceling autovacuum task
2023-11-08 06:42:43.330 UTC [443] CONTEXT:  while scanning block 2420922 of relation "public.place"
    automatic vacuum of table "nominatim.public.place"
2023-11-08 06:42:44.485 UTC [370] ERROR:  canceling autovacuum task
2023-11-08 06:42:44.485 UTC [370] CONTEXT:  while scanning block 750676 of relation "public.planet_osm_ways"
    automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 06:42:46.654 UTC [447] ERROR:  canceling autovacuum task
2023-11-08 06:42:46.654 UTC [447] CONTEXT:  while scanning block 13507 of relation "public.placex"
    automatic vacuum of table "nominatim.public.placex"
2023-11-08 06:42:47.968 UTC [447] ERROR:  canceling autovacuum task
2023-11-08 06:42:47.968 UTC [447] CONTEXT:  while scanning block 5163 of relation "public.location_property_osmline"
    automatic vacuum of table "nominatim.public.location_property_osmline"
2023-11-08 06:46:45.517 UTC [456] ERROR:  canceling autovacuum task
2023-11-08 06:46:45.517 UTC [456] CONTEXT:  while scanning block 398686 of relation "public.planet_osm_nodes"
    automatic vacuum of table "nominatim.public.planet_osm_nodes"
2023-11-08 06:46:47.296 UTC [459] ERROR:  canceling autovacuum task
2023-11-08 06:46:47.296 UTC [459] CONTEXT:  while scanning block 2582094 of relation "public.place"
    automatic vacuum of table "nominatim.public.place"
2023-11-08 06:46:48.445 UTC [462] ERROR:  canceling autovacuum task
2023-11-08 06:46:48.445 UTC [462] CONTEXT:  while scanning block 786985 of relation "public.planet_osm_ways"
    automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 06:46:50.652 UTC [456] ERROR:  canceling autovacuum task
2023-11-08 06:46:50.652 UTC [456] CONTEXT:  while scanning block 22030 of relation "public.placex"
    automatic vacuum of table "nominatim.public.placex"
2023-11-08 06:46:52.008 UTC [459] ERROR:  canceling autovacuum task
2023-11-08 06:46:52.008 UTC [459] CONTEXT:  while scanning block 14075 of relation "public.location_property_osmline"
    automatic vacuum of table "nominatim.public.location_property_osmline"
2023-11-08 06:47:24.987 UTC [55] LOG:  checkpoints are occurring too frequently (15 seconds apart)
2023-11-08 06:47:24.987 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 06:47:32.862 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:32.862 UTC [716] CONTEXT:  while scanning block 440701 of relation "public.planet_osm_nodes"
    automatic vacuum of table "nominatim.public.planet_osm_nodes"
2023-11-08 06:47:34.635 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:34.635 UTC [716] CONTEXT:  while scanning block 2589145 of relation "public.place"
    automatic vacuum of table "nominatim.public.place"
2023-11-08 06:47:35.784 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:35.784 UTC [716] CONTEXT:  while scanning block 791191 of relation "public.planet_osm_ways"
    automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 06:47:37.955 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:37.955 UTC [716] CONTEXT:  while scanning block 26012 of relation "public.placex"
    automatic vacuum of table "nominatim.public.placex"
2023-11-08 06:47:41.666 UTC [716] ERROR:  canceling autovacuum task
2023-11-08 06:47:41.666 UTC [716] CONTEXT:  while scanning block 18797 of relation "public.location_property_osmline"
    automatic vacuum of table "nominatim.public.location_property_osmline"
2023-11-08 06:50:31.721 UTC [58] LOG:  using stale statistics instead of current ones because stats collector is not responding
2023-11-08 06:51:26.744 UTC [1108] ERROR:  canceling autovacuum task
2023-11-08 06:51:26.744 UTC [1108] CONTEXT:  while scanning block 857934 of relation "public.planet_osm_nodes"
    automatic vacuum of table "nominatim.public.planet_osm_nodes"
2023-11-08 06:51:28.715 UTC [1210] ERROR:  canceling autovacuum task
2023-11-08 06:51:28.715 UTC [1210] CONTEXT:  while scanning block 878300 of relation "public.planet_osm_ways"
    automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 06:51:30.909 UTC [1245] ERROR:  canceling autovacuum task
2023-11-08 06:51:30.909 UTC [1245] CONTEXT:  while scanning block 95895 of relation "public.placex"
    automatic vacuum of table "nominatim.public.placex"
2023-11-08 06:51:32.177 UTC [1108] ERROR:  canceling autovacuum task
2023-11-08 06:51:32.177 UTC [1108] CONTEXT:  while scanning block 8089 of relation "public.place_addressline"
    automatic vacuum of table "nominatim.public.place_addressline"
2023-11-08 06:51:33.681 UTC [1210] ERROR:  canceling autovacuum task
2023-11-08 06:51:33.681 UTC [1210] CONTEXT:  while scanning block 9512 of relation "public.search_name_2"
    automatic vacuum of table "nominatim.public.search_name_2"
2023-11-08 06:51:36.609 UTC [1108] ERROR:  canceling autovacuum task
2023-11-08 06:51:36.609 UTC [1108] CONTEXT:  while scanning block 11042 of relation "public.location_area_large_2"
    automatic vacuum of table "nominatim.public.location_area_large_2"
2023-11-08 06:51:37.813 UTC [1245] ERROR:  canceling autovacuum task
2023-11-08 06:51:37.813 UTC [1245] CONTEXT:  while scanning block 23382 of relation "public.location_property_osmline"
    automatic vacuum of table "nominatim.public.location_property_osmline"
2023-11-08 06:51:38.945 UTC [1210] ERROR:  canceling autovacuum task
2023-11-08 06:51:38.945 UTC [1210] CONTEXT:  while scanning block 8205 of relation "public.search_name"
    automatic vacuum of table "nominatim.public.search_name"
2023-11-08 07:16:14.562 UTC [55] LOG:  checkpoints are occurring too frequently (29 seconds apart)
2023-11-08 07:16:14.562 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 07:29:51.700 UTC [55] LOG:  checkpoints are occurring too frequently (28 seconds apart)
2023-11-08 07:29:51.700 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 07:37:04.314 UTC [55] LOG:  checkpoints are occurring too frequently (29 seconds apart)
2023-11-08 07:37:04.314 UTC [55] HINT:  Consider increasing the configuration parameter "max_wal_size".
*** stack smashing detected ***: terminated
2023-11-08 09:07:43.713 UTC [53] LOG:  server process (PID 8240) was terminated by signal 6: Aborted
2023-11-08 09:07:43.713 UTC [53] DETAIL:  Failed process was running:  UPDATE placex
                    SET indexed_status = 0, address = v.addr, token_info = v.ti,
                        name = v.name, linked_place_id = v.linked_place_id
                    FROM (VALUES (23803236, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803247, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803270, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803318, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803379, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803387, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803453, hstore(ARRAY['name'], ARRAY['Cherry School'])::hstore, NULL::hstore, NULL::int, '{"names": "{7281851,15536,64307}"}'::jsonb),(23803458, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803469, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803475, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803489, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(23803514, NULL::hstore, NULL::hstore, NULL::int, '{}'::jsonb),(2
2023-11-08 09:07:43.714 UTC [53] LOG:  terminating any other active server processes
2023-11-08 09:07:43.775 UTC [53] LOG:  all server processes terminated; reinitializing
2023-11-08 09:07:43.908 UTC [8279] LOG:  database system was interrupted; last known up at 2023-11-08 09:06:35 UTC
2023-11-08 09:07:43.910 UTC [8279] LOG:  database system was not properly shut down; automatic recovery in progress
2023-11-08 09:07:43.912 UTC [8279] LOG:  redo starts at 5B/8B004358
2023-11-08 09:07:50.464 UTC [8279] LOG:  redo done at 5B/C9361F90 system usage: CPU: user: 2.51 s, system: 1.83 s, elapsed: 6.55 s
2023-11-08 09:07:50.879 UTC [53] LOG:  database system is ready to accept connections
2023-11-08 16:06:28.749 UTC [45] LOG:  starting PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-11-08 16:06:28.749 UTC [45] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-11-08 16:06:28.749 UTC [45] LOG:  listening on IPv6 address "::", port 5432
2023-11-08 16:06:28.749 UTC [45] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-11-08 16:06:28.758 UTC [46] LOG:  database system was interrupted; last known up at 2023-11-08 09:07:50 UTC
2023-11-08 16:06:28.760 UTC [46] LOG:  database system was not properly shut down; automatic recovery in progress
2023-11-08 16:06:28.761 UTC [46] LOG:  redo starts at 5B/C93620C0
2023-11-08 16:06:28.761 UTC [46] LOG:  invalid record length at 5B/C93620F8: wanted 24, got 0
2023-11-08 16:06:28.761 UTC [46] LOG:  redo done at 5B/C93620C0 system usage: CPU: user: 0.00 s, 

Part 2

2023-11-08 16:58:31.057 UTC [47] LOG:  checkpoints are occurring too frequently (4 seconds apart)
2023-11-08 16:58:31.057 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:58:53.171 UTC [47] LOG:  checkpoints are occurring too frequently (22 seconds apart)
2023-11-08 16:58:53.171 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:59:33.156 UTC [47] LOG:  checkpoints are occurring too frequently (3 seconds apart)
2023-11-08 16:59:33.156 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:59:35.445 UTC [47] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2023-11-08 16:59:35.445 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:59:37.759 UTC [47] LOG:  checkpoints are occurring too frequently (2 seconds apart)
2023-11-08 16:59:37.759 UTC [47] HINT:  Consider increasing the configuration parameter "max_wal_size".
2023-11-08 16:59:39.547 UTC [272] ERROR:  canceling autovacuum task
2023-11-08 16:59:39.547 UTC [272] CONTEXT:  while scanning block 737428 of relation "public.planet_osm_ways"
    automatic vacuum of table "nominatim.public.planet_osm_ways"
2023-11-08 17:26:43.161 UTC [55] LOG:  starting PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-11-08 17:26:43.161 UTC [55] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-11-08 17:26:43.161 UTC [55] LOG:  listening on IPv6 address "::", port 5432
2023-11-08 17:26:43.161 UTC [55] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-11-08 17:26:43.162 UTC [56] LOG:  database system was interrupted; last known up at 2023-11-08 16:59:45 UTC
2023-11-08 17:26:43.163 UTC [56] LOG:  database system was not properly shut down; automatic recovery in progress
2023-11-08 17:26:43.164 UTC [56] LOG:  redo starts at 93/C9ECCA38
2023-11-08 17:26:47.006 UTC [56] LOG:  redo done at 93/D2E4FFA0 system usage: CPU: user: 0.54 s, system: 1.87 s, elapsed: 3.84 s
2023-11-08 17:26:47.523 UTC [55] LOG:  database system is ready to accept connections

echo292 commented 9 months ago

Could be related: python3-psycopg2.

https://github.com/psycopg/psycopg2/issues/829

leonardehrenfried commented 9 months ago

These sorts of errors happen when your container doesn't have enough disk or memory; see for example https://github.com/mediagis/nominatim-docker/issues/484

I would check how much is assigned to your docker VM.
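
A quick way to check (a sketch; MemTotal is reported in bytes):

docker info --format '{{.MemTotal}}'   # total memory available to the Docker VM
docker system df                       # disk used by images, containers, and volumes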

echo292 commented 9 months ago

Hello, thank you for the fast response.

For this issue, I'm running Docker in Windows 11 and not in a VM with the workstation configuration listed below.

Should I increase any of the Postgresql settings?

POSTGRES_SHARED_BUFFERS (default: 2GB)
POSTGRES_MAINTENANCE_WORK_MEM (default: 10GB)
POSTGRES_AUTOVACUUM_WORK_MEM (default: 2GB)
POSTGRES_WORK_MEM (default: 50MB)
POSTGRES_EFFECTIVE_CACHE_SIZE (default: 24GB)
POSTGRES_SYNCHRONOUS_COMMIT (default: off)
POSTGRES_MAX_WAL_SIZE (default: 1GB)
POSTGRES_CHECKPOINT_TIMEOUT (default: 10min)
POSTGRES_CHECKPOINT_COMPLETION_TARGET (default: 0.9)
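
Given the "consider increasing max_wal_size" hints in the Postgres logs above, I assume POSTGRES_MAX_WAL_SIZE would be the first one to raise, e.g. something like this (illustrative values, not tested):

docker run -it --shm-size=60g \
  -e PBF_URL=https://download.geofabrik.de/north-america-latest.osm.pbf \
  -e IMPORT_WIKIPEDIA=true \
  -e IMPORT_US_POSTCODES=true \
  -e IMPORT_TIGER_ADDRESSES=false \
  -e POSTGRES_MAX_WAL_SIZE=32GB \
  -e POSTGRES_CHECKPOINT_TIMEOUT=60min \
  -p 8080:8080 \
  --name nominatimNorthAmerica mediagis/nominatim:4.3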

Desktop / Server (please complete the following information):

OS & Version: Windows 11 Pro 23H2
Docker Version: Docker Desktop 4.25.0 (126437)
Nominatim Version: 4.3.1

Additional context
Here are the workstation specs:

i9 14900k
192 GB of memory
4 TB NVMe M.2 SSD

Thank you,

-E

echo292 commented 9 months ago

One other note: I don't receive any errors about running out of disk space. In https://github.com/mediagis/nominatim-docker/issues/484 the error was:

psycopg2.errors.DiskFull: could not resize shared memory segment "/PostgreSQL.2013621802" to 50438144 bytes: No space left on device
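
For what it's worth, I can double-check from inside the container (the PostgreSQL data path here is my assumption for this image):

docker exec nominatimNorthAmerica df -h /var/lib/postgresql /dev/shm   # free space on the data dir and shared memory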

leonardehrenfried commented 9 months ago

Docker requires Linux, so on Windows it always runs in a VM, even though that's hidden from you.
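
If Docker Desktop is using the WSL 2 backend (an assumption; it is the default on recent Windows), the VM's memory and swap can be raised in %UserProfile%\.wslconfig on the Windows host, for example:

[wsl2]
memory=96GB       # illustrative values; by default WSL 2 sees only part of the host's RAM
processors=16
swap=64GB

Then run wsl --shutdown and restart Docker Desktop for the new limits to take effect.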

rmallol commented 2 months ago

Hi all,

I am getting this same error. It looks like setting pool_pre_ping to true may work:

https://stackoverflow.com/questions/55457069/how-to-fix-operationalerror-psycopg2-operationalerror-server-closed-the-conn

Is there any way to change the value for this variable?

Thanks.

mtmail commented 2 months ago

Nominatim doesn't use pooling, or rather it has its own implementation to reuse connections. If you want to experiment, the relevant connection logic is in https://github.com/osm-search/Nominatim/blob/master/nominatim/db/async_connection.py
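
For illustration only (a sketch, not Nominatim's code): a pool_pre_ping-style check with plain psycopg2 amounts to probing a reused connection with a trivial query and reconnecting when the server has gone away:

import psycopg2

def ensure_alive(conn, dsn):
    """Return a usable connection: ping it first, reconnect if the server is gone."""
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")          # the "pre-ping"
        return conn
    except (psycopg2.OperationalError, psycopg2.InterfaceError):
        return psycopg2.connect(dsn)         # server closed the connection; open a new one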

rmallol commented 2 months ago

Thank you so much for your quick reply. I don't really know how to avoid this error. Any clue?

mtmail commented 2 months ago

This GitHub issue was closed 7 months ago. If you have specific questions about your hardware setup, please open a new issue.

rmallol commented 2 months ago

Sorry, but I see this issue as Open. In any case, I have opened a new issue.