osm2pgsql-dev / osm2pgsql

OpenStreetMap data to PostgreSQL converter
https://osm2pgsql.org
GNU General Public License v2.0

Segfault happening #2265

Open joto opened 1 month ago

joto commented 1 month ago

See https://gist.github.com/tomhughes/0ebdc537b6a9a390b904d394d796b5e0

@pnorman Please add all the details about what you were doing here.

What version of osm2pgsql are you using?

2.0.0+ds-1~bpo12+1

What operating system and PostgreSQL/PostGIS version are you using?

Debian 12

Tell us something about your system

OpenStack Virtual Machine with 64 x 1 core AMD EPYC Processor, 444GB

What did you do exactly?

export LUA_PATH='/srv/vector.openstreetmap.org/osm2pgsql-themepark/lua/?.lua;/srv/vector.openstreetmap.org/spirit/?.lua;;'

# Import the osm2pgsql file specified as an argument, using the locations for spirit
osm2pgsql \
  --output flex \
  --style '/srv/vector.openstreetmap.org/spirit/shortbread.lua' \
  --slim \
  --flat-nodes '/srv/vector.openstreetmap.org/data/nodes.bin' \
  -d spirit \
  --cache 75000 data.pbf

What did you expect to happen?

The planet to import.

What did happen instead?

A core dump occurred in the post-processing stage.

What did you do to try analyzing the problem?

@tomhughes provided the backtrace at https://gist.github.com/tomhughes/0ebdc537b6a9a390b904d394d796b5e0

The machine had an unusual PostgreSQL setup at the time. When the import started, there were enough connection slots for osm2pgsql to start, but by the time post-processing began it was no longer possible to acquire new connections. After I fixed this in the machine configuration, the import worked fine.
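To check how close a server is to the connection limit described above, you can compare `max_connections` with the number of client sessions currently open. The following is a minimal sketch and is not taken from the osm2pgsql documentation. The helper names `free_slots` and `query_free_slots` are hypothetical, and `spirit` is the database name from the import command. The sketch assumes PostgreSQL 10 or newer (for `backend_type`) and assumes that only `superuser_reserved_connections` is held back. PostgreSQL 16 adds a separate `reserved_connections` setting, which this sketch ignores.

```shell
#!/bin/sh
# Hypothetical helpers for checking PostgreSQL connection headroom (a sketch).

# free_slots MAX RESERVED USED
# Prints how many ordinary (non-superuser) connection slots are still free.
free_slots() {
  echo $(( $1 - $2 - $3 ))
}

# query_free_slots DBNAME
# Reads the current values from a running server via psql (needs access to DBNAME).
# Only client backends are counted; the psql session running the query is included.
query_free_slots() {
  set -- $(psql -d "$1" -At -F ' ' -c "
    SELECT current_setting('max_connections'),
           current_setting('superuser_reserved_connections'),
           count(*)
      FROM pg_stat_activity
     WHERE backend_type = 'client backend'")
  free_slots "$1" "$2" "$3"
}
```

For example, running `query_free_slots spirit` periodically during an import shows whether the number of free slots is shrinking toward zero before post-processing starts.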

joto commented 1 month ago

Strange. The backtrace does not fit with the idea that the database connection failed. I am not sure that's what the problem was. What was this "unusual postgresql setup"? I tried limiting the number of connections and osm2pgsql properly reported an error and exited.

pnorman commented 1 month ago

What was this "unusual postgresql setup"?

max_connections was set too low. I fixed it by properly stopping Chef and the replication process.

joto commented 1 month ago

I am pretty sure it is not max_connections that caused this. At least when I try that, it seems to work correctly. I'd rather expect a race condition that went one way on the first try and the other way on the second. Is there any chance you can re-create the situation with the too-small max_connections and try again? Until we have a reproducible way to trigger the bug, I don't know how to debug this.
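One way to approximate the original situation is to leave enough slots free when the import starts and then occupy them while it runs, shortly before post-processing. This differs from lowering `max_connections` before the import. The sketch below does this by opening idle sessions that sit in `pg_sleep`. It is only a hypothetical reproduction aid: `slots_to_fill` and `hold_slots` are made-up helper names, and deciding when to call them during the import is left to whoever runs the test. It also assumes the filler sessions connect as a non-superuser role, so that they stop at `max_connections - superuser_reserved_connections`.

```shell
#!/bin/sh
# Hypothetical helpers to exhaust PostgreSQL connection slots mid-import (a sketch).

# slots_to_fill MAX RESERVED USED KEEP
# Prints how many extra sessions are needed so that only KEEP ordinary slots
# remain free (never less than 0).
slots_to_fill() {
  n=$(( $1 - $2 - $3 - $4 ))
  [ "$n" -lt 0 ] && n=0
  echo "$n"
}

# hold_slots DBNAME COUNT SECONDS
# Starts COUNT background psql sessions that each sleep for SECONDS,
# keeping their connection slots busy for that long.
hold_slots() {
  i=0
  while [ "$i" -lt "$2" ]; do
    psql -d "$1" -q -c "SELECT pg_sleep($3)" >/dev/null &
    i=$(( i + 1 ))
  done
}
```

For example, with `max_connections = 100`, 3 reserved slots, and 12 sessions in use once the import reaches post-processing, `hold_slots spirit "$(slots_to_fill 100 3 12 0)" 3600` would leave no ordinary slots free for an hour. The filler sessions then end on their own.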

pnorman commented 1 month ago

I can see if I have a suitable system.