mediagis / nominatim-docker

100% working container for Nominatim
Creative Commons Zero v1.0 Universal
1.09k stars 444 forks source link

Geometry Mismatch on World import in placex table #533

Open aferrar opened 7 months ago

aferrar commented 7 months ago

Describe the bug

While attempting to load world osm data into the placex table of a Nominatim docker installation using an external postgresql database, the process fails due to a mismatch between the SRID of the geometry being loaded and the SRID expected by the column in the table. The error reported is Geometry SRID (5350) does not match column SRID (4326), indicating that the geometries being imported are in a different spatial reference system than what the placex table expects.

Traceback (most recent call last): File "/usr/local/bin/nominatim", line 12, in exit(cli.nominatim(module_dir='/usr/local/lib/nominatim/module', File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 225, in nominatim return get_set_parser().run(**kwargs) File "/usr/local/lib/nominatim/lib-python/nominatim/cli.py", line 121, in run return args.command.run(args) File "/usr/local/lib/nominatim/lib-python/nominatim/clicmd/setup.py", line 121, in run database_import.load_data(args.config.get_libpq_dsn(), num_threads) File "/usr/local/lib/nominatim/lib-python/nominatim/tools/database_import.py", line 222, in load_data conn.wait() File "/usr/local/lib/nominatim/lib-python/nominatim/db/async_connection.py", line 128, in wait wait_select(self.conn) File "/usr/lib/python3/dist-packages/psycopg2/extras.py", line 762, in wait_select state = conn.poll() psycopg2.errors.InvalidParameterValue: Geometry SRID (5350) does not match column SRID (4326) CONTEXT: PL/pgSQL function placex_insert() line 15 at assignment

To Reproduce

docker run -d --name='Nominatim' --net='bridge' -e TZ="America/Time_Zone" -e HOST_OS="Unraid" -e HOST_HOSTNAME="Tower" -e HOST_CONTAINERNAME="Nominatim" -e 'PBF_PATH'='/gis/osm/planet/planet-latest.osm.pbf' <-- Taken from here -e 'NOMINATIM_TOKENIZER'='icu' -e 'NOMINATIM_DATABASE_DSN'='pgsql:dbname=nominatim;host=192.168.1.2;user=postgres;password=password;port=25432' -e 'PGHOST'='192.168.1.2' -e 'PGDATABASE'='nominatim' -e 'PGUSER'='postgres' -e 'PGPASSWORD'='password' -e 'PGPORT'='25432' -e 'REPLICATION_URL'='https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/replication/day/' -e 'IMPORT_US_POSTCODES'='true' -e 'IMPORT_WIKIPEDIA'='true' -e 'IMPORT_TIGER_ADDRESSES'='true' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:1865]' -l net.unraid.docker.icon='https://avatars.githubusercontent.com/u/19649492?s=48&v=4' -p '1865:8080/tcp' -v '/mnt/user/gis/':'/gis':'rw' -v '/mnt/user/gis/nominatim/flatnode':'/nominatim/flatnode':'rw' --shm-size=64G 'mediagis/nominatim:4.3'

Expected behavior

The expected behavior was for the data to be loaded successfully into the placex table without encountering a SRID mismatch error. Geometries should align with the expected SRID (4326) of the placex table's geometry column.

Screenshots & Logs

Not applicable as no screenshots or additional logs were provided. The error message detailed in the bug report serves as the primary diagnostic output.

Desktop / Server (please complete the following information):

Additional context

The error indicates a likely issue with the data being imported or a possible misconfiguration of the database schema. Further investigation into the source data's SRID and the database schema setup for the spatial_reference table show no existing 5350 SRID.

Update

Thanks to mtmail, we were able to determine the issue was a corrupted pbf file. Thank you, mtmail!

mtmail commented 7 months ago

Can you provide steps to reproduce including how you generated or where you got the data you imported?

Skip the AI generated output please.

aferrar commented 7 months ago

Certainly, thank you for the reply and sorry for the AI stuff. I think this would summarize it nicely, please let me know if I can provide any additional information and thank you again for the support and this awesome project!

Please see below:

docker run -d --name='Nominatim' --net='bridge' -e TZ="America/Time_Zone" -e HOST_OS="Unraid" -e HOST_HOSTNAME="Tower" -e HOST_CONTAINERNAME="Nominatim" -e 'PBF_PATH'='/gis/osm/planet/planet-latest.osm.pbf' <-- Taken from here -e 'NOMINATIM_TOKENIZER'='icu' -e 'NOMINATIM_DATABASE_DSN'='pgsql:dbname=nominatim;host=192.168.1.2;user=postgres;password=password;port=25432' -e 'PGHOST'='192.168.1.2' -e 'PGDATABASE'='nominatim' -e 'PGUSER'='postgres' -e 'PGPASSWORD'='password' -e 'PGPORT'='25432' -e 'REPLICATION_URL'='https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/replication/day/' -e 'IMPORT_US_POSTCODES'='true' -e 'IMPORT_WIKIPEDIA'='true' -e 'IMPORT_TIGER_ADDRESSES'='true' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:1865]' -l net.unraid.docker.icon='https://avatars.githubusercontent.com/u/19649492?s=48&v=4' -p '1865:8080/tcp' -v '/mnt/user/gis/':'/gis':'rw' -v '/mnt/user/gis/nominatim/flatnode':'/nominatim/flatnode':'rw' --shm-size=64G 'mediagis/nominatim:4.3'

mtmail commented 7 months ago

This error shouldn't happen and it's the first time I've seen a projection issue anywhere in Nominatim. What this import step does is read rows from the place table and inserts them into the placex table. In the database there's a trigger attached to the placex table that runs some logic to reformat the place, lookup it's hierarchy. If you noticed that your CPU(s) go to 100% that's the logic that causes this. You can find it in https://github.com/osm-search/Nominatim/blob/master/lib-sql/functions/placex_triggers.sql

But both place and placex table only have geometry columns with SRID 4326 set. You can verify that by looking at the table schemas

\d place
                          Table "public.place"
   Column    |          Type           | Collation | Nullable | Default
-------------+-------------------------+-----------+----------+---------
 osm_type    | character(1)            |           | not null |
 osm_id      | bigint                  |           | not null |
[...]
 geometry    | geometry(Geometry,4326) |           | not null |

\d placex
                             Table "public.placex"
     Column      |            Type             | Collation | Nullable | Default
-----------------+-----------------------------+-----------+----------+---------
 place_id        | bigint                      |           | not null |
 parent_place_id | bigint                      |           |          |
 linked_place_id | bigint                      |           |          |
[...]
 geometry        | geometry(Geometry,4326)     |           | not null |
[...]
 centroid        | geometry(Geometry,4326)     |           |          |

I think we can rule out that you manually changed the table schema. Does a 5350 SRID even exist? I can't find any information and SELECT * FROM spatial_ref_sys WHERE srid = 5350; returns empty for me.

What do the following queries return for you?

select count(*) from place; -- expect over 300 million
select count(*) from placex; -- unclear, depends where the import stopped

Possible the input data is corrupted. As first step check if the file size matches that on https://planet.openstreetmap.org/pbf/ As second step you can also compare the md5 checksum against the .pbf.md5 file. For example:

md5sum planet-240219.osm.pbf
fa4fce44b6de5b9d9fad9ad6e0bbf354

curl -L https://planet.openstreetmap.org/pbf/planet-240219.osm.pbf.md5
fa4fce44b6de5b9d9fad9ad6e0bbf354  planet-240219.osm.pbf

It's not impossible, just unlikely, that the database is corrupted somehow. I mean the data on disk.

aferrar commented 7 months ago

Thank you for this awesome response. I've checked and 5350 isn't a valid SRID. I ran your count on the place table and returned 0 records. I assume placex will also yield zero. I'm currently running md5sum on my local pbf file and will compare it to the external one I curled from.

It could be the data got corrupted while curling? At this point, after confirming the checksum (and if it's valid), should I just blow away the nominatim db in my postgres instance and just try re-running the server? It took 36 hours to get to this error so it's quite a pain troubleshooting if it'll take the same amount of time on each troubleshooting step.

Thank you again, really appreciate your help!

Edit: Looks like a checksum mismatch so I assume the data got corrupted when I curled. I'll have to figured out what caused that. Is there a recommended curl command I should execute for the pbf download?

image
mtmail commented 7 months ago

Indeed looks like the file got corrupted so best to start complete fresh. You said Monaco (I'm guessing 0.1% of the world) imported fine but I would try another country or US state (1-5% of the world) first before trying the world. While curl or the internet connection wouldn't been at fault it's also possible it's the harddrive.