osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0

Replication does not index all the records #2392

Closed: rcarroll0452 closed this issue 3 years ago

rcarroll0452 commented 3 years ago

Hi,

We did a full planet import a couple of weeks ago and then started daily updates using nominatim replication.

We noticed that even though the updates complete, a few thousand records remain unindexed after each day's update, so we have to run nominatim index manually.

So for the past week, every day, once the daily update goes into sleep mode, I stop it (with Ctrl+C), run nominatim index twice in succession to fully index all records, and then restart the daily updates for the next day with nominatim replication.

Details from the logs:

After today's daily update completed, the nominatim admin --check-database command reported:

nominatim@server:~/nominatim-project$ nominatim admin --check-database
2021-07-13 11:39:27: Using project directory: /home/nominatim/nominatim-project
2021-07-13 11:39:27: Checking database
Checking database connection ... OK
Checking for placex table ... OK
Checking for placex content ... OK
Checking that tokenizer works ... OK
Checking indexing status ... Failed
The indexing didn't finish. 5048 entries are not yet indexed.

To index the remaining entries, run:   nominatim index

Checking that database indexes are complete ... OK
Checking that all database indexes are valid ... OK
Checking TIGER external data table. ... OK

First run of nominatim index (lines for ranks reporting Done 0/0 omitted, except interpolation lines and rank 30):

2021-07-13 11:46:19: Done 2/2 in 1 @ 1.389 per second - FINISHED boundaries rank 16                      
2021-07-13 11:46:25: Done 2/2 in 1 @ 1.038 per second - FINISHED boundaries rank 20                      
2021-07-13 11:46:34: Done 1/1 in 1 @ 0.847 per second - FINISHED rank 12                                 
2021-07-13 11:46:36: Done 36/36 in 1 @ 24.513 per second - FINISHED rank 16                              
2021-07-13 11:46:37: Done 18/18 in 1 @ 13.527 per second - FINISHED rank 18                              
2021-07-13 11:46:38: Done 37/37 in 1 @ 30.399 per second - FINISHED rank 20                              
2021-07-13 11:46:40: Done 12/12 in 1 @ 9.658 per second - FINISHED rank 22                               
2021-07-13 11:46:41: Done 2/2 in 1 @ 1.639 per second - FINISHED rank 25                                 
2021-07-13 11:46:45: Done 4938/4938 in 4 @ 1129.963 per second - FINISHED rank 0                         
2021-07-13 11:46:46: Done 0/0 in 0 @ 0.000 per second - FINISHED interpolation lines (location_property_osmline)
2021-07-13 11:46:46: Done 0/0 in 0 @ 0.000 per second - FINISHED rank 30                

The output of nominatim admin --check-database afterwards (other checks omitted, since they all still return OK as above):

Checking indexing status ... Failed                                                                      
The indexing didn't finish. 52 entries are not yet indexed.                                              

To index the remaining entries, run:   nominatim index                                                   

Second run of nominatim index (lines for ranks reporting Done 0/0 omitted, except rank 29, interpolation lines and rank 30):

2021-07-13 11:47:05: Done 0/0 in 0 @ 0.000 per second - FINISHED rank 29
2021-07-13 11:47:06: Done 52/52 in 1 @ 44.201 per second - FINISHED rank 0
2021-07-13 11:47:07: Done 0/0 in 0 @ 0.000 per second - FINISHED interpolation lines (location_property_osmline)
2021-07-13 11:47:07: Done 0/0 in 0 @ 0.000 per second - FINISHED rank 30

The output of nominatim admin --check-database afterwards (other checks omitted, since they all still return OK as above):

Checking indexing status ... OK

At each of these steps, running the following SELECT in psql to count the unindexed records returned the same numbers as above:

SELECT count(place_id) FROM placex WHERE indexed_status=2;
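The manual routine described above (stop replication, then rerun nominatim index until nothing is left) could be scripted. The following is a minimal Python sketch under stated assumptions: it shells out to the nominatim index command, reuses the count query from this post as the stopping test, and assumes the third-party psycopg2 driver is installed and that a DSN such as dbname=nominatim reaches the Nominatim database. The helper names index_until_clean, run_nominatim_index and count_pending_rows are hypothetical and not part of Nominatim.

```python
import subprocess
from typing import Callable


def index_until_clean(
    count_pending: Callable[[], int],
    run_index: Callable[[], None],
    max_passes: int = 5,
) -> int:
    """Rerun the indexer until count_pending() reports zero rows.

    Returns the number of indexing passes that were run. Raises RuntimeError
    if rows are still pending after max_passes passes.
    """
    passes = 0
    pending = count_pending()
    while pending > 0:
        if passes >= max_passes:
            raise RuntimeError(f"{pending} rows still pending after {passes} passes")
        run_index()
        passes += 1
        pending = count_pending()  # re-check after every pass
    return passes


def run_nominatim_index(project_dir: str) -> None:
    """Run `nominatim index` inside the project directory; raise on failure."""
    subprocess.run(["nominatim", "index"], cwd=project_dir, check=True)


def count_pending_rows(dsn: str) -> int:
    """Count unindexed placex rows with the query used in this post."""
    import psycopg2  # third-party driver, assumed to be installed

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT count(place_id) FROM placex WHERE indexed_status=2")
            return cur.fetchone()[0]
    finally:
        conn.close()
```

Wiring it up might look like index_until_clean(lambda: count_pending_rows("dbname=nominatim"), lambda: run_nominatim_index("/home/nominatim/nominatim-project")). The callables are passed in rather than hard-coded so the loop can be exercised without a database, and the max_passes guard stops the script from looping forever if rows keep getting re-marked.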

We also noticed that during the update, the indexer completes everything up to rank 29, then processes rank 0, then the interpolation lines, and only then moves on to rank 30:

2021-07-13 11:12:13: Done 0/0 in 0 @ 0.000 per second - FINISHED rank 29
2021-07-13 11:15:07: Done 14220/14220 in 173 @ 81.817 per second - FINISHED rank 0
2021-07-13 11:15:17: Done 2639/2639 in 7 @ 343.925 per second - FINISHED interpolation lines (location_property_osmline)
2021-07-13 11:35:53: Done 203366/203366 in 1233 @ 164.816 per second - FINISHED rank 30
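To compare the processing order and per-step counts across several runs, the progress lines can be parsed mechanically. This is a minimal Python sketch; the regular expression is inferred only from the log lines quoted in this post and is an assumption, not a documented Nominatim log format. The names parse_progress and Step are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from lines such as:
# "2021-07-13 11:15:07: Done 14220/14220 in 173 @ 81.817 per second - FINISHED rank 0"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"Done (?P<done>\d+)/(?P<total>\d+) in (?P<secs>\d+) "
    r"@ (?P<rate>[\d.]+) per second - FINISHED (?P<what>.+?)\s*$"
)


class Step(NamedTuple):
    timestamp: str
    done: int
    total: int
    seconds: int
    what: str  # e.g. "rank 0" or "interpolation lines (location_property_osmline)"


def parse_progress(line: str) -> Optional[Step]:
    """Return a Step for a 'Done x/y ... FINISHED ...' line, else None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return Step(m["ts"], int(m["done"]), int(m["total"]), int(m["secs"]), m["what"])
```

Feeding it the four lines above yields the steps in the order rank 29, rank 0, interpolation lines, rank 30; lines that are not progress reports (for example the admin --check-database output) return None and can simply be skipped.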

I'll be happy to provide any further details if required.

lonvia commented 3 years ago

This is entirely normal and expected. While processing the changes from OSM, Nominatim notices and marks a couple of dependent objects which have not been directly modified but should be reprocessed nonetheless because they depend on a changed object. They can't be processed immediately because the order of objects matters when indexing, so they are marked as to-be-indexed instead.
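A toy model may help illustrate why one pass can leave rows behind. The sketch below rests on a simplifying assumption and is not Nominatim's actual indexer: each pass walks the ranks in a fixed order and picks up whatever is pending at a rank at the moment it reaches that rank, and indexing an object marks its dependents as pending. A dependent whose rank was already visited stays marked until the next pass. The function index_pass and the object ids are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def index_pass(
    rank_of: Dict[str, int],
    pending: Set[str],
    dependents: Dict[str, List[str]],
    order: Iterable[int],
) -> List[str]:
    """Run one simplified indexing pass and return the ids indexed.

    rank_of:    object id -> rank
    pending:    ids marked as to-be-indexed (modified in place)
    dependents: object id -> ids that must be reprocessed when it changes
    order:      ranks in the order the pass visits them
    """
    done: List[str] = []
    for rank in order:
        # Only objects pending at this rank *right now* are picked up.
        batch = sorted(o for o in pending if rank_of[o] == rank)
        for obj in batch:
            pending.discard(obj)
            done.append(obj)
            # Indexing an object marks its dependents for reprocessing;
            # if their rank was already visited, they wait for the next pass.
            for dep in dependents.get(obj, ()):
                pending.add(dep)
    return done
```

With a changed object "b" at rank 30 whose dependent "a" sits at rank 16, the first pass over ranks [16, 30] indexes only "b" and leaves "a" marked; a second pass picks up "a". That mirrors the two consecutive nominatim index runs in the original post, and with minutely updates the next scheduled run plays the role of the second pass.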

This isn't really a problem when running Nominatim in minutely update mode. The dependent objects will be processed a minute later during the next run. It might not be ideal when running updates only once a day or week.

rcarroll0452 commented 3 years ago

Thank you for the response :)