osm-search / Nominatim

Open Source search based on OpenStreetMap data
https://nominatim.org
GNU General Public License v3.0
3.09k stars, 712 forks

Better filtering for irrelevant address parts for POIs #2658

Open lonvia opened 2 years ago

lonvia commented 2 years ago

#2082 implemented filtering of address parts where the POI lies outside the area of the address part. In most cases this fixes bad addresses when the parent street passes through multiple administrative areas. For performance reasons, we only do this when multiple areas of the same address level are part of the address list of a street. This leaves some corner cases where only one area of a certain address level is partially touched by the street; see the discussion in #2649.
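
To make the current rule and its blind spot concrete, here is a minimal Python sketch. It is not Nominatim's actual code (that lives in PL/pgSQL); the `Area` class, the numeric `level` values, and the bounding-box containment test are all hypothetical stand-ins for real address ranks and geometry checks.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Hypothetical stand-in for an address part of a street."""
    name: str
    level: int                                # assumed numeric address level
    bbox: tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)

    def contains(self, x: float, y: float) -> bool:
        # Bounding-box test as a cheap substitute for a real geometry check.
        min_x, min_y, max_x, max_y = self.bbox
        return min_x <= x <= max_x and min_y <= y <= max_y


def filter_address_parts(street_areas, poi):
    """Drop areas not containing the POI, but only check levels
    that appear more than once in the street's address list."""
    per_level = Counter(area.level for area in street_areas)
    kept = []
    for area in street_areas:
        if per_level[area.level] > 1 and not area.contains(*poi):
            continue  # ambiguous level and the POI is outside: drop it
        kept.append(area)
    return kept


# A street crossing counties A and B; city X lies only inside county A.
street = [
    Area("A", 12, (0, 0, 10, 10)),
    Area("B", 12, (11, 0, 20, 10)),
    Area("X", 16, (0, 0, 8, 10)),
]
result = [area.name for area in filter_address_parts(street, (15, 5))]
```

For a POI in county B, county A is dropped correctly, but city X stays in the address because it is the only area at its level and is therefore never rechecked.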

Instead of guessing which areas need rechecking, it would also be possible to explicitly mark them in the place_addressline table. This would catch that case and make rechecking a bit easier in general.
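
A rough Python sketch of the explicit-marking idea, under stated assumptions: the `needs_recheck` flag is a hypothetical column (nothing by that name is confirmed for place_addressline), streets are simplified to a list of points, and areas to bounding boxes.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Point = tuple[float, float]


def box_contains(box: Box, point: Point) -> bool:
    min_x, min_y, max_x, max_y = box
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


@dataclass
class AddressLine:
    """Hypothetical row of the address list of a street."""
    area_name: str
    area_box: Box
    needs_recheck: bool  # assumed flag: the area covers the street only partially


def build_address_lines(street_points, areas):
    """Compute the flag once, when the street itself is indexed."""
    lines = []
    for name, box in areas:
        inside = [box_contains(box, p) for p in street_points]
        if not any(inside):
            continue  # the area does not touch the street at all
        lines.append(AddressLine(name, box, needs_recheck=not all(inside)))
    return lines


def address_for_poi(lines, poi):
    """Only flagged rows need a containment check against the POI."""
    return [
        line.area_name
        for line in lines
        if not line.needs_recheck or box_contains(line.area_box, poi)
    ]


areas = [
    ("State", (0, 0, 20, 10)),
    ("A", (0, 0, 10, 10)),
    ("B", (11, 0, 20, 10)),
    ("X", (0, 0, 8, 10)),
]
lines = build_address_lines([(5, 5), (15, 5)], areas)
poi_address = address_for_poi(lines, (15, 5))
```

The cost of deciding which areas are ambiguous moves to the street's own indexing, so each POI is tested only against the flagged rows, including a lone area such as X.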

champagne-cmd commented 2 years ago

@lonvia I would be interested in working on this issue. Do you think someone new to the Nominatim project could handle this issue?

lonvia commented 2 years ago

It's challenging but doable if you are not afraid to dive head-first into PL/pgSQL code.

Here are some pointers to get you started:

I recommend starting by writing a BDD test that describes the problem from #2649 and initially fails. Feel free to make a PR with the test only if you want a second opinion on whether it tests the right thing.

champagne-cmd commented 2 years ago

@lonvia I have written a test case, but I'm having trouble running some of the tests - please see #2697

champagne-cmd commented 2 years ago

^ @lonvia See the PR above for my first crack at the test case

lonvia commented 1 year ago

Tried this in https://github.com/lonvia/Nominatim/tree/drop-outside-address-parts-backup and checking for containment turns out to be far too slow. This first needs a clever idea for reducing the number of geometry checks that have to be made.

lonvia commented 5 days ago

One possible idea to reduce the number of checks: only start checking for containment once there have been two boundaries at the same level. This should catch common cases like #3537, where a road crosses between boundaries with different subdivisions.
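
One possible reading of this heuristic, sketched in Python rather than PL/pgSQL: find the coarsest level that has two or more boundaries, then check containment for that level and everything finer. The function names, the `(name, level)` tuples, and the assumption that a larger number means a finer level are illustrative only.

```python
from collections import Counter


def first_split_level(levels):
    """Return the coarsest level with at least two boundaries, or None."""
    counts = Counter(levels)
    duplicated = [level for level, n in counts.items() if n >= 2]
    return min(duplicated) if duplicated else None


def filter_parts(parts, contains_poi):
    """parts: list of (name, level); contains_poi: callable(name) -> bool.

    Containment is only tested from the first split level downwards,
    so streets that never cross a boundary cost no geometry checks.
    """
    split = first_split_level(level for _, level in parts)
    kept = []
    for name, level in parts:
        if split is not None and level >= split and not contains_poi(name):
            continue  # at or below the split and the POI is outside: drop it
        kept.append(name)
    return kept


# Road crossing counties A and B; only A has a sub-division X.
parts = [("State", 8), ("A", 12), ("B", 12), ("X", 16)]
areas_containing_poi = {"State", "B"}
kept = filter_parts(parts, areas_containing_poi.__contains__)
```

In this example the state is never checked, while X is checked and dropped even though it is the only area at its level.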