Open lonvia opened 2 years ago
@lonvia I would be interested in working on this issue. Do you think someone new to the Nominatim project could handle this issue?
It's challenging but doable if you are not afraid to dive head-deep into PL/pgSQL code.
Here are some pointers to get you started:
location_area_large_* tables. These tables contain chopped-up geometries, so you can't use them. There is already a hack to load full geometries in the function here. That's the way to go.
I recommend starting by writing a BDD test that describes the problem from #2649 and initially fails. Feel free to make a PR with the test only if you want a second opinion on whether the test tests the right thing.
@lonvia I have written a test case, but I'm having trouble running some of the tests - please see #2697
^ @lonvia See the PR above for my first crack at the test case
Tried this in https://github.com/lonvia/Nominatim/tree/drop-outside-address-parts-backup and it turns out that checking for containment is far too slow. This first needs a clever idea for how to reduce the number of geometry checks that have to be made.
One possible idea to reduce the number of checks: only start checking for containment after there have been two boundaries for the same level. This should catch the common case like #3537 where a road crosses between boundaries with different sub-divisions.
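A minimal sketch of that idea, in Python rather than the PL/pgSQL the real code uses. The AddressPart type, its fields, and the function name are hypothetical and only illustrate the selection step: an address part becomes a candidate for the expensive containment check only when at least one other part shares its address level.

```python
from collections import Counter
from typing import List, NamedTuple


class AddressPart(NamedTuple):
    """Hypothetical stand-in for one entry in a street's address list."""
    place_id: int
    address_level: int  # assumed to play the role of the address rank


def parts_needing_containment_check(parts: List[AddressPart]) -> List[AddressPart]:
    """Return only the parts whose address level occurs at least twice.

    Parts that are the sole representative of their level are skipped,
    so no geometry check would be spent on them.
    """
    level_counts = Counter(part.address_level for part in parts)
    return [part for part in parts if level_counts[part.address_level] >= 2]


# A street touching two level-16 areas and one level-12 area:
street_parts = [
    AddressPart(place_id=1, address_level=12),
    AddressPart(place_id=2, address_level=16),
    AddressPart(place_id=3, address_level=16),
]
candidates = parts_needing_containment_check(street_parts)
```

Here only the two level-16 parts are returned as candidates; the single level-12 part is accepted without a geometry check.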
2082 has implemented filtering of address parts where the POI is outside the area of the address part. In most cases, this solves the issue of bad addresses when the parent street goes through multiple administrative areas. For performance reasons, we do this only when multiple areas of the same address level are part of the address list of a street. This leaves some corner cases where there is only one area of a certain address level that is partially touched by the street; see the discussion in #2649.
Instead of guessing which areas need rechecking, it would also be possible to explicitly mark areas that need rechecking in the place_addressline table. This would catch that case and make the rechecking a bit easier in general.
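A rough sketch of how an explicit marker could drive the recheck, again in Python for illustration only. The needs_recheck flag is a hypothetical column that does not exist in place_addressline today, and contains_poi is a placeholder for whatever geometry test would actually be used.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AddressLine:
    """Hypothetical, simplified view of one place_addressline row."""
    address_place_id: int
    needs_recheck: bool  # assumed new marker, not an existing column


def filter_address_lines(
    lines: List[AddressLine],
    contains_poi: Callable[[int], bool],
) -> List[AddressLine]:
    """Drop marked address lines whose area does not contain the POI.

    Unmarked lines are kept without calling the containment test.
    """
    kept = []
    for line in lines:
        if line.needs_recheck and not contains_poi(line.address_place_id):
            continue
        kept.append(line)
    return kept


checked_ids = []


def fake_contains_poi(place_id: int) -> bool:
    # Placeholder geometry test: pretend only area 3 contains the POI.
    checked_ids.append(place_id)
    return place_id == 3


rows = [AddressLine(1, False), AddressLine(2, True), AddressLine(3, True)]
result = filter_address_lines(rows, fake_contains_poi)
```

With the marker in place, the containment test runs only for rows 2 and 3, and a single partially touched area can be flagged even when no other area shares its address level.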