osmcode / libosmium

Fast and flexible C++ library for working with OpenStreetMap data.
https://osmcode.org/libosmium/
Boost Software License 1.0
461 stars 112 forks source link

Throw OSM feature IDs in error logs when complex geometries are behaving recursive #309

Open manoharuss opened 4 years ago

manoharuss commented 4 years ago

https://www.openstreetmap.org/relation/11245090/history Request so that we can debug and maybe fix the geometry complication on OSM with osmiumid r11245090

libosmium v2.15.5 was unable to parse V4 of https://www.openstreetmap.org/relation/11245090/history. It would be good if the failure is more verbose. Many downstream teams are familiar with OpenStreetMap data and can debug and fix on OSM quickly. I am not sure how to best log that from libosmium itself but would be nice to have.

@joto could you also explain the recursion limit in 1 or 2 sentences. https://github.com/osmcode/libosmium/commit/ae649700cdc0fba612252c2e60a38bac608c690b

joto commented 4 years ago

The Osmium library has a facility for reporting problems with multipolygons. This facility isn't necessarily used in all programs using the library, because it will produce huge amounts of debug info that nobody wants to have clutter up their system. You can use the oat_create_areas program and others from https://github.com/osmcode/osm-area-tools to create all sorts of debug outputs. That's what I am using when debugging multipolygon problems. That being said the specific case which happened with r11245090 could be reported in a better way and I am working on an improvement there.

The underlying problem ist this: When Osmium encounters a complex multipolygon, i.e. one where the boundary touches itself, it will switch from a simple algorithm which can't handle the complex case to a different algorithm that will handle any case but has exponential complexity. Basically it will try all combinations of assembling the boundary. This is done with a recursive algorithm and that's where the recursion limit comes from that I not added as a stopgap mechanism. When I wrote all of this many years ago, I decided that this approach, while not being perfect, was okay, because there are very few multipolygons that even need this more expensive approach and I couldn't imagine there ever being something like r11245090 (I still can't imagine why something like this would ever exist :-). Even today there are only about 2700 multipolygons in the planet file which need this more complex algorithm, only about 30 which come close to being problematic and only a single one which is so complex that it breaks Osmium and needs this workaround.

In light of recent development it seems we have to revisit this decision from many years ago and try to find a better algorithm. Until we do, the workaround keeps very very few multipolygons from breaking Osmium but it also means they are not rendered on the map.

At the moment it doesn't make sense to "fix" r11245090. The software simply can't deal with it and while we could "fix" it in a way that would work around the software problem, we'd have to change the geometry to fit the software which isn't a good idea. So for the time being I suggest to just to touch it until we can fix the software issue.

manoharuss commented 4 years ago

thanks. appreciate the explanation @joto