libgeos / geos

Geometry Engine, Open Source
https://libgeos.org
GNU Lesser General Public License v2.1
1.19k stars 357 forks source link

TopologyException on union of valid geometries #1170

Open theroggy opened 2 weeks ago

theroggy commented 2 weeks ago

Doing a union of a combination of 4 specific geometries leads to a TopologyException: Ring edge missing at ... .

They aren't the prettiest geometries, and they overlap... but they are all valid, so...

Script to reproduce (using shapely 2.0.6 with geos 3.13)

import shapely

wkts = [
    "MULTIPOLYGON (((39866.13 197473.47, 39868.73410547314 197475.00383519547, 39866.12902312726 197473.4659950994, 39863.954506024966 197472.3081521314, 39863.95342412875 197472.3101418713, 39866.13 197473.47)))",
    "MULTIPOLYGON (((39885.956031143665 197478.0640111044, 39885.95897514373 197479.21402710304, 39885.950035633316 197481.70725623498, 39885.96 197480.94, 39885.98995120308 197476.58709181842, 39885.96601513773 197477.38100310415, 39885.956031143665 197478.0640111044)))",
    "MULTIPOLYGON (((39900 197488.44505065124, 39891.23 197484.05000000002, 39885.96 197480.94, 39885.950000000004 197481.71, 39886.26 197483.45, 39886.88 197485.1, 39885.96 197485.15, 39866.13 197473.47, 39863.95342412875 197472.3101418713, 39861.427414675134 197476.9557832219, 39866.1 197479.7, 39880.69 197487.58000000002, 39895.6 197494.96, 39900 197488.44505065124)))",
    "MULTIPOLYGON (((39866.875 197477.625, 39874.625 197481.125, 39882.625 197481.375, 39886.625 197479.625, 39886.875 197476.625, 39893.875 197462.125, 39900 197426.20833333334, 39899.875 197426.125, 39896.90894452482 197411.75769223337, 39900 197406.21523013702, 39900 197400, 39820.57741178792 197500, 39848.89740965763 197500, 39861.4824150813 197476.8546307259, 39866.875 197477.625), (39891.46428777711 197421.7142877771, 39891.625 197448.375, 39873.625 197462.625, 39872.007923812 197457.49692859003, 39862.259636527146 197475.4252251145, 39895.67580000311 197413.96880000085, 39891.46428777711 197421.7142877771)))",
]

polys = shapely.from_wkt(wkts)
print(shapely.is_valid(polys))

shapely.union_all(polys)
dr-jts commented 2 weeks ago

This is an issue in JTS as well. It's a robustness issue in the overlay code.

No fix at the moment. It does work using Snap Rounding, with a fairly large scale factor (10^7). Not sure if that's available in Shapely.

theroggy commented 2 weeks ago

It does work using Snap Rounding, with a fairly large scale factor (10^7). Not sure if that's available in Shapely.

I don't immediately find anything regarding snaprounding in the shapely API, nor in the GEOS C API, so I suppose not.

I have 4 more cases that also lead to "TopologyException: Ring edge missing at...", and 3 that give a "TopologyException: found non-noded intersection between ..." error.

Is it useful to find the exact culprits for those cases as well (no problem to do so, it takes about 5 to 10 minutes per case to seperate them from the 3 million input polygons in total) or is it very likely that they all lead back to the same issue?

dr-jts commented 2 weeks ago

It does work using Snap Rounding, with a fairly large scale factor (10^7). Not sure if that's available in Shapely.

I don't immediately find anything regarding snaprounding in the shapely API, nor in the GEOS C API, so I suppose not.

It looks like the Shapely overlay functions accept a gridSize parameter.

dr-jts commented 2 weeks ago

Is it useful to find the exact culprits for those cases as well (no problem to do so, it takes about 5 to 10 minutes per case to seperate them from the 3 million input polygons in total) or is it very likely that they all lead back to the same issue?

It would be interesting to see some other failure cases. They may have different causes, which may (or may not) be easier to fix.

theroggy commented 2 weeks ago

OK, I further automated the discovery of the extra cases... each .csv contains some WKTs that when unioned trigger an error.

error_6.csv error_0.csv error_1.csv error_2.csv error_3.csv error_4.csv error_5.csv

theroggy commented 2 weeks ago

It looks like the Shapely overlay functions accept a gridSize parameter.

Ah, oops, ok... grid_size is synonymous for snap-rounding... I should have known, sounds logical (blushing).

dr-jts commented 2 weeks ago

OK, I further automated the discovery of the extra cases... each .csv contains some WKTs that when unioned trigger an error.

Thanks. I confirm I see the errors as well. I can't offer any fast resolution for this, unfortunately. But you might try either using snap-rounding (via specifying a grid size in Shapely), or doing some amount of precision reduction on the coordinate values (which will likely move coordinates enough to avoid these problems.

Since you're using Shapely, can you catch the error and use snap-rounding only on the cases which cause the error? (It may be that this strategy can be built into the Unary Union code itself - but I haven't figured out yet how to do this in a way that won't reduce performance across the board.)

theroggy commented 2 weeks ago

I was actually allready applying gridsize=0.01 (the data is projected, so 0.01 meter is still plenty precise), but I'm kind of using geopandas, and in geopandas the gridsize parameter isn't exposed yet... so I was just using set_precision after doing the unary_union... which obviously doesn't help to avoid this error.

Because of your advice that using snap-rounding/gridsize in the unary_union can avoid the error, I now made a version where the gridsize parameter does trickle down so it is directly applied in the unary_union, and this solves all of the above issues.

Thanks a lot already!

dr-jts commented 2 weeks ago

Because of your advice that using snap-rounding/gridsize in the unary_union can avoid the error, I now made a version where the gridsize parameter does trickle down so it is directly applied in the unary_union, and this solves all of the above issues.

Thanks a lot already!

Nice! Glad that helped.

I'll keep thinking about a more fundamental fix that will solve this problem for all users of JTS and GEOS.

dr-jts commented 2 weeks ago

The fundamental problem is that it can happen that unioning two geometries produces a result which has invalid polygonal elements. This probably happens due to almost coincident linework causing the overlay topology determination to be incorrect.

Here is a small reproducer example:

A

POLYGON ((39893.875 197462.125, 39900 197400, 39820.57741178792 197500, 39848.89740965763 197500, 39893.875 197462.125), (39891.625 197448.375, 39872.007923812 197457.49692859003, 39862.259636527146 197475.4252251145, 39895.67580000311 197413.96880000085, 39891.625 197448.375))

B

POLYGON ((39866.13 197473.47, 39868.73410547314 197475.00383519547, 39866.12902312726 197473.4659950994, 39863.954506024966 197472.3081521314, 39863.95342412875 197472.3101418713, 39866.13 197473.47))
image

Input, showing very narrow B polygon touching a gore in A

image

Magnified Topology of input

Currently the union of A and B produces:

GEOMETRYCOLLECTION (POLYGON ((39900 197400, 39820.57741178792 197500, 39848.89740965763 197500, 39893.875 197462.125, 39900 197400), (39872.007923812 197457.49692859003, 39895.67580000311 197413.96880000085, 39891.625 197448.375, 39872.007923812 197457.49692859003)), POLYGON ((39863.95356604543 197472.3098808691, 39863.95342412875 197472.3101418713, 39866.13 197473.47, 39868.73410547314 197475.00383519547, 39866.12902312726 197473.4659950994, 39863.954506024966 197472.3081521314, 39863.95356604543 197472.3098808691)), LINESTRING (39863.95342412875 197472.3101418713, 39862.259636527146 197475.4252251145))
image

Result containing a nested polygon element

The polygonal elements in this geometry overlap, so their combination is an invalid MultiPolygon. CascadedUnion then unions that with further geometries. This produces a TopologyException due to the invalid input.

Possible Fix

A fix is to have CascadedUnion ensure that polygonal inputs extracted from mixed GeometryCollections are valid, and if not to run GeometryFixer.fix on them. This might not cause too much performance degradation, since mixed-dimension GCs should be relatively rare.

dr-jts commented 1 week ago

This is similar to #942. And like that situation, this can't be solved by simple area or envelope check heuristics (since the results lie within the inputs).