locationtech / jts

The JTS Topology Suite is a Java library for creating and manipulating vector geometry.
Other
1.94k stars 441 forks source link

Polygonizer.getGeometry() results are non-deterministic when extractOnlyPolygonal=true #1063

Open MartinJericho opened 2 months ago

MartinJericho commented 2 months ago

Polygonizer implementation seems to hold some internal state that affects the output of the getGeometry() method when extractOnlyPolygonal=true.

The attached unit test class demonstrates the issue. Because of the nature of this bug, the assertions may fail depending on the initial internal state of the library on the testing machine. But the assertions pass in my test environment.

JTSBugsTest.txt

Doing the following multiple times yields different results:

If the input linework intersects along joint edges, and Polygonizer produces an invalid geometry, the result is different every time, even after normalizing the result. If the input linework intersects only at points, in which case Polygonizer always produces a valid geometry, the result may be different every time, but seems to be deterministic after normalising. If the input linework does not intersect, Polygonizer seems to produce the same result every time, even without normalising.

Note that something as simple as calling new GeometryFactory().createPolygon().isValid() affects the internal state and changes the result of the next call to Polygonizer.getGeometry(). This probably also indicates the presence of a bug in the Geometry.isValid() method, as you would never expect it to modify the state of anything at all.

JTS version 1.19.0

petebankhead commented 2 weeks ago

I also encountered problems with non-deterministic behavior in Polygonizer.

I think the issue is at least partly due to PlanarGraph storing edges and dirEdges in a HashSet.

Replacing this with a LinkedHashSet made the behavior much more stable in my case. I guess a TreeSet with appropriate Comparator might also help, but I don't understand the method enough to make a guess which is preferable.

This is the rather ugly code I used to test for stability with a difficult (random) example.

public static void main(String[] args) {
        Random rng = new Random(100);
        PrecisionModel pm = new PrecisionModel(100.0);
        GeometryFactory factory = new GeometryFactory(pm);

        // Create an array of n random coordinates
        // Note that increasing this to 1000 will result in a TopologyValidationError
        int n = 100;
        Coordinate[] coords = new Coordinate[n];
        for (int i = 0; i < n-1; i++) {
            Coordinate c = new Coordinate(rng.nextDouble() * 1000, rng.nextDouble() * 1000);
            pm.makePrecise(c);
            coords[i] = c;
        }
        coords[coords.length-1] = coords[0];

        // Loop 100 times to check for stability
        Geometry lineString = factory.createLineString(coords).union();
        Geometry lastGeometry = null;
        for (int k = 0; k < 100; k++) {
            Polygonizer polygonizer = new Polygonizer(true);
            polygonizer.add(lineString);
            Geometry polygons = polygonizer.getGeometry();
            TopologyValidationError err = new IsValidOp(polygons).getValidationError();
            if (err != null) {
                System.err.println("Error detection! " + err);
            }
            if (lastGeometry != null) {
                if (!lastGeometry.equalsExact(polygons)) {
                    throw new TopologyException("Polygonization is not stable");
                }
            }
            assert polygons.isValid();
            lastGeometry = polygons;
        }
    }
dr-jts commented 1 week ago

Sometimes requiring stable output from an algorithm can reduce performance. This may or may not be one of those cases - it will require some testing to find out.

Is it important to have stable results from the Polygonizer? Because if not, then the simple way to resolve this is to remove the non-determinism in the unit test, by normalizing the output.

MartinJericho commented 1 week ago

Normalising the result does not remove the non-deterministic behaviour. Please check the method testJTSBugs1() in the unit test attached to the initial post. The variation includes the complete inclusion/exclusion of polygons with joined edges, resulting in an invalid geometry.

Also a noteworthy and bizarre fact is that calling .isValid() on a completely unrelated geometry affects the subsequent behaviour of Polygonizer. I even recall once that a call to a logging library completely unrelated to JTS affected the subsequent behaviour of Polygonizer.

Polygonizer is seriously broken and unusable with the extractOnlyPolygonal option enabled.

petebankhead commented 1 week ago

I second @MartinJericho's observation that the behavior changes unpredictably, and this is the case even when provided the exact same normalized input (i.e. the same Geometry object).

I do think there is a relationship with hashing, which may explain the observation that even seemingly unrelated calls have an impact.

Specifically, DirectedEdge.hashCode() is not implemented; PlanarGraph stores these in a HashSet, and the iterator is used twice in PolygonizeGraph. So the ordering could matter, and the default hash value can be influenced by other things.

Investigating this, I could get deterministic behavior in my case by doing either of the following:

  1. Using protected Set dirEdges = new LinkedHashSet() instead of HashSet, or
  2. Implementing DirectedEdge.hashCode()

Certainly the first seems preferable to relying on HashSet to be predictable. I'd hope there is no major performance penalty from such a minor change.

dr-jts commented 1 week ago

I now see that the test case in JTSBugsTest.txt causes Polygonizer to create an invalid self-touching MultiPolygon, even with the extractOnlyPolygonal flag set. So this is clearly a bug. If this can be fixed by eliminating the non-determinism in the algorithm, then that's a win-win.