Open MartinJericho opened 4 months ago
I also encountered problems with non-deterministic behavior in Polygonizer
.
I think the issue is at least partly due to PlanarGraph
storing edges
and dirEdges
in a HashSet
.
Replacing this with a LinkedHashSet
made the behavior much more stable in my case. I guess a TreeSet
with appropriate Comparator
might also help, but I don't understand the method enough to make a guess which is preferable.
This is the rather ugly code I used to test for stability with a difficult (random) example.
public static void main(String[] args) {
Random rng = new Random(100);
PrecisionModel pm = new PrecisionModel(100.0);
GeometryFactory factory = new GeometryFactory(pm);
// Create an array of n random coordinates
// Note that increasing this to 1000 will result in a TopologyValidationError
int n = 100;
Coordinate[] coords = new Coordinate[n];
for (int i = 0; i < n-1; i++) {
Coordinate c = new Coordinate(rng.nextDouble() * 1000, rng.nextDouble() * 1000);
pm.makePrecise(c);
coords[i] = c;
}
coords[coords.length-1] = coords[0];
// Loop 100 times to check for stability
Geometry lineString = factory.createLineString(coords).union();
Geometry lastGeometry = null;
for (int k = 0; k < 100; k++) {
Polygonizer polygonizer = new Polygonizer(true);
polygonizer.add(lineString);
Geometry polygons = polygonizer.getGeometry();
TopologyValidationError err = new IsValidOp(polygons).getValidationError();
if (err != null) {
System.err.println("Error detection! " + err);
}
if (lastGeometry != null) {
if (!lastGeometry.equalsExact(polygons)) {
throw new TopologyException("Polygonization is not stable");
}
}
assert polygons.isValid();
lastGeometry = polygons;
}
}
Sometimes requiring stable output from an algorithm can reduce performance. This may or may not be one of those cases - it will require some testing to find out.
Is it important to have stable results from the Polygonizer
? Because if not, then the simple way to resolve this is to remove the non-determinism in the unit test, by normalizing the output.
Normalising the result does not remove the non-deterministic behaviour. Please check the method testJTSBugs1() in the unit test attached to the initial post. The variation includes the complete inclusion/exclusion of polygons with joined edges, resulting in an invalid geometry.
Also a noteworthy and bizarre fact is that calling .isValid() on a completely unrelated geometry affects the subsequent behaviour of Polygonizer. I even recall once that a call to a logging library completely unrelated to JTS affected the subsequent behaviour of Polygonizer.
Polygonizer is seriously broken and unusable with the extractOnlyPolygonal option enabled.
I second @MartinJericho's observation that the behavior changes unpredictably, and this is the case even when provided the exact same normalized input (i.e. the same Geometry
object).
I do think there is a relationship with hashing, which may explain the observation that even seemingly unrelated calls have an impact.
Specifically, DirectedEdge.hashCode()
is not implemented; PlanarGraph
stores these in a HashSet
, and the iterator is used twice in PolygonizeGraph
. So the ordering could matter, and the default hash value can be influenced by other things.
Investigating this, I could get deterministic behavior in my case by doing either of the following:
protected Set dirEdges = new LinkedHashSet()
instead of HashSet
, orDirectedEdge.hashCode()
Certainly the first seems preferable to relying on HashSet
to be predictable. I'd hope there is no major performance penalty from such a minor change.
I now see that the test case in JTSBugsTest.txt
causes Polygonizer
to create an invalid self-touching MultiPolygon, even with the extractOnlyPolygonal
flag set. So this is clearly a bug. If this can be fixed by eliminating the non-determinism in the algorithm, then that's a win-win.
Polygonizer implementation seems to hold some internal state that affects the output of the getGeometry() method when extractOnlyPolygonal=true.
The attached unit test class demonstrates the issue. Because of the nature of this bug, the assertions may fail depending on the initial internal state of the library on the testing machine. But the assertions pass in my test environment.
JTSBugsTest.txt
Doing the following multiple times yields different results:
If the input linework intersects along joint edges, and Polygonizer produces an invalid geometry, the result is different every time, even after normalizing the result. If the input linework intersects only at points, in which case Polygonizer always produces a valid geometry, the result may be different every time, but seems to be deterministic after normalising. If the input linework does not intersect, Polygonizer seems to produce the same result every time, even without normalising.
Note that something as simple as calling
new GeometryFactory().createPolygon().isValid()
affects the internal state and changes the result of the next call to Polygonizer.getGeometry(). This probably also indicates the presence of a bug in the Geometry.isValid() method, as you would never expect it to modify the state of anything at all.JTS version 1.19.0