Normalized/canonical form for returned geometries?

theroggy commented 5 months ago

It was noticed in PR https://github.com/shapely/shapely/pull/1964 that (some?) functions most of the time don't seem to return results in "canonical form", at least for some aspects of it.

Is there an intention to return geometries in canonical form, or should the user use normalize if he wants a geometry to have a predictable form?

import shapely

# Tests if input is counter clockwise
p1_orig = shapely.Polygon([(0, 100), (0, 0), (100, 0), (100, 100)])
p1_set_prec = shapely.set_precision(p1_orig, grid_size=1e-3)
p1_orig_norm = shapely.normalize(p1_orig)
p1_set_prec_norm = shapely.normalize(p1_set_prec)
p1_inters_orig = shapely.intersection(p1_orig, p1_orig)
p1_inters_set_prec = shapely.intersection(p1_set_prec, p1_set_prec)
p1_inters_orig_norm = shapely.normalize(p1_inters_orig)

print(f"{p1_orig=} (counter-clockwise)")
print(f"{p1_set_prec=} (clockwise)")
print(f"{p1_orig_norm=} (clockwise)")
print(f"{p1_set_prec_norm=} (clockwise)")
print(f"{p1_inters_orig=} (clockwise)")
print(f"{p1_inters_set_prec=} (clockwise)")
print(f"{p1_inters_orig_norm=} (clockwise)")

# Tests if input is clockwise
p2_orig = shapely.Polygon([(0, 100), (100, 100), (100, 0), (0, 0)])
p2_inters_orig = shapely.intersection(p2_orig, p2_orig)

print(f"{p2_orig=} (clockwise)")
print(f"{p2_inters_orig=} (clockwise)")

# Tests on ordering of geometrycollections
l1_orig = shapely.LineString([(200, 100), (200, 0), (300, 0), (300, 100)])
c1_orig = shapely.GeometryCollection([p1_orig, l1_orig])
c1_inters = c1_orig.intersection(c1_orig.envelope)
print(f"c1_inters={c1_inters.wkt}")
c1_inters_norm = c1_inters.normalize()
print(f"c1_inters_norm={c1_inters_norm.wkt}")
c1_norm = c1_orig.normalize()
print(f"c1_norm={c1_norm.wkt}")

Output:

p1_orig=<POLYGON ((0 100, 0 0, 100 0, 100 100, 0 100))> (counter-clockwise)
p1_set_prec=<POLYGON ((100 100, 100 0, 0 0, 0 100, 100 100))> (clockwise)
p1_orig_norm=<POLYGON ((0 0, 0 100, 100 100, 100 0, 0 0))> (clockwise)
p1_set_prec_norm=<POLYGON ((0 0, 0 100, 100 100, 100 0, 0 0))> (clockwise)
p1_inters_orig=<POLYGON ((0 100, 100 100, 100 0, 0 0, 0 100))> (clockwise)
p1_inters_set_prec=<POLYGON ((100 0, 0 0, 0 100, 100 100, 100 0))> (clockwise)
p1_inters_orig_norm=<POLYGON ((0 0, 0 100, 100 100, 100 0, 0 0))> (clockwise)
p2_orig=<POLYGON ((0 100, 100 100, 100 0, 0 0, 0 100))> (clockwise)
p2_inters_orig=<POLYGON ((100 100, 100 0, 0 0, 0 100, 100 100))> (clockwise)
c1_inters=GEOMETRYCOLLECTION (LINESTRING (200 100, 200 0), LINESTRING (200 0, 300 0), LINESTRING (300 0, 300 100), POLYGON ((0 100, 100 100, 100 0, 0 0, 0 100)))
c1_inters_norm=GEOMETRYCOLLECTION (POLYGON ((0 0, 0 100, 100 100, 100 0, 0 0)), LINESTRING (300 0, 300 100), LINESTRING (200 0, 300 0), LINESTRING (200 0, 200 100))
c1_norm=GEOMETRYCOLLECTION (POLYGON ((0 0, 0 100, 100 100, 100 0, 0 0)), LINESTRING (200 100, 200 0, 300 0, 300 100))

dr-jts commented 5 months ago

Is there an intention to return geometries in canonical form, or should the user use normalize if he wants a geometry to have a predictable form?

Geometries are returned in "mild" canonical form. In particular, polygon outputs have a consistent orientation for their rings (CW for shells, CCW for holes - which is opposite to more recent convention, but that's another issue). Some other functions preserve line orientation (which can result in "non-normalized" output. Otherwise, further normalization is avoided in order to maximise performance.

If strict normalization is required it needs to be performed explicitly by the caller.

The setPrecision operation is actually using the overlay code under the hood, which is why the output always has canonical orientation (which may be different to the input). I can see some justification for preserving the orientation of the input (although I'm curious why that is important?).

Although, given the fact that precision reduction can change the topology of a geometry dramatically, I'm not sure how technically feasible this is, or what performance would look like.

DOSull commented 5 months ago

I submitted the original report in the shapely repo that led to this issue report.

My requirement is not that orientation be preserved but that the form in which polygons are returned from various operations be consistent, predictable, and documented. No surprises! @theroggy's further investigations led to uncovering the various confusing seeming inconsistencies in behaviour they have reported in this issue. If normalize is required after setPrecision to guarantee known formatting of polygon vertices, then that's OK for my application, so long as it is well documented.

I certainly don't suggest that performance to be compromised in pursuit of the needs of an obscure requirement.

FWIW the application is one where I need to examine polygon symmetries, algorithms for which misbehave if the orientation and start vertex of a polygon's coordinate sequence changes unexpectedly after a step like setting precision which might naively be expected to have no side effects actually changes the start vertex and/or orientation of a polygon.

theroggy commented 4 months ago

I (tried to) document the "mild" versus "strict" canonical forms and when they are applied in this shapely PR: https://github.com/shapely/shapely/pull/1964/files

pramsey commented 4 months ago

I wouldn't count on "mild" canonical form having any particular permanence, as changes in the practical implementations of overlay code can and will change it over time. It's a "current state" thing, not a matter of any formal definition we do or will follow. The output of normalize should remain stable over time, modulo changes or additions that might be forced on us by things we forgot to normalize initially, or discovered had other effects (the normalized form of a closed linestring it the one that haunts my dreams).

theroggy commented 4 months ago

I wouldn't count on "mild" canonical form having any particular permanence, as changes in the practical implementations of overlay code can and will change it over time. It's a "current state" thing, not a matter of any formal definition we do or will follow. The output of normalize should remain stable over time, modulo changes or additions that might be forced on us by things we forgot to normalize initially, or discovered had other effects (the normalized form of a closed linestring it the one that haunts my dreams).

OK, thanks, I'll make it explicit in the documentation that the "Mild" canonical form is just the current state of affairs but that it is not to be counted on and is likely to change in the future.

libgeos / geos

Normalized/canonical form for returned geometries? #1032