Decide on SHACL shape about serializations

situx commented 3 years ago

This issue is to discuss SHACL validation shape number 1.

I encourage everyone to add pro/con arguments to this post so that we can make an informed decision. Please edit this post whenever needed:

1 serialization per Geometry instance:

The coordinate reference system is the same
There are no conflicts concerning differing accuracy and so on, because only one serialization per geometry exists

Many serializations per Geometry:

The serializations need to be equivalent, but not equal. For example, they could be in different CRS according to literal restrictions (GeoJSON only allows WGS84 for example)
The serializations could have a differing accuracy

Please add more arguments for either side that I missed

mathib commented 3 years ago

Pointer to the earlier discussion between Frans and me.

If shape 1 is rejected in its current form, it will be required to make the following shapes instead:

max one outgoing geo:asWKT relation for each geo:Geometry
max one outgoing geo:asGML relation for each geo:Geometry
max one outgoing geo:asGeoJSON relation for each geo:Geometry
max one outgoing geo:asKML relation for each geo:Geometry
unsure about geo:asDGGS: can different DGGS be in the same SRS and have the same coordinates?
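
A minimal sketch of what the per-format shapes listed above could look like. The `ex:` namespace and the shape name are hypothetical, and this is not the text of any shape in the specification. `geo:asDGGS` is left out because the question about it is still open.

```turtle
@prefix ex:  <http://example.org/shapes#> .   # hypothetical namespace
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .

# Hypothetical shape: at most one literal per serialization property.
ex:OneLiteralPerFormatShape
  a sh:NodeShape ;
  sh:targetClass geo:Geometry ;
  sh:property [ sh:path geo:asWKT ;     sh:maxCount 1 ] ;
  sh:property [ sh:path geo:asGML ;     sh:maxCount 1 ] ;
  sh:property [ sh:path geo:asGeoJSON ; sh:maxCount 1 ] ;
  sh:property [ sh:path geo:asKML ;     sh:maxCount 1 ] .
```

With these constraints a Geometry may still carry one WKT literal and one GML literal side by side; only two literals of the same format are reported.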

mathib commented 3 years ago

some benefits of shape 1:

The constraint would result in more reliable and simpler query patterns and handling of the query results if you can expect that there's only one serialization
a single GeoSPARQL engine might treat different geometry serializations (with the same coordinates and CRS) differently by using distinct geometry loaders and processors internally, resulting in potentially conflicting GeoSPARQL spatial relations delivered by the engine for the multiple serializations

The serializations need to be equivalent, but not equal. For example, they could be in different CRS according to literal restrictions (GeoJSON only allows WGS84 for example)

This might not be possible anymore when an SRS/CRS is defined as a node connected to a geo:Geometry instance node. This feature seems difficult to ignore when we open GeoSPARQL to a wider variety of geometry serializations (CAD, point clouds, etc.). RDF reification is an alternative then, but rather cumbersome, while RDF* is not (yet?) a standard of its own despite increasing support in tools.

mathib commented 3 years ago

The serializations need to be equivalent, but not equal. For example, they could be in different CRS according to literal restrictions (GeoJSON only allows WGS84 for example)

This also assumes that a perfect lossless transformation of geometry between both CRS/SRS is possible.

We can also keep shape 1 as-is, but lower the "severity" from sh:Violation (default) to sh:Warning (non-critical constraint violation). The shape constraint message can explain that it's allowed to use multiple geometry serializations on one geo:Geometry instance, but it's the publisher/user's responsibility to make sure that the content of the geometry serializations is exactly the same (same coordinates when both are in the same CRS/SRS, or different CRS/SRS supporting a lossless transformation + same metadata applies to all geometry content in different serializations). The examples accompanying the shape can explain the correct and incorrect situations.
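
As a rough illustration of the severity idea, here is one way a "single serialization" constraint with a warning could be written. This is only a sketch: the `ex:` names, the use of `sh:alternativePath` over four serialization properties, and the message wording are assumptions, not the actual shape 1.

```turtle
@prefix ex:  <http://example.org/shapes#> .   # hypothetical namespace
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .

# Hypothetical stand-in for shape 1: at most one serialization in total,
# reported as a warning instead of the default sh:Violation.
ex:SingleSerializationShape
  a sh:NodeShape ;
  sh:targetClass geo:Geometry ;
  sh:property [
    # Count the values of all listed serialization properties together.
    sh:path [ sh:alternativePath ( geo:asWKT geo:asGML geo:asGeoJSON geo:asKML ) ] ;
    sh:maxCount 1 ;
    sh:severity sh:Warning ;
    sh:message "Multiple serializations found for this geometry; the publisher is responsible for keeping them equivalent."@en ;
  ] .
```

A validator would still list the result in its report, but tooling can filter on `sh:resultSeverity` to treat it as non-blocking.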

situx commented 3 years ago

I agree we should give this responsibility to the user and set the severity to Warning. However, I do not think that demanding equivalence of the serializations necessarily assumes that a lossless transformation is possible. In my opinion, the user should be able to define what is an equivalent representation and we can only give guidance with SHACL as to the possible correctness of the user's decision.

dr-shorthair commented 3 years ago

I do not think that demanding equivalence of the serializations necessarily assumes that a lossless transformation is possible.

+1, though I might have said 'asking for' rather than 'demanding'

FransKnibbe commented 3 years ago

Many serializations per Geometry:

The serializations need to be equivalent, but not equal. For example, they could be in different CRS according to literal restrictions (GeoJSON only allows WGS84 for example)

The definition of Geometry mandates each serialisation to have the same CRS. So if one serialisation uses GeoJSON, all other serialisations should use CRS84 too (not WGS84, because that uses a different axis order).

The serializations could have a differing accuracy

Accuracy is a property of a Geometry, so all serialisations inherit that accuracy.

FransKnibbe commented 3 years ago

If shape 1 is rejected in its current form, it will be required to make the following shapes instead:

max one outgoing geo:asWKT relation for each geo:Geometry

max one outgoing geo:asGML relation for each geo:Geometry

max one outgoing geo:asGeoJSON relation for each geo:Geometry

max one outgoing geo:asKML relation for each geo:Geometry

Those shapes could be made, but saying they are required is going a bit far. But next to good documentation and examples, such shapes could help users, yes.

unsure about geo:asDGGS: can different DGGS be in the same SRS and have the same coordinates?

When PR 136 is through, a DGGS Object will no longer be a Geometry, making the question inapplicable.

FransKnibbe commented 3 years ago

some benefits of shape 1:

The constraint would result in more reliable and simpler query patterns and handling of the query results if you can expect that there's only one serialization

I am not convinced querying would be more reliable or simpler (on the contrary, I think), but data publishers are free to use copies of the same Geometry instance with different IRIs and probably an owl:sameAs relationship if they really want to. But in my mind, the most straightforward way of publishing Geometry data is to provide all the serialisation formats a data publisher wants to support per Geometry instance. It is then up to the consumer to request the preferred format.

a single GeoSPARQL engine might treat different geometry serializations (with the same coordinates and CRS) differently by using distinct geometry loaders and processors internally, resulting in potentially conflicting GeoSPARQL spatial relations delivered by the engine for the multiple serializations

It is up to the makers of GeoSPARQL engines to decide how to load geometries into whatever internal format they want to use. But it would make sense to only extract the CRS and coordinates, and to throw an error if there are inconsistencies between multiple serialisations of the same Geometry. Anyhow, what difference would a SHACL shape that is not part of the core specification make?

FransKnibbe commented 3 years ago

I agree we should give this responsibility to the user and set the severity to Warning. However, I do not think that demanding equivalence of the serializations necessarily assumes that a lossless transformation is possible. In my opinion, the user should be able to define what is an equivalent representation and we can only give guidance with SHACL as to the possible correctness of the user's decision.

If 'equivalence' means different CRSs, I believe that is out of the question, according to the definition of geo:Geometry.

dr-shorthair commented 3 years ago

The definition of Geometry mandates each serialisation to have the same CRS.

Really? That looks like something that should be changed then.

FransKnibbe commented 3 years ago

The definition of Geometry mandates each serialisation to have the same CRS.

Really? That looks like something that should be changed then.

Why? The definition in GeoSPARQL is based on GM_Object as defined in ISO 19107. I can't access that document (paywall), but many replications tell that "GM_Object instances are sets of direct positions in a particular coordinate reference system". And I think that makes a lot of sense.

dr-shorthair commented 3 years ago

OK - I probably misunderstood - if multiple CRS are desired, then they can be provided in separate geo:Geometry nodes in the context of multiple geo:hasGeometry properties. As long as there is no restriction on the count of hasGeometry then we are fine.
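
A small sketch of that pattern: one Feature with two `geo:hasGeometry` links, each Geometry holding serializations in a single CRS. The `ex:` IRIs and the coordinate values are made-up placeholders (the second pair is not a real transformation of the first).

```turtle
@prefix ex:  <http://example.org/data#> .   # hypothetical namespace
@prefix geo: <http://www.opengis.net/ont/geosparql#> .

ex:townHall
  a geo:Feature ;
  geo:hasGeometry ex:townHallCrs84, ex:townHallEtrs89 .

# No CRS IRI in the WKT literal, so CRS84 (longitude latitude) applies.
# The GeoJSON literal can sit here because it uses the same CRS.
ex:townHallCrs84
  a geo:Geometry ;
  geo:asWKT "POINT(4.8936 52.3731)"^^geo:wktLiteral ;
  geo:asGeoJSON """{"type": "Point", "coordinates": [4.8936, 52.3731]}"""^^geo:geoJSONLiteral .

# EPSG:4258 (ETRS89) with latitude longitude axis order; placeholder values.
ex:townHallEtrs89
  a geo:Geometry ;
  geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4258> POINT(52.3731 4.8936)"^^geo:wktLiteral .
```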

I have access to an old version of ISO 19107:2003. There is a class GM_PointArray which more-or-less corresponds with the actual serialized coordinates.

mathib commented 3 years ago

@FransKnibbe

If shape 1 is rejected in its current form, it will be required to make the following shapes instead:

max one outgoing geo:asWKT relation for each geo:Geometry
max one outgoing geo:asGML relation for each geo:Geometry
max one outgoing geo:asGeoJSON relation for each geo:Geometry
max one outgoing geo:asKML relation for each geo:Geometry

Those shapes could be made, but saying they are required is going a bit far.

If we're asking users to make sure that their geometry serializations are equivalent (same coordinates, same CRS/SRS), what would be the use case that would need to overrule the above described requirement? I can currently only think of a situation when 2 different serializations using the same serialization format have the same coordinates and CRS/SRS but serialized in a different order without changing the represented geometry content (a spatial query would return the exact same result for both). I can't find a direct good use case why people would want that, but maybe you do?

mathib commented 3 years ago

a single GeoSPARQL engine might treat different geometry serializations (with the same coordinates and CRS) differently by using distinct geometry loaders and processors internally, resulting in potentially conflicting GeoSPARQL spatial relations delivered by the engine for the multiple serializations

It is up to the makers of GeoSPARQL engines to decide how to load geometries into whatever internal format they want to use. But it would make sense to only extract the CRS and coordinates, and to throw an error if there are inconsistencies between multiple serialisations of the same Geometry.

Completely agree

Anyhow, what difference would a SHACL shape that is not part of the core specification make?

Such a shape (with sh:severity equal to sh:Warning instead of the default sh:Violation) can help data consumers, who might be different from the data publishers, to be cautious. Its sh:message can say something like "It seems like multiple serializations exist for this geometry node. A data publisher should ensure that the CRS/SRS and the actual coordinates are equivalent for each serialization of the same geometry, but this might not be the case. A single GeoSPARQL engine might treat different geometry serializations (with the same coordinates and CRS) differently by using distinct geometry loaders and processors internally, resulting in potentially conflicting GeoSPARQL spatial relations delivered by the engine for the multiple serializations. Before using the data, you might want to evaluate if the published data and the operations of the GeoSPARQL engine are correct."

If a data consumer knows they can trust the data publisher and its GeoSPARQL engine with the above, they can simply deactivate the shape by asserting sh:deactivated true on the shape.
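
For illustration, deactivating a shape takes a single triple added to the shapes graph. The shape IRI below is a hypothetical placeholder for whichever shape the consumer wants to switch off.

```turtle
@prefix ex: <http://example.org/shapes#> .   # hypothetical namespace
@prefix sh: <http://www.w3.org/ns/shacl#> .

# A deactivated shape produces no validation results.
ex:SingleSerializationShape sh:deactivated true .
```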

situx commented 3 years ago

Many serializations per Geometry:

The serializations need to be equivalent, but not equal. For example, they could be in different CRS according to literal restrictions (GeoJSON only allows WGS84 for example)

The definition of Geometry mandates each serialisation to have the same CRS. So if one serialisation uses GeoJSON, all other serialisations should use CRS84 too (not WGS84, because that uses a different axis order).

The serializations could have a differing accuracy

Accuracy is a property of a Geometry, so all serialisations inherit that accuracy.

Yes, accuracy is a property of Geometry and all serializations should inherit that accuracy unless a serialization that has a restriction on accuracy is defined. (Maybe only allows accuracy to a certain degree) How do we treat this case if it arises? Or is that unrealistic? The other cases can certainly be checked with some SHACL constraint I would think.

situx commented 3 years ago

I agree we should give this responsibility to the user and set the severity to Warning. However, I do not think that demanding equivalence of the serializations necessarily assumes that a lossless transformation is possible. In my opinion, the user should be able to define what is an equivalent representation and we can only give guidance with SHACL as to the possible correctness of the user's decision.

If 'equivalence' means different CRSs, I believe that is out of the question, according to the definition of geo:Geometry.

In that case, we cannot allow defining a GeoJSON literal next to any e.g. WKT literals in a coordinate system different from CRS84. If that is how we want things to be we need to explicitly state this. One might even think about using a different property to indicate such a case? (It is probably quite common to have a representation in world coordinates, right?) So we might define something like geo:hasCRS84Geometry ?geom to make this case very clear? Or is that complicating things?

FransKnibbe commented 3 years ago

I have access to an old version of ISO 19107:2003. There is a class GM_PointArray which more-or-less corresponds with the actual serialized coordinates.

That information could be useful for issue 42.

FransKnibbe commented 3 years ago

If we're asking users to make sure that their geometry serializations are equivalent (same coordinates, same CRS/SRS), what would be the use case that would need to overrule the above described requirement? I can currently only think of a situation when 2 different serializations using the same serialization format have the same coordinates and CRS/SRS but serialized in a different order without changing the represented geometry content (a spatial query would return the exact same result for both). I can't find a direct good use case why people would want that, but maybe you do?

Hi @mathib, I think the proposed shapes are fine. I just wanted to say that GeoSPARQL will be good to use without them, so in that sense they are not required. I did not mean there could be exceptions.

FransKnibbe commented 3 years ago

Yes, accuracy is a property of Geometry and all serializations should inherit that accuracy unless a serialization that has a restriction on accuracy is defined. (Maybe only allows accuracy to a certain degree) How do we treat this case if it arises? Or is that unrealistic? The other cases can certainly be checked with some SHACL constraint I would think.

In general I think serialisation should follow the model and not the other way around. If a serialisation does not allow all model elements to be expressed, then it is an unfit serialisation.

FransKnibbe commented 3 years ago

In that case, we cannot allow defining a GeoJSON literal next to any e.g. WKT literals in a coordinate system different from CRS84. If that is how we want things to be we need to explicitly state this.

It is already stated in the definition of geo:Geometry, isn't it?

One might even think about using a different property to indicate such a case?

It was suggested here.

(It is probably quite common to have a representation in world coordinates, right?)

Yes, but outside of North America CRS84 is rather useless for serious applications. In Europe we are meant to use ETRS89 for geodetic coordinates (degrees longitude and latitude).

jabhay commented 2 years ago

[x] @FransKnibbe to check out whether the Shapes satisfy the remaining correspondence on this issue.

FransKnibbe commented 2 years ago

I have my doubts about mandating at least one serialisation of a Geometry. That would mean this kind of data is wrong:

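@prefix : <http://example.org/> .   # placeholder namespace
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:myUfo
  a geo:Feature;
  geo:hasGeometry :myGeom1;
  rdfs:label "Something spotted in the sky over Africa" .
:myGeom1
  a geo:Geometry;
  geo:spatialDimension "3"^^xsd:integer;
  geo:sfWithin :myAfricaGeom .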

In other words: it is possible to state facts about a Geometry when its exact coordinates are (temporarily) unknown.

A more general question: Why don't the Geometry shapes just target all instances of geo:Geometry? It seems easier to define that way, and does not exclude Geometry instances that are not related to a Feature.
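
To make the difference concrete, here is a sketch of the two targeting styles. It assumes the alternative to `sh:targetClass` is targeting the objects of `geo:hasGeometry`, which this thread does not confirm. The `ex:` names and the placeholder constraint are hypothetical.

```turtle
@prefix ex:  <http://example.org/shapes#> .   # hypothetical namespace
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .

# Variant A (assumed): only nodes reached from something via geo:hasGeometry.
ex:GeometryViaFeatureShape
  a sh:NodeShape ;
  sh:targetObjectsOf geo:hasGeometry ;
  sh:property [ sh:path geo:asWKT ; sh:maxCount 1 ] .   # placeholder constraint

# Variant B: every instance of geo:Geometry, linked to a Feature or not.
ex:AnyGeometryShape
  a sh:NodeShape ;
  sh:targetClass geo:Geometry ;
  sh:property [ sh:path geo:asWKT ; sh:maxCount 1 ] .   # placeholder constraint
```

Variant A misses a typed Geometry that no Feature points to, while Variant B misses an untyped node that is only used as the object of `geo:hasGeometry`.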

paulc-dstl commented 2 years ago

I have my doubts about mandating at least one serialisation of a Geometry.

I do too. I am often dealing with Features where we do not know their spatial position, only their position relative to other Features. Mandating at least one serialisation of a Geometry would preclude this, no?

FransKnibbe commented 2 years ago

I do too. I am often dealing with Features where we do not know their spatial position, only their position relative to other Features. Mandating at least one serialisation of a Geometry would preclude this, no?

In that case, a solution could be not to state anything about geometry. Features, being SpatialObjects, can have topological relationships with other Features.

paulc-dstl commented 2 years ago

In that case, a solution could be not to state anything about geometry. Features, being SpatialObjects, can have topological relationships with other Features.

Of course, thanks! (Retreats back into box...)

jabhay commented 2 years ago

Things to do to complete this ticket

[x] Remove Shape 1a from Annex D, and reorder Shapes under 1 in Annex D.

opengeospatial / ogc-geosparql

Decide on SHACL shape about serializations #177