Closed: situx closed this issue 2 years ago
Pointer to the earlier discussion between Frans and me.
If shape 1 is rejected in its current form, it will be required to make the following shapes instead:
- max one outgoing geo:asWKT relation for each geo:Geometry
- max one outgoing geo:asGML relation for each geo:Geometry
- max one outgoing geo:asGeoJSON relation for each geo:Geometry
- max one outgoing geo:asKML relation for each geo:Geometry
- unsure about geo:asDGGS: can different DGGS be in the same SRS and have the same coordinates?
some benefits of shape 1:
The serializations need to be equivalent, but not equal. For example, they could be in different CRS according to literal restrictions (GeoJSON only allows WGS84 for example)
This might not be possible anymore when an SRS/CRS is defined as a node connected to a geo:Geometry instance node. This feature seems to me difficult to ignore when we open GeoSPARQL for a wider variety of geometry serializations (CAD, point clouds, etc). RDF reification is an alternative then, but rather cumbersome, while RDF* is not (yet?) a standard of its own despite increasing support in tools.
The serializations need to be equivalent, but not equal. For example, they could be in different CRS according to literal restrictions (GeoJSON only allows WGS84 for example)
This also assumes that a perfect lossless transformation of geometry between both CRS/SRS is possible.
We can also keep shape 1 as-is, but lower the "severity" from sh:Violation (default) to sh:Warning (non-critical constraint violation). The shape constraint message can explain that it's allowed to use multiple geometry serializations on one geo:Geometry instance, but it's the publisher/user's responsibility to make sure that the content of the geometry serializations is exactly the same (same coordinates when both are in the same CRS/SRS, or different CRS/SRS supporting a lossless transformation + same metadata applies to all geometry content in different serializations). The examples accompanying the shape can explain the correct and wrong situations.
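A minimal sketch of what such a relaxed shape might look like, assuming shape 1 targets every geo:Geometry and counts values across the four serialization properties with a SHACL alternative path. The shape IRI ex:OneSerializationShape, the ex: namespace, and the message wording are illustrative assumptions, not the shape from the GeoSPARQL repository:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:  <http://example.org/shapes#> .   # hypothetical namespace

ex:OneSerializationShape                      # hypothetical shape IRI
    a sh:NodeShape ;
    sh:targetClass geo:Geometry ;
    sh:property [
        # Count the values reached through any of the serialization properties.
        sh:path [ sh:alternativePath ( geo:asWKT geo:asGML geo:asGeoJSON geo:asKML ) ] ;
        sh:maxCount 1 ;
        # Report a warning instead of the default sh:Violation.
        sh:severity sh:Warning ;
        sh:message "Multiple serializations found for this geometry; the publisher is responsible for keeping them equivalent."@en ;
    ] .
```

The severity is carried into each validation result as sh:resultSeverity, so a consumer can filter warnings separately from violations.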
I agree we should give this responsibility to the user and set the severity to Warning. However, I do not think that demanding equivalence of the serializations necessarily assumes that a lossless transformation is possible. In my opinion, the user should be able to define what is an equivalent representation and we can only give guidance with SHACL as to the possible correctness of the user's decision.
I do not think that demanding equivalence of the serializations necessarily assumes that a lossless transformation is possible.
+1, though I might have said 'asking for' rather than 'demanding'
Many serializations per Geometry:
- The serializations need to be equivalent, but not equal. For example, they could be in different CRS according to literal restrictions (GeoJSON only allows WGS84 for example)
The definition of Geometry mandates each serialisation to have the same CRS. So if one serialisation uses GeoJSON, all other serialisations should use CRS84 too (not WGS84, because that uses a different axis order).
- The serializations could have a differing accuracy
Accuracy is a property of a Geometry, so all serialisations inherit that accuracy.
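To illustrate the same-CRS reading, here is a small sketch of one Geometry carrying both a WKT and a GeoJSON serialisation in CRS84. The ex: namespace, the node name, and the coordinate values are made up for the example:

```turtle
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:  <http://example.org/data#> .   # hypothetical namespace

ex:geomA
    a geo:Geometry ;
    # No CRS IRI in the WKT literal, so the GeoSPARQL default (CRS84, longitude first) applies.
    geo:asWKT "POINT(4.89 52.37)"^^geo:wktLiteral ;
    # GeoJSON positions are also given as longitude, latitude.
    geo:asGeoJSON '''{"type": "Point", "coordinates": [4.89, 52.37]}'''^^geo:geoJSONLiteral .
```

Both literals describe the same position in the same CRS, which is the situation the definition of Geometry appears to require.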
If shape 1 is rejected in its current form, it will be required to make the following shapes instead:
- max one outgoing geo:asWKT relation for each geo:Geometry
- max one outgoing geo:asGML relation for each geo:Geometry
- max one outgoing geo:asGeoJSON relation for each geo:Geometry
- max one outgoing geo:asKML relation for each geo:Geometry
Those shapes could be made, but saying they are required is going a bit far. But next to good documentation and examples, such shapes could help users, yes.
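As a rough sketch, the four per-format constraints listed above could be grouped in a single node shape. The shape IRI and the ex: namespace are hypothetical, and targeting by class is an assumption:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:  <http://example.org/shapes#> .   # hypothetical namespace

ex:OnePerFormatShape                          # hypothetical shape IRI
    a sh:NodeShape ;
    sh:targetClass geo:Geometry ;
    # At most one literal per serialization format.
    sh:property [ sh:path geo:asWKT ;     sh:maxCount 1 ] ;
    sh:property [ sh:path geo:asGML ;     sh:maxCount 1 ] ;
    sh:property [ sh:path geo:asGeoJSON ; sh:maxCount 1 ] ;
    sh:property [ sh:path geo:asKML ;     sh:maxCount 1 ] .
```

Unlike a single count over all formats, this still accepts a Geometry that has one WKT literal and one GeoJSON literal.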
- unsure about geo:asDGGS: can different DGGS be in the same SRS and have the same coordinates?
When PR 136 is through, a DGGS Object will no longer be a Geometry, making the question inapplicable.
some benefits of shape 1:
- The constraint would result in more reliable and simpler query patterns and handling of the query results if you can expect that there's only one serialization
I am not convinced querying would be more reliable or simpler (on the contrary, I think), but data publishers are free to use copies of the same Geometry instance with different IRIs and probably an owl:sameAs relationship if they really want to. But in my mind, the most straightforward way of publishing Geometry data is to provide all the serialisation formats a data publisher wants to support per Geometry instance. It is then up to the consumer to request the preferred format.
- a single GeoSPARQL engine might treat different geometry serializations (with the same coordinates and CRS) differently by using distinct geometry loaders and processors internally, resulting in potentially conflicting GeoSPARQL spatial relations delivered by the engine for the multiple serializations
It is up to the makers of GeoSPARQL engines to decide how to load geometries into whatever internal format they want to use. But it would make sense to only extract the CRS and coordinates, and to throw an error if there are inconsistencies between multiple serialisations of the same Geometry. Anyhow, what difference would a SHACL shape that is not part of the core specification make?
I agree we should give this responsibility to the user and set the severity to Warning. However, I do not think that demanding equivalence of the serializations necessarily assumes that a lossless transformation is possible. In my opinion, the user should be able to define what is an equivalent representation and we can only give guidance with SHACL as to the possible correctness of the user's decision.
If 'equivalence' means different CRSs, I believe that is out of the question, according to the definition of geo:Geometry.
The definition of Geometry mandates each serialisation to have the same CRS.
Really? That looks like something that should be changed then.
The definition of Geometry mandates each serialisation to have the same CRS.
Really? That looks like something that should be changed then.
Why? The definition in GeoSPARQL is based on GM_Object as defined in ISO-19107. I can't access that document (paywall), but many replications tell that "GM_Object instances are sets of direct positions in a particular coordinate reference system". And I think that makes a lot of sense.
OK - I probably misunderstood - if multiple CRS are desired, then they can be provided in separate geo:Geometry nodes in the context of multiple geo:hasGeometry properties. As long as there is no restriction on the count of hasGeometry then we are fine.
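A small sketch of that pattern, with one Feature pointing to two Geometry nodes, each in its own CRS. All ex: names are hypothetical and the coordinate values are illustrative only (they are not a real transformation between the two CRSs):

```turtle
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:  <http://example.org/data#> .   # hypothetical namespace

ex:featureA
    a geo:Feature ;
    geo:hasGeometry ex:geomCRS84 , ex:geomETRS89 .

ex:geomCRS84
    a geo:Geometry ;
    # GeoJSON: longitude, latitude.
    geo:asGeoJSON '''{"type": "Point", "coordinates": [4.89, 52.37]}'''^^geo:geoJSONLiteral .

ex:geomETRS89
    a geo:Geometry ;
    # EPSG:4258 (ETRS89) has latitude-first axis order; values are illustrative only.
    geo:asWKT "<http://www.opengis.net/def/crs/EPSG/0/4258> POINT(52.37 4.89)"^^geo:wktLiteral .
```

Each Geometry node then has a single CRS, so a rule about equivalent serialisations only has to hold within one node.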
I have access to an old version of ISO 19107:2003. There is a class GM_PointArray which more-or-less corresponds with the actual serialized coordinates.
@FransKnibbe
If shape 1 is rejected in its current form, it will be required to make the following shapes instead:
- max one outgoing geo:asWKT relation for each geo:Geometry
- max one outgoing geo:asGML relation for each geo:Geometry
- max one outgoing geo:asGeoJSON relation for each geo:Geometry
- max one outgoing geo:asKML relation for each geo:Geometry
Those shapes could be made, but saying they are required is going a bit far.
If we're asking users to make sure that their geometry serializations are equivalent (same coordinates, same CRS/SRS), what would be the use case that would need to overrule the above described requirement? I can currently only think of a situation when 2 different serializations using the same serialization format have the same coordinates and CRS/SRS but serialized in a different order without changing the represented geometry content (a spatial query would return the exact same result for both). I can't find a direct good use case why people would want that, but maybe you do?
a single GeoSPARQL engine might treat different geometry serializations (with the same coordinates and CRS) differently by using distinct geometry loaders and processors internally, resulting in potentially conflicting GeoSPARQL spatial relations delivered by the engine for the multiple serializations
It is up to the makers of GeoSPARQL engines to decide how to load geometries into whatever internal format they want to use. But it would make sense to only extract the CRS and coordinates, and to throw an error if there are inconsistencies between multiple serialisations of the same Geometry.
Completely agree
Anyhow, what difference would a SHACL shape that is not part of the core specification make?
Such a shape (with sh:severity equal to sh:Warning instead of the default sh:Violation) can help data consumers, who might be different from the data publishers, to be cautious. Its sh:message can say something like "It seems like multiple serializations exist for this geometry node. A data publisher should ensure that the CRS/SRS and the actual coordinates are equivalent for each serialization of the same geometry, but this might not be the case. A single GeoSPARQL engine might treat different geometry serializations (with the same coordinates and CRS) differently by using distinct geometry loaders and processors internally, resulting in potentially conflicting GeoSPARQL spatial relations delivered by the engine for the multiple serializations. Before using the data, you might want to evaluate if the published data and the operations of the GeoSPARQL engine are correct."
If a data consumer knows it can trust the data publisher and its GeoSPARQL engine with the above, he/she can simply deactivate the shape by asserting sh:deactivated true on the shape.
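Deactivation could be a single triple added to the consumer's local copy of the shapes graph. The shape IRI below is a hypothetical placeholder for whatever IRI the published shape ends up with:

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/shapes#> .   # hypothetical namespace

# Switch the shape off without deleting it from the shapes graph.
ex:OneSerializationShape sh:deactivated true .
```

A deactivated shape produces no validation results, so the warning disappears from the report while the shape definition stays available for others.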
Many serializations per Geometry:
- The serializations need to be equivalent, but not equal. For example, they could be in different CRS according to literal restrictions (GeoJSON only allows WGS84 for example)
The definition of Geometry mandates each serialisation to have the same CRS. So if one serialisation uses GeoJSON, all other serialisations should use CRS84 too (not WGS84, because that uses a different axis order).
- The serializations could have a differing accuracy
Accuracy is a property of a Geometry, so all serialisations inherit that accuracy.
Yes, accuracy is a property of Geometry and all serializations should inherit that accuracy unless a serialization that has a restriction on accuracy is defined. (Maybe only allows accuracy to a certain degree) How do we treat this case if it arises? Or is that unrealistic? The other cases can certainly be checked with some SHACL constraint I would think.
I agree we should give this responsibility to the user and set the severity to Warning. However, I do not think that demanding equivalence of the serializations necessarily assumes that a lossless transformation is possible. In my opinion, the user should be able to define what is an equivalent representation and we can only give guidance with SHACL as to the possible correctness of the user's decision.
If 'equivalence' means different CRSs, I believe that is out of the question, according to the definition of geo:Geometry.
In that case, we cannot allow defining a GeoJSON literal next to any e.g. WKT literals in a coordinate system different from CRS84. If that is how we want things to be we need to explicitly state this. One might even think about using a different property to indicate such a case? (It is probably quite common to have a representation in world coordinates, right?) So we might define something like geo:hasCRS84Geometry ?geom to make this case very clear? Or is that complicating things?
I have access to an old version of ISO 19107:2003. There is a class GM_PointArray which more-or-less corresponds with the actual serialized coordinates.
That information could be useful for issue 42.
If we're asking users to make sure that their geometry serializations are equivalent (same coordinates, same CRS/SRS), what would be the use case that would need to overrule the above described requirement? I can currently only think of a situation when 2 different serializations using the same serialization format have the same coordinates and CRS/SRS but serialized in a different order without changing the represented geometry content (a spatial query would return the exact same result for both). I can't find a direct good use case why people would want that, but maybe you do?
Hi @mathib, I think the proposed shapes are fine. I just wanted to say that GeoSPARQL will be good to use without them, so in that sense they are not required. I did not mean there could be exceptions.
Yes, accuracy is a property of Geometry and all serializations should inherit that accuracy unless a serialization that has a restriction on accuracy is defined. (Maybe only allows accuracy to a certain degree) How do we treat this case if it arises? Or is that unrealistic? The other cases can certainly be checked with some SHACL constraint I would think.
In general I think serialisation should follow the model and not the other way around. If a serialisation does not allow all model elements to be expressed, then it is an unfit serialisation.
In that case, we cannot allow defining a GeoJSON literal next to any e.g. WKT literals in a coordinate system different from CRS84. If that is how we want things to be we need to explicitly state this.
It is already stated in the definition of geo:Geometry, isn't it?
One might even think about using a different property to indicate such a case?
It was suggested here.
(It is probably quite common to have a representation in world coordinates, right?)
Yes, but outside of North America CRS84 is rather useless for serious applications. In Europe we are meant to use ETRS89 for geodetic coordinates (degrees longitude and latitude).
I have my doubts about mandating at least one serialisation of a Geometry. That would mean this kind of data is wrong:
```turtle
:myUfo
    a geo:Feature ;
    geo:hasGeometry :myGeom1 ;
    rdfs:label "Something spotted in the sky over Africa" .

:myGeom1
    a geo:Geometry ;
    geo:spatialDimension "3"^^xsd:integer ;
    geo:within :myAfricaGeom .
```
In other words: it is possible to state facts about a Geometry when its exact coordinates are (temporarily) unknown.
A more general question: Why don't the Geometry shapes just target all instances of geo:Geometry? It seems easier to define that way, and does not exclude Geometry instances that are not related to a Feature.
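The two targeting options can be contrasted in a short sketch. Both shape IRIs and the ex: namespace are hypothetical, and the second form is only an assumption about how a Feature-based target might be written, not a copy of the current shapes:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:  <http://example.org/shapes#> .   # hypothetical namespace

# Targets every node typed as geo:Geometry, whether or not a Feature points to it.
ex:GeometryByClassShape
    a sh:NodeShape ;
    sh:targetClass geo:Geometry .

# Targets only nodes that are the object of some geo:hasGeometry triple.
ex:GeometryByPropertyShape
    a sh:NodeShape ;
    sh:targetObjectsOf geo:hasGeometry .
```

The class-based target relies on geometries being explicitly typed in the data, while the property-based target misses geometries that no Feature refers to.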
I have my doubts about mandating at least one serialisation of a Geometry.
I do too. I am often dealing with Features where we do not know their spatial position, only their position relative to other Features. Mandating at least one serialisation of a Geometry would preclude this, no?
I do too. I am often dealing with Features where we do not know their spatial position, only their position relative to other Features. Mandating at least one serialisation of a Geometry would preclude this, no?
In that case, a solution could be not to state anything about geometry. Features, being SpatialObjects, can have topological relationships with other Features.
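A brief sketch of that alternative, where the relative position is stated between Features and no Geometry node is involved. The ex: names are hypothetical and geo:sfWithin is used as one possible topological relation:

```turtle
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/data#> .   # hypothetical namespace

ex:myUfo
    a geo:Feature ;
    rdfs:label "Something spotted in the sky over Africa" ;
    # Topological relation stated directly between two Features.
    geo:sfWithin ex:africa .

ex:africa a geo:Feature .
```

Because no geo:Geometry appears in this data, a shape that requires at least one serialisation per Geometry has nothing to report here.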
In that case, a solution could be not to state anything about geometry. Features, being SpatialObjects, can have topological relationships with other Features.
Of course, thanks! (Retreats back into box...)
Things to do to complete this ticket
This issue is to discuss SHACL validation shape number 1.
I encourage everyone to add pro/con arguments to this post so that we can make an informed decision. Please edit this post whenever needed:
1 serialization per Geometry instance:
Many serializations per Geometry:
Please add more arguments for either side that I missed