opengeospatial / ogc-geosparql

Public Repository for the OGC GeoSPARQL Standards Working Group

What is hasSpatialResolution range #98

Open nicholascar opened 3 years ago

nicholascar commented 3 years ago

We have a new property in GeoSPARQL 1.1: hasSpatialResolution inspired by GeoX.

We initially decided to cut GeoSPARQL off at hasSpatialResolution and not specify a range for it as GeoX does:

geox:hasSpatialResolution rdfs:range data:QuantitativeMeasure

where data:QuantitativeMeasure is imported into GeoX from a generic data ontology.

Is this defensible? Can we expect sensible use of this property without a range value, given that GeoX's expected use isn't for a simple literal or an IRI but for a compound/complex object with properties such as resolution, unit and value?

Proposals (to be added to/argued against in comments):

dr-shorthair commented 3 years ago

Actually I'm coming round to the idea of using structured literals for scaled quantities, along the lines proposed in the LINDT project by Lefrançois and Zimmermann, 2018. In place of the data:QuantitativeMeasure class this would have the cdt:ucum datatype, which just specifies a structured literal composed of a number, followed by one or more spaces, then a UCUM code denoting the unit of measure. So we could see

ex:G987 a geo:Geometry ; 
    geo:hasSpatialResolution "30 m"^^cdt:ucum ;
. 

or even

ex:G654 a geo:Geometry ; 
    geo:hasSpatialResolution "100 [rlk_us]"^^cdt:ucum ;
. 

(that is, 100 links for Ramden's chain, which is a real survey length unit used in the US).

cdt: corresponds to http://w3id.org/lindt/custom_datatypes#. This is a highly scalable approach, which allows for quantities to be expressed using any valid UCUM code, and is pretty easy to parse to get the numeric part out. e.g.

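PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>

SELECT *
WHERE {
    ?g a geo:Geometry ; geo:hasSpatialResolution ?r .
    # keep only literals typed as cdt:ucum
    FILTER( DATATYPE(?r) = cdt:ucum )
    # drop everything from the first whitespace onward, leaving the numeric part as a string
    BIND ( REPLACE ( STR(?r) , "\\s.+", "" ) AS ?n )
}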
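The same split can be done client-side once the literal has been fetched. The following is a minimal Python sketch, assuming the lexical form described above (a number, one or more spaces, then a UCUM code); the function name `parse_ucum_literal` is hypothetical and is not part of any cdt:ucum library.

```python
import re
from decimal import Decimal

# Assumed lexical form: optional sign, a decimal number (optionally with an
# exponent), one or more spaces, then the UCUM code.
_UCUM_LITERAL = re.compile(
    r"^\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)\s+(\S.*?)\s*$"
)


def parse_ucum_literal(lexical):
    """Split a cdt:ucum-style lexical form into (number, unit code)."""
    match = _UCUM_LITERAL.match(lexical)
    if match is None:
        raise ValueError(f"not a 'number unit' literal: {lexical!r}")
    return Decimal(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(parse_ucum_literal("30 m"))
    print(parse_ucum_literal("100 [rlk_us]"))
```

The sketch only separates the two parts; it does not check that the unit is a valid UCUM code.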

nicholascar commented 3 years ago

@dr-shorthair what is the latest in SPARQL 1.2 regarding UoM? Is this suggested approach recommended there?

dr-shorthair commented 3 years ago

Discussion is currently stalled. https://github.com/w3c/sparql-12/issues/129

FransKnibbe commented 3 years ago

It would be great if it is possible to use a single number that can be used for spatial resolution. That would make filtering and combining spatial data from different sources a lot easier. I recall having come across a means of expressing spatial resolution as a number that was applicable to both vector data and raster data. It was a long time ago and now I can't find it. Does someone happen to know more? I also recall that the people working on GeoDCAT-AP had trouble coming up with good semantics for spatial resolution in metadata... As for the unit of any numerical expression of spatial resolution: wouldn't it stand to reason to use the same unit as the coordinates, i.e. the unit defined by the CRS?

nicholascar commented 3 years ago

We are going to need a solution for this issue that is consistent with the pattern of use suggested for SpatialMeasure which is the range object for the new hasLength, hasArea and hasVolume properties.

So, using the microformat literal and extending the example above, we would need something like:

ex:G654 
    a geo:Geometry ; 
    geo:hasSpatialResolution "5 m"^^cdt:ucum ;
    geo:hasArea "420 m2"^^cdt:ucum ;
    geo:inCRS <http://www.opengis.net/def/crs/EPSG/0/4283> ;   # GDA94
.

mathib commented 3 years ago

As for the unit of any numerical expression of spatial resolution: wouldn't it stand to reason to use the same unit as the coordinates, i.e. the unit defined by the CRS?

I would be against this. In the proposal of Simon using CDT/UCUM, the spatial resolution and other properties can be treated separately from the CRS. As a result, you have maximum portability of properties (you don't need to know the CRS) and you're not limited to the units of the CRS. In addition, when CDT/UCUM is supported by the query engine, you can easily compare values of different units (see this nice playground with a Jena backend implementation).
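For readers without a CDT-aware engine, the cross-unit comparison can be approximated in application code. This is a minimal Python sketch, not the Jena implementation mentioned above: the `TO_METRES` table and the helper names are illustrative assumptions and cover only a handful of UCUM length codes.

```python
from fractions import Fraction

# Illustrative conversion table (an assumption, far from the full UCUM set).
# Factors are metres per unit, kept as exact fractions.
TO_METRES = {
    "m": Fraction(1),
    "km": Fraction(1000),
    "cm": Fraction(1, 100),
    "[ft_i]": Fraction(3048, 10000),   # international foot
    "[rlk_us]": Fraction(1200, 3937),  # link for Ramden's chain = 1 US survey foot
}


def to_metres(lexical):
    """Convert a 'number unit' lexical form to an exact length in metres."""
    number, unit = lexical.split(None, 1)
    unit = unit.strip()
    if unit not in TO_METRES:
        raise ValueError(f"unit not in the illustrative table: {unit!r}")
    return Fraction(number) * TO_METRES[unit]


def finer_than(a, b):
    """True if resolution a is a smaller distance than resolution b."""
    return to_metres(a) < to_metres(b)


if __name__ == "__main__":
    print(float(to_metres("100 [rlk_us]")))
    print(finer_than("30 m", "100 [rlk_us]"))
```

Because each value carries its own unit, nothing here needs to consult the CRS.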

dr-shorthair commented 3 years ago

The matter of the coupling of units and coordinate reference system definitions has a long and not always happy history. While there is no question that geodesy officialdom believes the issue is resolved, and the only place for units of measurement is inside the CRS definitions, they have failed to communicate this pervasively to the rest of the world, particularly the general user and web developer. Yes they are correct - according to the definitions they coined. But shouting this at everyone who asks does not actually win the argument.

Which is a long way round to saying I agree that it is more useful to indicate units nearby the number, than to expect people to hunt down a canonical definition, which might be in an Access database file from EPSG (and who knows how to find and search through one of those!??)

FransKnibbe commented 3 years ago

As for the unit of any numerical expression of spatial resolution: wouldn't it stand to reason to use the same unit as the coordinates, i.e. the unit defined by the CRS?

I would be against this. In the proposal of Simon using CDT/UCUM, the spatial resolution and other properties can be treated separately from the CRS. As a result, you have maximum portability of properties (you don't need to know the CRS) and you're not limited to the units of the CRS. In addition, when CDT/UCUM is supported by the query engine, you can easily compare values of different units (see this nice playground with a Jena backend implementation).

I agree that the cdt:ucum datatype looks great. It is simple, and it is demonstrated to work in SPARQL queries. But leaving out the unit would be even simpler and is automatically supported by SPARQL engines and other means of data processing (locally on a web page, for example). And would it really hurt to define Geometry in such a way that all spatial properties use the same reference system? The more I think about it, the more sensible it seems. It would make working with spatial data simpler and less error-prone. Yes, portability of properties is a valid consideration. A value with a unit that is dereferenceable is more self-contained. But would such breaking up of data happen in practice? It would be little effort to have the SRS URI on board too, as an essential bit of metadata to most Geometry properties.

FransKnibbe commented 3 years ago

Which is a long way round to saying I agree that it is more useful to indicate units nearby the number, than to expect people to hunt down a canonical definition, which might be in an Access database file from EPSG (and who knows how to find and search through one of those!??)

I agree that any method involving the EPSG Access database should be avoided! But I am trying to look at the future where only directly dereferenceable CRS/SRS URIs are used. Those URIs could give direct access (following the nose, no SPARQL required) to the specification of the unit, for which systems like QUDT and/or UCUM could be used. Here we have arrived at issue 12.

FransKnibbe commented 3 years ago

On the topic of spatial resolution it would be good to read the recommendations in the SDWBP. It includes a reference to the Data Quality Vocabulary, which has a section on precision and accuracy.

Another inspiration is dcat:spatialResolutionInMeters in GeoDCAT-AP. Fixing spatial resolution to one distance unit has impressive benefits, in my opinion. Next to that, it is good to be able to align instance data and dataset metadata (see also issue 104).

andrea-perego commented 3 years ago

@FransKnibbe said:

On the topic of spatial resolution it would be good to read the recommendations in the SDWBP. It includes a reference to the Data Quality Vocabulary, which has a section on accuracy.

Another inspiration is dcat:spatialResolutionInMeters in GeoDCAT-AP. Fixing spatial resolution to one distance unit has impressive benefits, in my opinion. Next to that, it is good to be able to align instance data and dataset metadata (see also issue 104).

Please note that an updated version of the SDWBP, aligned with the latest versions of DCAT and GeoDCAT-AP, is now available (see PR https://github.com/w3c/sdw/pull/1247 for the details):

https://w3c.github.io/sdw/bp/

In particular:

FransKnibbe commented 3 years ago

(this comment was replaced because the first version did not make sense) @andrea-perego: it seems that GeoDCAT-AP does not include a property to communicate the spatial accuracy of a dataset. Has it been considered?

As for spatial resolution, what exactly does it mean in GeoDCAT-AP? Is there one definition that can be applied to both geometric and cell-based spatial data?

andrea-perego commented 3 years ago

@FransKnibbe said:

@andrea-perego: it seems that GeoDCAT-AP does not include a property to communicate the spatial accuracy of a dataset. Has it been considered?

As for spatial resolution, what exactly does it mean in GeoDCAT-AP? Is there one definition that can be applied to both geometric and cell-based spatial data?

The scope of GeoDCAT-AP corresponds to the union of the metadata elements of the core profile of ISO 19115:2003 and INSPIRE, which do not include accuracy (more details at https://semiceu.github.io/GeoDCAT-AP/releases/2.0.0/#overview-of-metadata-elements-covered-by-geodcat-ap).

About (the different types and sub-types of) accuracy in ISO 19115, I also don't have the standard at hand, but the OGC Testbed-12 document may provide a useful summary:

http://docs.opengeospatial.org/per/16-050.html#_quality_classes_and_structure_of_the_quality_measures

Concerning spatial resolution in GeoDCAT-AP, it is exactly the same notion as in ISO 19115 and INSPIRE.

Quoting from OGC Testbed-12 (http://docs.opengeospatial.org/per/16-050.html#_others_metadata_elements_indirectly_related_with_quality):

  • spatial resolution: this information is commonly confused with the spatial accuracy. The spatial resolution is related to the pixel size chosen to encode the data in a raster/coverage format while the spatial accuracy refers to the deviance in the geographic position of the pixel from its real ground position. Many times both are related but are not the same. The encoding of this one escapes data quality and ISO 19115 explains how to do it.

About how this notion is used in INSPIRE:

https://inspire.ec.europa.eu/glossary/MetadataElement-SpatialResolution

Spatial resolution refers to the level of detail of the data set. It shall be expressed as a set of zero to many resolution distances (typically for gridded data and imagery-derived products) or equivalent scales (typically for maps or map-derived products). An equivalent scale is generally expressed as an integer value expressing the scale denominator. A resolution distance shall be expressed as a numerical value associated with a unit of length.

FransKnibbe commented 3 years ago

Thank you for the valuable information, @andrea-perego. The CE90 described in the OGC Testbed-12 document could be a useful indicator for accuracy. It does seem applicable to vector data too. A good thing to keep in mind for issue 23.

As for spatial resolution, the quotes from INSPIRE and testbed 12 focus on raster data only (which in the case of testbed 12 stands to reason, because it is about imagery data). For GeoSPARQL I think we need a definition that is applicable to both raster and vector data (and perhaps for GeoDCAT-AP too?). It seems to me that the tolerance distance as used in geometric generalisation in GIS is very similar to pixel size in raster data. But can it be regarded as the same?

oldskeptic commented 9 months ago

Just nudging this along with a note about end application use (#257 and #430 in mind).

A feature with multiple geometries:

ex:G987 a geo:Geometry ;
    geo:hasSpatialResolution "30"^^<https://qudt.org/vocab/unit/M> ;
.
ex:G988 a geo:Geometry ;
    geo:hasSpatialResolution "30000"^^<https://qudt.org/vocab/unit/M> ;
.

would allow an application to select the geometry that is the most appropriate according to the viewport and its native screen resolution. The use of a units ontology allows for the mechanization of that selection, whether in pixels, chains, meters or whatever.
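One way such a selection could work is sketched below in Python. The policy is an assumption made for illustration (pick the coarsest geometry whose resolution is still no larger than the ground distance covered by one screen pixel, falling back to the finest one available), and `pick_geometry` and its parameters are hypothetical names; resolutions are assumed to have been converted to metres already.

```python
def metres_per_pixel(viewport_width_m, viewport_width_px):
    """Ground distance covered by one screen pixel."""
    return viewport_width_m / viewport_width_px


def pick_geometry(resolutions_m, viewport_width_m, viewport_width_px):
    """Choose a geometry IRI from a {iri: resolution in metres} mapping.

    Assumed policy: prefer the coarsest geometry whose resolution does not
    exceed one pixel; if none qualifies, fall back to the finest geometry.
    """
    target = metres_per_pixel(viewport_width_m, viewport_width_px)
    usable = {iri: res for iri, res in resolutions_m.items() if res <= target}
    if usable:
        return max(usable, key=usable.get)
    return min(resolutions_m, key=resolutions_m.get)


if __name__ == "__main__":
    geometries = {"ex:G987": 30, "ex:G988": 30000}
    # A viewport 1000 km wide drawn on 1000 pixels: 1000 m per pixel.
    print(pick_geometry(geometries, 1_000_000, 1000))
    # A viewport 60000 km wide drawn on 1000 pixels: 60000 m per pixel.
    print(pick_geometry(geometries, 60_000_000, 1000))
```

Other policies (for example, choosing the resolution closest to the pixel size) would fit the same data equally well.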

dr-shorthair commented 9 months ago

@oldskeptic I like your thinking. Just use the QUDT unit URI as the datatype. Implies all instances of qudt:unit are subclasses of number, but I can accept that.

The downside is OWL's comprehension of custom datatypes is pretty shaky and implementations are weak to non-existent. The Protege folk are likely to blow a fuse.

oldskeptic commented 9 months ago

In the specific case of comment 1872448826, I was going to implement in "query space" for the moment rather than depending on the Abox; I don't think we are there yet in OWL2 world.

I agree that Protege folks would think this is problematic, though the actual issue is inside OWL API and/or whatever reasoner is in use, but I have seen something similar as accepted practice in ISWC papers since 2006 or so.

GeoSPARQL caters to several "tribes". I don't like breaking things for other people, but my thinking is that using the QUDT unit URI as the datatype has too many benefits for data consumers to be ignored.