w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Spatial coverage [RSC] #83

Closed jpullmann closed 5 years ago

jpullmann commented 6 years ago

Spatial coverage [RSC]

Provide means to specify spatial coverage with geometries.


Related use cases: Modeling spatial coverage [ID29]

makxdekkers commented 6 years ago

This is already possible using the Core Location Vocabulary that includes a class Geometry.

makxdekkers commented 6 years ago

Related to #82

stijngoedertier commented 6 years ago

I also like the use of the locn:geometry property for encoding geometries as literals. See also page 57 in GeoDCAT-AP.

nicholascar commented 6 years ago

I don't follow, how can locn:geometry be used? Range for dct:spatial is dct:Location and I see no locn/dct links (subclassing).

Assuming I've missed something and this can work,

locn:geometry seems to allow WKT/GML representations, as per GeoSPARQL. From @stijngoedertier's example:

locn:geometry "POLYGON((-10.58 70.09,34.59 70.09,34.59 34.56,-10.58 34.56,  -10.58 70.09))"^^gsp:wktLiteral ;

However we have more sophisticated feature/geometry handling elsewhere, like in GeoSPARQL itself which has Geometry & Feature classes. Using GeoSPARQL, a site's location in SOSA:

<http://pid.geoscience.gov.au/site/17943> a <http://vocabulary.odm2.org/samplingfeaturetype/borehole>,
    geosp:hasGeometry [ a geosp:Geometry ;
            geosp:asWKT "SRID=GDA94;POINT(137.8563726 -33.7108293)"^^geosp:wktLiteral ] .

I can't see any normative links from locn to GeoSPARQL though, only suggestions for use. Perhaps Andrea, a locn editor, can indicate the intention of locn? Otherwise, I'd be more inclined to allow the use of GeoSPARQL directly.

stijngoedertier commented 6 years ago

locn:geometry would be a property of a dcterms:Location, as in the examples: GeoDCAT-AP page 57 or the SDW-BP example 15. Something like:

[] a dcat:Dataset;
   dcterms:spatial [
      a dcterms:Location;
      locn:geometry "POINT(4.7173 50.8643)"^^geosparql:wktLiteral ] .

As @nicholascar suggests, this can indeed also be done directly using geosparql:hasGeometry. GeoSPARQL provides a normative way of representing geometries expressed as WKT or GML, although it leaves other serializations such as GeoJSON to future work.

[] a dcat:Dataset;
   dcterms:spatial [
      a dcterms:Location;
      geosparql:hasGeometry [ 
            a geosparql:Geometry ;
            geosparql:asWKT "POINT(137.8563726 -33.7108293)"^^geosparql:wktLiteral ]
] .

rob-metalinkage commented 6 years ago

is "coverage" the same as "the geometry"?

The definition of locn:geometry is "Associates any resource with the corresponding geometry."

DCAT may have a concept of "the" - i.e. there is a requirement that the geometry is a specific type with specific characteristics...

There is a difference between a geometry defining an indexable extent (the normal use of coverage) and, for example, a 3D model of a city - and certainly most geographical features having area extents. To put this in perspective, some time ago I was looking at modelling the coastline - and there are something like 26 different definitions in Australian law - so there are in fact multiple possible geometries depending on the application domain. Most significantly, if you take a large entity, such as the state of Western Australia (or even one of the indigenous land claims in the state), you get a geometry of many MB from the authoritative dataset, but most applications would need a generalised form.

So, if you re-use a term with broader semantics for a specific function, at some stage this needs to be declared. The qualified link to the geometry object gives you this AFAICT. You may need to specify that the dcterms:Location has the same meaning as you intend for "coverage" - or perhaps create a subclass of it with this explicit semantics. If the OWL model entails this when the subject of a dct:spatial relationship is a dcat:Dataset, then backwards compatibility is maintained with the use of dcterms:Location.

nicholascar commented 6 years ago

From the locn documentation (https://www.w3.org/ns/locn) for the Property geometry:

:Resource locn:geometry [ 
  a sf:Point; 
  gsp:asWKT "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> Point(-0.001475 51.477811)"^^gsp:wktLiteral 
] .

So, given the domain constraints of gsp:asWKT, the blank node is a gsp:Geometry as well as an sf:Point (and an sf:Geometry). The range value for locn:geometry is a locn:Geometry, so a locn:Geometry == gsp:Geometry. But there is no domain value for locn:geometry, so we can't say that in this example the :Resource is equivalent to a gsp:Feature, which is what the use of gsp:geometry would make it.
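The domain/range reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DOMAIN and RANGE tables are hand-written stand-ins for the vocabulary declarations discussed in this thread (they are not loaded from the real ontologies), and infer_types is a hypothetical helper that applies just the two RDFS domain and range entailment rules.

```python
# Hand-written schema facts, assumed for illustration only.
# locn:geometry deliberately has a range but no domain.
DOMAIN = {"gsp:asWKT": "gsp:Geometry", "gsp:hasGeometry": "gsp:Feature"}
RANGE = {"locn:geometry": "locn:Geometry", "gsp:hasGeometry": "gsp:Geometry"}


def infer_types(triples):
    """Return the (node, class) pairs entailed by rdfs:domain and rdfs:range."""
    types = set()
    for s, p, o in triples:
        if p in DOMAIN:   # domain rule: the subject gets the domain class
            types.add((s, DOMAIN[p]))
        if p in RANGE:    # range rule: the object gets the range class
            types.add((o, RANGE[p]))
    return types


triples = [
    (":Resource", "locn:geometry", "_:g"),
    ("_:g", "gsp:asWKT", "Point(-0.001475 51.477811)"),
]
inferred = infer_types(triples)
print(sorted(inferred))
```

In this sketch the blank node ends up typed as both locn:Geometry and gsp:Geometry, while nothing at all is inferred about :Resource, because locn:geometry declares no domain.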

andrea-perego commented 6 years ago

Sorry to jump in here so late.

I should probably provide a bit of context for explaining the reasons behind the design choices of locn:geometry and its use for specifying the spatial coverage of a dataset.

So, the reason why the range of locn:geometry was left open is twofold:

Coming now to the use of locn:geometry in GeoDCAT-AP (which is more relevant here), the issue was to find out how to represent the spatial coverage from ISO 19115 records, which, in the majority of the cases, is done with a bbox.

DCAT didn't provide any guidance on this. Moreover, the GeoDCAT-AP WG re-stated the requirement of being able to directly specify the bbox as a literal. So, the choice of locn:geometry was pretty straightforward, and all discussion focussed instead on which should be the recommended geometry encoding (the most voted options being GML, WKT and GeoJSON).

The use of locn:geometry along these lines has also been documented in the W3C/OGC Spatial Data on the Web BPs - see, e.g., Example 15.

Based on the implementation experiences I'm aware of, the use of locn:geometry in this way has not raised any specific problem. Rather, the issue (that we had also when developing GeoDCAT-AP) is that there is no established practice on how to specify bboxes in RDF (and even centroids). However, as far as I can understand, working on this is not in scope of the DXWG.

BTW, this was also one of the issues discussed at length in the LOCADD CG (there's a page summarising the discussion), and then inherited by the SDWWG, but no solution has been provided.

andrea-perego commented 6 years ago

@rob-metalinkage said:

is "coverage" the same as "the geometry"?

Well, as locn:geometry is used in GeoDCAT-AP, the answer could be no.

In GeoDCAT-AP (as in DCAT), the spatial coverage is specified by using dct:spatial, which points to a dct:Location (i.e., to a "spatial thing", in SDWWG terms). This spatial thing can be specified in different ways - e.g., as a geoname, as in the examples provided in the DCAT spec, or by specifying its extent with a geometry (the two options are of course not mutually exclusive).

andrea-perego commented 6 years ago

@nicholascar said:

So, given the domain constraints of gsp:asWKT, the blank node is a gsp:Geometry as well as an sf:Point (and an sf:Geometry). The range value for locn:geometry is a locn:Geometry, so a locn:Geometry == gsp:Geometry. But there is no domain value for locn:geometry, so we can't say that in this example the :Resource is equivalent to a gsp:Feature, which is what the use of gsp:geometry would make it.

@nicholascar, I don't know if I have (at least partially) answered your question in my comment above. However, more details on this specific issue are provided in the LOCADD wiki page I mentioned:

https://www.w3.org/community/locadd/wiki/Use_case:_Sub-properties_for_locn:geometry

dr-shorthair commented 5 years ago

For geospatial data, spatial coverage should also be complemented by spatial resolution - see discussion at #84 (comment)

dr-shorthair commented 5 years ago

See proposal for dcat:spatialResolution in branch https://github.com/w3c/dxwg/tree/dcat-issue84-sres-simon

dr-shorthair commented 5 years ago

Back to spatial coverage: unfortunately there are at least three respectable serializations of geometry in common use in different parts of the community:

AFAIK WKT is the only one that also supports association with a Coordinate Reference System

Then there are others, like GML, What3Words, ...

The Spatial Data on the Web Best Practice is agnostic ... Should we attempt to provide any guidance?

akuckartz commented 5 years ago

Compatibility between GeoJSON and JSON-LD 1.0 is problematic. I am not sure about JSON-LD 1.1 (see the discussion in https://github.com/w3c/json-ld-syntax/issues/7).

smrgeoinfo commented 5 years ago

Providing some guidance would be a boon to interoperability. Being able to specify different SRS for the geolocation is nice for generality, but for interoperability, settling for something simple like an EPSG:4326 bounding box is much better. Using Web UTM is popular, but won't work in polar regions.

nicholascar commented 5 years ago

A plug for WKT here: it's much easier for the dataset processing my client agencies do to record WKT strings as RDF literals and then to pull those down and calculate spatial things with them as needed, given that Postgres etc can instantly use them.

It would be interesting to know what spatial representations spatially-enabled triplestores like. Perhaps @pwin can comment since he uses such things.

dr-shorthair commented 5 years ago

Thanks for that link @akuckartz

IMO the key problem with the GeoJSON vs JSON-LD issue is that it is based on an incorrect assumption. At its core, RDF is for semantics, while JSON is for data transport. JSON-LD is OK to transport RDF. GeoJSON is OK to transport geometry. But the details of representation of geometry are not about the semantics, they are about mathematics*. And RDF is not tuned for mathematics. So it is no surprise that GeoJSON cannot be shoehorned into JSON-LD.

GeoSPARQL deals with the boundary between semantics and mathematics more honestly, by switching to a micro-format (WKT) when the boundary is crossed. The semantics of the information is managed on the RDF side of the boundary ("it is a geometry!") while the mathematical representation of the geometry is managed on the WKT side of the boundary ("it is a nested, ordered set of numbers").

* A point in space is a unitary concept, but our mathematical systems require several numbers to represent it. The numbers are not independent since they change together if the CRS changes.
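To make the "WKT side of the boundary" concrete, here is a minimal Python sketch of how a client might take apart a wktLiteral like the ones quoted in this thread. It assumes the GeoSPARQL convention of an optional leading `<IRI>` naming the CRS, with CRS84 as the default when it is absent; split_wkt_literal and point_coordinates are hypothetical helper names, and only 2D points are handled.

```python
import re

# Default CRS assumed when the literal carries no leading <IRI>.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


def split_wkt_literal(lexical):
    """Return (crs_iri, wkt_text) for a GeoSPARQL-style wktLiteral lexical form."""
    match = re.match(r"\s*<([^>]+)>\s*(.*)", lexical, re.DOTALL)
    if match:
        return match.group(1), match.group(2).strip()
    return CRS84, lexical.strip()


def point_coordinates(wkt_text):
    """Parse a 2D WKT POINT into two floats; other geometry types are rejected."""
    match = re.fullmatch(r"POINT\s*\(\s*(\S+)\s+(\S+)\s*\)", wkt_text, re.IGNORECASE)
    if match is None:
        raise ValueError("not a 2D WKT POINT: " + wkt_text)
    return float(match.group(1)), float(match.group(2))


literal = "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> Point(-0.001475 51.477811)"
crs, wkt = split_wkt_literal(literal)
print(crs, point_coordinates(wkt))
```

The CRS travels with the numbers inside the literal, which is why the pair has to be read together: the same point would carry different numbers under a different CRS.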

andrea-perego commented 5 years ago

I'm all for WKT, but I would like to re-state a couple of issues mentioned earlier in this thread (see https://github.com/w3c/dxwg/issues/83#issuecomment-371650468):

  1. The level of complexity of the proposed solution(s): As I mentioned earlier, the requirement in LOCN (and then in GeoDCAT-AP) was to reduce as much as possible the number of nodes in the graph between the dataset and the geometry specifying its spatial coverage. IMO, this is a common concern, and it is probably not surprising that some triple stores (such as Virtuoso) have a built-in property geo:geometry (which is not included in the W3C Basic Geo vocabulary), whose range is a WKT literal.

  2. The need to have specific properties to represent centroids and bboxes: This is typically how spatial coverage is specified when it is a geometry, but we don't have properties for doing the job. This can be addressed if using GML (which has the notion of "Envelope"), but not in WKT. But this means that this information will be at the level of literal only.

In my experience, these are the main issues people are stumbling upon.
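For readers less familiar with bboxes and centroids as summaries of a geometry, the following Python sketch derives both from a coordinate list, using the polygon from the GeoDCAT-AP example quoted earlier in this thread. It assumes coordinates are (longitude, latitude) pairs in a single CRS and that the geometry does not cross the antimeridian; bbox_wkt and bbox_centre are hypothetical helpers, and the centre computed here is the centre of the bounding box, not a true area-weighted centroid.

```python
def bbox_wkt(coords):
    """Return the axis-aligned bounding box of (x, y) pairs as a WKT POLYGON."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    west, east, south, north = min(xs), max(xs), min(ys), max(ys)
    # Closed ring: the first corner is repeated at the end.
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    return "POLYGON((" + ",".join(f"{x} {y}" for x, y in ring) + "))"


def bbox_centre(coords):
    """Return the centre of the bounding box as an (x, y) tuple."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2


coords = [(-10.58, 70.09), (34.59, 70.09), (34.59, 34.56), (-10.58, 34.56), (-10.58, 70.09)]
print(bbox_wkt(coords))
print(bbox_centre(coords))
```

Both results are plain geometries once serialized as WKT, which illustrates the point made above: nothing in the literal itself says "this is a bbox" or "this is a centroid", so that role has to be carried by the property used.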

andrea-perego commented 5 years ago

This issue was discussed by the DCAT subgroup on 6 Mar 2019, and I got an action to make a proposal.

I think it is probably better to split the proposal in two, and I would first go for the issue concerning the fact that we lack specific properties for specifying bboxes and centroids.

Proposal 1

So, my proposal nr. 1 is to define two corresponding properties in the DCAT namespace:

The range of these two properties may depend on the decision taken about proposal nr. 2 below, so for the moment I leave it undefined.

Proposal 2

The proposal nr. 2 is about how to specify the geometry itself. In this case, I would ask for a vote on 3 possible proposals:

Proposals 2.a and 2.b address the issue I mentioned earlier in this discussion of reducing the "distance" in the graph between the dataset and the geometry itself. So, they give a flatter specification of the spatial coverage compared with GeoSPARQL.

Putting this into examples (re-using @nicholascar 's ones):

Example 2.a:

a:Dataset dct:spatial a:Location .

a:Location a dct:Location ;
  locn:geometry "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> Point(-0.001475 51.477811)"^^gsp:wktLiteral .

Example 2.b:

a:Dataset dct:spatial a:Location .

a:Location a dct:Location ;
  dcat:geometry "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> Point(-0.001475 51.477811)"^^gsp:wktLiteral .

Example 2.c:

a:Dataset dct:spatial a:Location .

a:Location a dct:Location ;
  gsp:hasGeometry [ 
    a sf:Point; 
    gsp:asWKT "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> Point(-0.001475 51.477811)"^^gsp:wktLiteral 
] .
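The "distance" argument can be checked with a rough Python sketch that counts how many triples a client has to traverse from the dataset to the WKT literal. The triples below are simplified copies of the examples above (CRS prefix dropped, blank node named _:geom), and hops is a hypothetical breadth-first helper written for this illustration, not part of any RDF library.

```python
from collections import deque


def hops(triples, start, goal):
    """Breadth-first search: number of triples traversed from start to goal, or None."""
    edges = {}
    for s, _, o in triples:
        edges.setdefault(s, []).append(o)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, depth = queue.popleft()
        if node == goal:
            return depth
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


WKT = "Point(-0.001475 51.477811)"

# Flat form: the literal hangs directly off the dct:Location.
flat = [
    ("a:Dataset", "dct:spatial", "a:Location"),
    ("a:Location", "locn:geometry", WKT),
]

# GeoSPARQL form: an intermediate geometry node sits in between.
nested = [
    ("a:Dataset", "dct:spatial", "a:Location"),
    ("a:Location", "gsp:hasGeometry", "_:geom"),
    ("_:geom", "rdf:type", "sf:Point"),
    ("_:geom", "gsp:asWKT", WKT),
]

print(hops(flat, "a:Dataset", WKT), hops(nested, "a:Dataset", WKT))
```

Under these assumptions the flat form reaches the literal in two hops and the GeoSPARQL form in three; the extra hop buys a typed geometry node that can carry further statements.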

Looking forward to your votes & feedback.

akuckartz commented 5 years ago

Example 2.c seems to have some redundancy: sf:Point and Point(...). Is that necessary?

andrea-perego commented 5 years ago

@akuckartz said:

Example 2.c seems to have some redundancy: sf:Point and Point(...). Is that necessary?

This is how the geometry will be represented and encoded in GeoSPARQL. The type of geometry is specified both at the level of class, and in the WKT (or GML) literal.

andrea-perego commented 5 years ago

I implemented proposals 1 & 2.a for you to review via the following PR https://github.com/w3c/dxwg/pull/807

Preview here:

https://raw.githack.com/w3c/dxwg/andrea-perego-dcat-rev-temporal-spatial-coverage/dcat/index.html#Class:Location

akuckartz commented 5 years ago

LGTM !