w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Add support for spatial resolution specification via skos:ConceptScheme #1263

Open jakubklimek opened 3 years ago

jakubklimek commented 3 years ago

What is missing for me is the ability to specify spatial resolution using a concept scheme instead of a number of meters. For example, I cannot specify that the resolution (in the case of addresses) is to the level of districts, towns, regions, states, etc.

It might be that the term "spatial resolution" does not fit here exactly, in the light of the current spatial resolution coming from the GIS community.

Example: Demography statistics where the resolution is "regions" or "towns" or "states". These might correspond e.g. to NUTS levels and LAU levels in the EU. While the specific code-list is probably out of scope of DCAT, the support for this might be in scope.

Originally posted by @jakubklimek in https://github.com/w3c/dxwg/issues/806#issuecomment-716322176

andrea-perego commented 3 years ago

@jakubklimek, do you have concrete examples and/or proposals on how you would address this issue? E.g., how would you use, in practice, the NUTS / LAU levels to specify this information?

jakubklimek commented 3 years ago

My proposal is to add a new DCAT property (e.g. dcat:spatialResolution) with the domain of dcat:Dataset and the range of skos:Concept, which would support connecting to a concept representing a spatial hierarchy level such as NUTS2, NUTS3, LAU (formerly also LAU1 and LAU2), etc.

I am currently not aware of any existing code list that could be universally used for this (not even, for instance, in the EU), since even the now discontinued LAU2 level corresponds to the level of municipalities, whereas we could go further, to the level of parts of towns, streets, etc. There is, however, a local code list for this in Czechia, which could be used, and I think there could be similar code lists elsewhere.

A specific example could be (with a fictitious NUTS3 level IRI):

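@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
# freq: was undeclared in the original; assumed to be the EU Publications Office frequency authority table
@prefix freq: <http://publications.europa.eu/resource/authority/frequency/> .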

<https://data.gov.cz/zdroj/datové-sady/http---vdb.czso.cz-pll-eweb-package_show-id-290038r05> a dcat:Dataset;
    dct:title "Cizinci podle státního občanství, věku a pohlaví v okresech - rok 2004"@cs,
              "Foreigners: by citizenship, age and sex in districts - year 2004"@en ;
    dct:accrualPeriodicity freq:ANNUAL ;
    dct:publisher <https://rpp-opendata.egon.gov.cz/odrpp/zdroj/orgán-veřejné-moci/00025593> ;
    dct:spatial <https://linked.cuzk.cz/resource/ruian/stat/1> ;
    dcat:spatialResolution <http://data.europa.eu/nuts/scheme/2016/level/3> .

indicating that the dataset contains records for NUTS3 regions.

Note that currently, there are no actual IRIs for NUTS/LAU levels, only for NUTS classifications themselves as the levels of individual codes are specified as literals.
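
As a minimal sketch of what such level IRIs might look like if a scheme were published, the following uses a hypothetical `lvl:` namespace and hypothetical concept names. None of these IRIs exist, and `dcat:spatialResolution` is the property proposed in this issue, not a term in DCAT.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
# Hypothetical namespace: no level IRIs are actually published here.
@prefix lvl:  <https://example.org/spatial-levels/> .

# A concept scheme whose concepts are hierarchy levels, not individual regions.
lvl:scheme a skos:ConceptScheme ;
    skos:prefLabel "Spatial hierarchy levels (illustrative)"@en .

lvl:nuts2 a skos:Concept ;
    skos:inScheme lvl:scheme ;
    skos:prefLabel "NUTS 2"@en .

lvl:nuts3 a skos:Concept ;
    skos:inScheme lvl:scheme ;
    skos:prefLabel "NUTS 3"@en ;
    skos:broader lvl:nuts2 .   # assumption: "broader" read as "coarser level"

# dcat:spatialResolution is the proposed property, not part of DCAT.
<https://example.org/dataset/1> a dcat:Dataset ;
    dcat:spatialResolution lvl:nuts3 .
```

A local code list, such as the Czech one mentioned above, could play the role of `lvl:scheme` without any change to the dataset description pattern.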

Robsteranium commented 3 years ago

We'd also like to see this added.

Specifying resolution in metres only really makes sense for data collected on a continuous distance scale (like sampling points), not nominal coding (like we have with administrative geography).

We could express the resolution in metres using e.g. the average radius of geometric polygons, but the variance would be far too high (e.g. country-level resolution would include distances on the scale of both Luxembourg and Russia).

Instead it's more meaningful to talk about the units into which the data is aggregated to calculate the statistics.

Coincidentally we came up with the same idea independently!

:spatialResolution a owl:ObjectProperty, rdf:Property ;
  rdfs:subPropertyOf dcterms:spatial ;
  rdfs:domain dcat:Dataset ;
  rdfs:label "Spatial Resolution" ;
  rdfs:comment "The granularity of geographic areas covered by the data."@en ;
  skos:scopeNote "The range has a nominal type which is more appropriate for statistical data collection than the continuous distance scale provided by dcat:spatialResolutionInMeters."@en ;
  rdfs:range :SpatialResolution ;
  rdfs:seeAlso dcat:spatialResolutionInMeters ;
  .

:SpatialResolution a rdfs:Class;
  rdfs:label "Spatial Resolution";
  rdfs:comment "The level of spatial resolution."@en ;
  .

The key difference from the above proposal is that we aren't using skos:Concept for the range and instead introduce a :SpatialResolution class.

This would be more like dct:Location (the range of dct:spatial) which can be described spatially with a dcat:bbox or nominally with a locn:geographicName.

We didn't use skos:Concept because our geographies themselves are already skos:Concepts. You'd need something to distinguish geography-concepts from geography-level-concepts and a property to attach the latter to the former.

For unrelated reasons we've found it necessary to identify a :ConceptLevel class and I think it suits this purpose reasonably well too.

Why Concept Level?

The `:ConceptLevel` class arose in the context of [English administrative geography](https://www.ons.gov.uk/methodology/geography/ukgeographies/administrativegeography/england), where there isn't a single ordered set of levels, which meant that we couldn't use `xkos:levels`/`xkos:ClassificationLevel`.

We have:

:Country rdfs:subClassOf :ConceptLevel .
:NUTS1 rdfs:subClassOf :ConceptLevel .

<http://data.europa.eu/nuts/code/UK> a skos:Concept, :Country;
  skos:narrower <http://data.europa.eu/nuts/code/UKD> .

<http://data.europa.eu/nuts/code/UKD> a skos:Concept, :NUTS1;
  skos:broader <http://data.europa.eu/nuts/code/UK> .

Thus we could have:

:national_output_dataset :spatialResolution :Country .
:regional_output_dataset :spatialResolution :NUTS1 .

# and so by inference...
:Country a :SpatialResolution .
:NUTS1 a :SpatialResolution .

andrea-perego commented 3 years ago

@jakubklimek, @Robsteranium, thanks for contributing your solutions.

A couple of comments:

  1. DCAT suggests using the Data Quality Vocabulary for spatial resolution types other than distance in metres - quoting from §6.6.5 Property: spatial resolution in DCAT 3 ED:

    The range of this property is a decimal number representing a length in meters. This is intended to provide a summary indication of the spatial resolution of the data as a single number. More complex descriptions of various aspects of spatial precision, accuracy, resolution and other statistics can be provided using the Data Quality Vocabulary [VOCAB-DQV].

    GeoDCAT-AP gives an example of how to do this, but still on spatial resolution as distance or equivalent scale - see §B.6.13 Spatial resolution – Spatial resolution of the dataset in GeoDCAT-AP 2.

    Do you think that a similar approach can be applied also to your use cases?

  2. About "concept levels", XKOS defines a specific class for this notion (namely, xkos:ClassificationLevel) as a subclass of skos:Collection - see https://rdf-vocabulary.ddialliance.org/xkos.html#levels . Would the use of skos:Collection fit with your approach?
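
For orientation, the DQV pattern referred to in point 1 could be sketched roughly as follows. This is only an illustration: the `ex:` namespace and the metric `ex:spatialResolutionAsDistance` are hypothetical stand-ins (GeoDCAT-AP defines its own metrics), and only the `dqv:` terms come from the Data Quality Vocabulary.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

# Hypothetical metric describing resolution as a distance.
ex:spatialResolutionAsDistance a dqv:Metric ;
    dqv:expectedDataType xsd:decimal .

# The dataset carries a measurement of that metric.
ex:dataset a dcat:Dataset ;
    dqv:hasQualityMeasurement [
        a dqv:QualityMeasurement ;
        dqv:isMeasurementOf ex:spatialResolutionAsDistance ;
        dqv:value "1000"^^xsd:decimal   # illustrative value, in metres
    ] .
```

Note that the measurement value here is a literal number, so the open question is whether the same pattern could carry a nominal level instead.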

Robsteranium commented 3 years ago

Thanks @andrea-perego.

  1. Sadly I don't think GeoDCAT/ DQV fit this use case. The alternatives they provide are all still continuous measures of distance whereas we're looking for a discrete nominal scale.
  2. I guess concept levels could be expressed with an skos:Collection orthogonal to the skos:ConceptScheme of locations but we'd still need a way to distinguish these roles (it would've been nice to use xkos:ClassificationLevel but for the aforementioned problems with xkos:depth/xkos:levels etc). A collection also wouldn't serve as the rdfs:range for the :spatialResolution property we're proposing here - we ought to specify the level of detail in terms of a type of location like "Country" or "City" not an enumeration of the countries or cities included in the dataset.
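
The difference between the two modelling options in point 2 can be sketched side by side. This is a hedged illustration only: the `ex:` namespace and all `ex:` names are hypothetical, mirroring the `:ConceptLevel` and `:spatialResolution` terms proposed earlier in the thread.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <https://example.org/> .   # hypothetical namespace

# Collection-based: the level is an enumeration of its member locations.
ex:nuts1Level a skos:Collection ;
    skos:member <http://data.europa.eu/nuts/code/UKD> .

# Class-based: the level is a type of location.
ex:ConceptLevel a rdfs:Class .
ex:NUTS1 rdfs:subClassOf ex:ConceptLevel .
<http://data.europa.eu/nuts/code/UKD> a skos:Concept, ex:NUTS1 .

# The dataset points at the type, not at a list of regions.
ex:regional_output_dataset ex:spatialResolution ex:NUTS1 .
```

With the collection, describing a dataset's resolution means pointing at a list of specific regions. With the class, it points at the kind of region, which is what the proposed range is meant to capture.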