dct:spatial on dcat:Dataset: a rework of the definition is needed to align it with DCAT-AP

sabinem commented 3 years ago

Property:dct:spatial Class:dcat:Dataset Conformance Problem

In DCAT-AP, location names must be provided as URIs and, if possible, controlled vocabularies should be used. The usage note does not make that clear.
The remark on polygons is outdated.

Proposal:

improve the usage note and require the use of controlled vocabularies
employ controlled vocabularies for Swiss locations
add examples for points, bounding boxes and polygons
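
As a rough illustration of what such examples might look like, here is a minimal Python sketch that builds WKT literals for a point, a bounding box and a polygon and wraps them in a Turtle statement. The helper names, the example.org dataset URI and the coordinates are assumptions made for illustration. The properties dcat:centroid, dcat:bbox and locn:geometry are the ones DCAT 2 offers for dct:Location; whether DCAT-AP CH should adopt exactly these is the open question of this issue.

```python
# Sketch only: helper names, dataset URI and coordinates are illustrative assumptions.
WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral"


def wkt_point(lon, lat):
    """WKT point, longitude first."""
    return f"POINT({lon} {lat})"


def wkt_polygon(ring):
    """WKT polygon from (lon, lat) pairs; the ring is closed if needed."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def wkt_bbox(west, south, east, north):
    """Bounding box expressed as a closed rectangular polygon."""
    return wkt_polygon([(west, south), (east, south), (east, north), (west, north)])


def spatial_turtle(dataset_uri, geometry_property, wkt):
    """Turtle statement attaching a geometry to a dataset via dct:spatial.

    Assumes the prefixes dct:, dcat: and locn: are declared elsewhere.
    """
    return (
        f"<{dataset_uri}> dct:spatial [\n"
        f"    a dct:Location ;\n"
        f'    {geometry_property} "{wkt}"^^<{WKT_LITERAL}>\n'
        f"] ."
    )


dataset = "https://example.org/dataset/1"  # placeholder URI
print(spatial_turtle(dataset, "dcat:centroid", wkt_point(7.43, 46.95)))
print(spatial_turtle(dataset, "dcat:bbox", wkt_bbox(7.29, 46.91, 7.5, 46.99)))
print(spatial_turtle(dataset, "locn:geometry",
                     wkt_polygon([(7.4, 46.93), (7.47, 46.93), (7.45, 46.98)])))
```

The bounding box is written as a closed rectangle so that all three cases share one literal type.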

metaodi commented 2 years ago

Yes, either new vocabularies are needed (see "Kreis", "Bezirk", "Gemeinde" in DCAT-AP.de) or at least something like GeoNames should be used.

If I remember correctly, this vocabulary has already been discussed as part of the study by Beat and Stephan.

andreasamsler commented 2 years ago

Asked @mmznrSTAT for his feedback on this.

mmznrSTAT commented 2 years ago

GeoNames would be great: see for example the LINDAS query "Admin unit at given point" (endpoint https://ld.geo.admin.ch/query), which filters with bif:st_contains(?Coords, bif:st_point(7.43, 46.95)) and ?Date = "2016-01-01"^^xsd:date.

p1d1d1 commented 2 years ago

@mmznrSTAT that query is from the Linked Data Service of the BGDI :-) and not really related to the issue we have here. GeoNames resources are something like https://sws.geonames.org/2657895/ (e.g. for ZH).
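
To make the "URI, not name" requirement concrete, here is a minimal Python sketch that accepts only http(s) URIs as dct:spatial values and emits an N-Triples line. The function names and the example.org dataset URI are assumptions for illustration; the GeoNames URI is the one quoted above.

```python
from urllib.parse import urlparse

DCT_SPATIAL = "http://purl.org/dc/terms/spatial"


def is_uri(value):
    """True if the value looks like an absolute http(s) URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def spatial_triple(dataset_uri, location):
    """N-Triples line linking a dataset to a location URI.

    Raises ValueError for plain place names, which should be given as URIs.
    """
    if not is_uri(location):
        raise ValueError(f"dct:spatial expects a URI, got a name: {location!r}")
    return f"<{dataset_uri}> <{DCT_SPATIAL}> <{location}> ."


# Placeholder dataset URI; the GeoNames resource is the one mentioned in this thread.
print(spatial_triple("https://example.org/dataset/1", "https://sws.geonames.org/2657895/"))
```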

p1d1d1 commented 2 years ago

@metaodi

> Yes, either new vocabularies are needed (see "Kreis", "Bezirk", "Gemeinde" in DCAT-AP.de) or at least something like geonames should be used.

GeoNames has Kantone, Bezirke, Gemeinden, Kreise

p1d1d1 commented 2 years ago

For this issue I'd stick to what is defined in DCAT-AP 2.0.1, page 22. Additionally, it would be nice to provide an example with coordinates, as stated in the proposal.

Juan-Juan-1 commented 2 years ago

Proposal:

improve the usage note and require the controlled vocabularies that the EU proposes on page 22
add examples for points, bounding boxes and polygons

mmznrSTAT commented 2 years ago

https://github.com/opendata-swiss/dcat_ap_ch/issues/61#issuecomment-953802488

It was a federated query with a reference to GeoNames, no? But not an issue, @p1d1d1 ;)

mmznrSTAT commented 2 years ago

A question from my side (sorry, I couldn't take part in the meeting!): if I publish values for every municipality in a canton (for example population density), do I write a list of all the municipalities? So is it 1:n? That would be better than giving the bounding box or the GeoNames entry of the canton, for example.

metaodi commented 2 years ago

According to https://www.dcat-ap.ch/releases/2.0/dcat-ap-ch.html#dataset-spatial-coverage the cardinality of dct:spatial is 0:n

p1d1d1 commented 2 years ago

@mmznrSTAT I wouldn't do that! IMHO the spatial extent is a property of the entire dataset, independently of the data granularity.

mmznrSTAT commented 2 years ago

I see the geodata perspective on that. Granularity and spatial coverage have different meanings and usage in geodata and in statistical data. I think this is why we never agreed on this topic. What is the argument against it?

Juan-Juan-1 commented 2 years ago

Just to make it clear:

dct:spatial on Catalogue is Recommended and Cardinality is 0...n
dct:spatial on Dataset is Recommended and Cardinality is 0...n
On Distribution there is no dct:spatial (only "dcat:spatialResolutionInMeters", but that's something else). This is, by the way, why we introduced dct:coverage.

@mmznrSTAT you said it would be "better": may I ask what "better" means for you?

The least redundancy possible? Or accuracy?
Or being sure that, if I look for data on Aeugst am Albis, I find only datasets that actually contain data on this Gemeinde, without having to go through further datasets with the value "canton ZH"? That is, a more "precise" search?

I'd be by the way definitely interested to know why Geo and Stat never agreed on Granularity and spatial coverage :)

mmznrSTAT commented 2 years ago

IMHO they never agreed because spatial on a map means something different than spatial in a table. A map covers an area no matter what scale is used. Scale is on a different level. In a table, if I have the spatial information "canton of Zurich", I don't have values for everything in the canton. I have an aggregation. So there is NO information in this table on parts of Zurich. But maybe I was just always wrong? Does somebody have examples of usage of spatial and spatial coverage?

Juan-Juan-1 commented 2 years ago

Norway: https://data.norge.no/specification/dcat-ap-no/#Datasett-dekningsomr%C3%A5de "Reference, primarily in the form of a URI for an administrative area, or name of place or area taken from a controlled vocabulary (for example Central place name register), or geographical coordinates (EU89) for the area to which the dataset applies (point or geographical boundary frame cf. ISO 19115)."

Sweden (they have something interesting: two spatial properties, one with names, one with geometries): Named geographical area (https://docs.dataportal.se/dcat/en/#dcat_Dataset-dcterms_spatial) Geographical area (https://docs.dataportal.se/dcat/en/#dcat_Dataset-dcterms_spatial-2)

mmznrSTAT commented 2 years ago

And Norway uses the spatial information prominently: https://data.norge.no/datasets. It refers to https://data.geonorge.no/, the Norwegian geonames. This usage limits the cardinality. And as seen here: https://data.norge.no/datasets/15d63821-210d-4cdb-be62-927b3c7f1cb6 they use it in the way @p1d1d1 is arguing for (spatial is Norway).

p1d1d1 commented 2 years ago

Actually I don't understand why "spatial" in a map means something different than "spatial" in a table. The definition for dct:spatial is "The geographical area covered by the dataset". Now imagine two tables (two datasets): one with all municipalities of Kt. ZH, providing population info for each municipality, and one with a single record providing population information for the entire Kt. ZH (so the aggregation). Both tables (datasets) have (IMHO) dct:spatial = Kt. ZH. The data "covers" the entire Kt. ZH in both cases, regardless of the data granularity. Again, dct:spatial has nothing to do with data granularity; it is about the area covered by the dataset as a whole.

@mmznrSTAT could argue that in the case of the first table a user could interpret the information dct:spatial = Kt. ZH as if the data pertains "only" to Kt. ZH as a whole. Now, if that is the fear, one could provide a list of dct:spatial values, one for each municipality. I personally wouldn't do that, since this is IMHO not the semantics of dct:spatial.
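
One way to read this argument, sketched in Python under loud assumptions: the hierarchy below is a toy with placeholder example.org URIs (a "canton" with only two municipalities), and dataset_coverage is a hypothetical helper, not part of any standard. It derives the dataset-level coverage from the spatial units found in the rows, so that a table listing every municipality and a table with one aggregated canton row end up with the same dct:spatial value.

```python
# Toy hierarchy with placeholder URIs (assumption): this "canton" has only two municipalities.
MUNICIPALITIES_BY_CANTON = {
    "https://example.org/canton/zh": {
        "https://example.org/municipality/a",
        "https://example.org/municipality/b",
    },
}


def dataset_coverage(row_units):
    """Collapse row-level spatial units into the area covered by the dataset.

    A canton is reported once if it appears itself or if all of its
    municipalities appear; units that do not add up to a canton are kept.
    """
    remaining = set(row_units)
    coverage = set()
    for canton, members in MUNICIPALITIES_BY_CANTON.items():
        if canton in remaining or members <= remaining:
            coverage.add(canton)
            remaining -= members | {canton}
    return sorted(coverage | remaining)


per_municipality = ["https://example.org/municipality/a", "https://example.org/municipality/b"]
aggregated = ["https://example.org/canton/zh"]
print(dataset_coverage(per_municipality))  # same coverage as the aggregated table
print(dataset_coverage(aggregated))
```

The alternative discussed here, one dct:spatial per municipality, would simply skip the collapsing step and publish the row units as they are, which the 0..n cardinality allows.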

mmznrSTAT commented 2 years ago

Yes! We talk the same language, @p1d1d1, and I see your point with the semantics. And I agree. My input is mainly for machine-to-machine communication. If we use it only as 0:1, we lose information.

"The geographical area covered by the dataset": we don't have geographical areas in statistical datasets, we have an area reference (Raumbezug), which is not the same. The area in statistical datasets implicitly carries more information than just a polygon of the area covered (not to say coverage ;) ) or a bounding box, for example the administrative level. We can use spatial as "geographical area covered" only, but then we lose the opportunity to address the needs of the statistical community.

ruizcrp commented 2 years ago

Hi, maybe I can contribute with a concrete use-case that we have in project STATBOT.swiss.

We basically need very clear information about the spatial attribution of every data point in a dataset, so this concerns dct:spatial on dcat:Dataset (not dcat:Catalogue, as mentioned). Different data points can refer to different spatial levels: one row can be about the population of Aeugst am Albis, another about the population of the respective Zurich Bezirk, and another about the cantonal level.

There can also be non-additive variables such as electoral results: Aeugst am Albis voted 45% yes on a vote, but the canton 32% for example.

In order to do that, the vocabulary approach mentioned above is important. Kreis, Wohnviertel, Bezirke and so on have to be defined, or be available as GeoNames resources if those point to unique spatial units.

All in all: if it gives us a unique identifier for a spatial area, this would help us a lot.
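
A minimal Python sketch of what this use case seems to require, assuming a hypothetical Observation record and placeholder example.org identifiers (none of this is an existing STATBOT.swiss or DCAT structure): each row carries the URI of its spatial unit plus its administrative level, and values at different levels are stored side by side rather than derived from each other, which matters for non-additive variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One data point with an explicit spatial attribution (hypothetical model)."""
    spatial_unit: str  # unique identifier (URI) of the spatial unit
    level: str         # administrative level, e.g. "municipality" or "canton"
    variable: str
    value: float


# Placeholder URIs; the vote shares are the illustrative figures from the comment above.
rows = [
    Observation("https://example.org/unit/municipality/aeugst-am-albis",
                "municipality", "yes_share_percent", 45.0),
    Observation("https://example.org/unit/canton/zh",
                "canton", "yes_share_percent", 32.0),
]


def values_for(observations, spatial_unit):
    """All values recorded for exactly this spatial unit."""
    return [o.value for o in observations if o.spatial_unit == spatial_unit]


def levels_present(observations):
    """Administrative levels that occur in the dataset, sorted by name."""
    return sorted({o.level for o in observations})


print(values_for(rows, "https://example.org/unit/canton/zh"))
print(levels_present(rows))
```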

We are currently trying to build something that is not standardized at all (because we are improvising) and would clearly adopt such a new DCAT standard when it becomes available.
