opendata-swiss / dcat_ap_ch

Examples for geocat and DCAT data-catalogs are given here

Add possibility to describe attributes of a dataset/distribution #183

Open metaodi opened 2 years ago

metaodi commented 2 years ago

To my knowledge there is currently no way to describe attributes of a dataset (e.g. columns of a CSV). This would include the following information (minimal):

On http://data.stadt-zuerich.ch we provide this information on a dataset level (i.e. it does not differ between distributions).

Example: Daten der Verkehrszählung zum motorisierten Individualverkehr (Stundenwerte), seit 2012

Attributes of a dataset

AFoletti commented 2 years ago

+1 ! 👍 This is to be honest an ongoing discussion of mine with the opendata.swiss team. It is possible (that's the "Data Dictionary" part of the Datastore plugin, which is present on the opendata.swiss CKAN installation) and available in the CKAN edit interface, but not implemented in the frontend GUI. I did not really understand the reason why it's a... aehm... half-assed implementation at the moment. But it's nice to see someone other than me finds it useful 😉

Juan-Juan-1 commented 2 years ago

@metaodi very interesting. This seems to be what in statistics is called a data structure definition (https://ec.europa.eu/eurostat/web/sdmx-web-services/data-struct-def). It's definitely worth discussing. A couple of questions:

  • To my knowledge this information is usually not provided through the DCAT layer. Do you know of any example?
  • Wouldn't it be more efficient, especially from a user perspective, to provide this information as a separate, downloadable resource?

AFoletti commented 2 years ago

The information on the page and the downloadable resource (frictionless datapackage or similar) are in my opinion complementary. It is nice for a power user to have the datapackage, but you also have to account for the more casual audience who cannot work with such a file. For those, a table with the attribute descriptions and types could do wonders for understanding the data correctly. Of course, just my two cents.
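To make the "complementary" point concrete, here is a minimal sketch of how one attribute list could feed both forms: a Table Schema style descriptor (as used in frictionless data packages) for power users, and a plain table for the page. The column names, types and descriptions are assumptions invented for illustration. They are not taken from any actual dataset, and the helper functions are hypothetical.

```python
import json

# Hypothetical columns of a traffic-count CSV; names, types and descriptions
# are illustrative assumptions, not taken from a real dataset.
COLUMNS = [
    ("station_id", "string", "Identifier of the counting station"),
    ("timestamp", "datetime", "Start of the hourly measurement interval"),
    ("vehicle_count", "integer", "Number of vehicles counted in the interval"),
]


def build_table_schema(columns):
    """Return a Table Schema style descriptor (a plain dict) for the columns."""
    return {
        "fields": [
            {"name": name, "type": type_, "description": description}
            for name, type_, description in columns
        ]
    }


def render_attribute_table(schema):
    """Render the same schema as a plain-text table for casual readers."""
    rows = [("Name", "Type", "Description")]
    rows += [(f["name"], f["type"], f["description"]) for f in schema["fields"]]
    # Pad each column to the width of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


schema = build_table_schema(COLUMNS)
print(json.dumps(schema, indent=2))    # machine-readable, downloadable form
print(render_attribute_table(schema))  # human-readable form for the page
```

Both outputs come from the same source list, so the table on the page and the downloadable file cannot drift apart.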

metaodi commented 2 years ago

I think it's an important part of the metadata to be able to find/search for attributes.

On data.stadt-zuerich.ch all attributes and their descriptions are part of the search index, so you can find a dataset by the description of its data.
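As a rough illustration of what indexing attribute descriptions makes possible, here is a minimal sketch of an inverted index over attribute names and descriptions. It is not how data.stadt-zuerich.ch or CKAN implement search. The dataset identifiers, attributes and helper functions are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical catalogue: dataset id -> {attribute name: description}.
DATASETS = {
    "traffic-counts": {
        "station_id": "Identifier of the counting station",
        "vehicle_count": "Number of vehicles counted per hour",
    },
    "air-quality": {
        "no2": "Nitrogen dioxide concentration in micrograms per cubic metre",
    },
}


def build_index(datasets):
    """Map every lower-cased token of attribute names/descriptions to dataset ids."""
    index = defaultdict(set)
    for dataset_id, attributes in datasets.items():
        for name, description in attributes.items():
            for token in f"{name} {description}".lower().split():
                index[token].add(dataset_id)
    return index


def search(index, query):
    """Return the ids of datasets whose attributes mention every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


index = build_index(DATASETS)
print(search(index, "vehicles hour"))
print(search(index, "nitrogen"))
```

A query only matches a dataset if every term occurs somewhere in its attribute names or descriptions. That is the behaviour the comment above describes: finding a dataset through the description of its data.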

I honestly don't know why this is not part of DCAT so far. But I'm sure this is the reason for its current implementation on opendata.swiss 😉

Juan-Juan-1 commented 2 years ago

Just fyi: Some data publishers still found a way to somehow bring this information to the users: https://opendata.swiss/de/dataset/covid-19-schweiz But yeah, I agree it should be easier to do and maybe in a more visible fashion.

sabinem commented 2 years ago

@metaodi Your issue really resonates with me, since this question has sort of always been on my mind. I come from the data science side myself, and without a proper description of the fields, tabular data such as CSV files can't really be used for data analysis.

But this is not an issue of DCAT-AP CH: it is already built into DCAT, which does not offer any vocabulary in that regard.

Inspired by your cause, I therefore raised an issue with DCAT to better understand DCAT's reasoning on this. The discussion there might interest you, and maybe you also want to join in: https://github.com/w3c/dxwg/issues/1418

Juan-Juan-1 commented 2 years ago

I feel that DCAT doesn't and shouldn't have too much

The information on the page and as downloadable resource (frictionless datapackage or similar) are in my opinion complementary. It is nice for a power user to have the datapackage, but you also have to account for the more casual audience unable to work with such a file. For those, a table with the attributes description and types could do wonders to correctly understand the data. Of course, just my two cents

I agree... DCAT is the upper, "generic" information layer on data (data catalogue vocabulary) with interoperability as a primary goal - it shouldn't go too deep and mix with domain standards like SDMX, FHIR,... it should just reference the necessary information to understand and use data (see for instance https://www.w3.org/TR/vocab-dcat-2/#Property:distribution_conforms_to). Yet having a standardized form to describe and present variables could be really valuable...!

metaodi commented 2 years ago

Comment by @makxdekkers in https://github.com/w3c/dxwg/issues/1418:

I agree with @rob-metalinkage that adding specificity to the 'general' property conformsTo is the role of a profile. For example, the European DCAT-AP adds details: for Dataset, it refers to "an implementing rule or other specification" while for Distribution, it specifies "an established schema". Both fit in the general semantics of conformsTo. But if for some reason, an application would find this still too vague -- maybe because of a stronger need for validation -- the profile could create subclasses of conformsTo, e.g. conformsToSpec and conformsToSchema.

Maybe then this group could investigate whether there is a set of 'common' subproperties of conformsTo for the description of datasets that could be added to DCAT?

So this could very well be something DCAT-AP Switzerland could define without violating the DCAT Standard.

sabinem commented 2 years ago

What about adding dct:conformsTo as an optional or even recommended property on dcat:Distribution and dcat:Dataset? For users it would be very helpful to have that link to the dataset structure, especially on Distributions. On the Dataset level, this could also help to better distinguish geodata by giving them a conformsTo:<https://www.geocat.admin.ch/en/dokumentation/gm03.html>, whereas dcat:Datasets get a conformsTo:<https://dcat-ap.ch/>.

tlorusso commented 2 years ago

In the current version of the draft I see the 'conforms-to' property only at the dataset level (https://www.dcat-ap.ch/releases/2.0/dcat-ap-ch.html#dataset-conforms-to). Is it planned to add it at the distribution level too, or will it be limited to dcat:Dataset?

sabinem commented 1 year ago

@tlorusso The property conformsTo has been added on both Dataset and Distribution. For the property on the class Distribution, see https://www.dcat-ap.ch/releases/2.0/dcat-ap-ch.html#distribution-linked-schemas. Hope that answers your question.
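For readers who want to see what this could look like in a catalogue record, here is a minimal sketch that builds a JSON-LD description with dct:conformsTo on both levels. The example.org IRIs and the `describe_dataset` helper are hypothetical, and the exact serialisation expected by opendata.swiss may differ. Only the dcat and dct namespaces and the GM03 link mentioned earlier in this thread are taken as given.

```python
import json

# Standard namespace IRIs for DCAT and Dublin Core Terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def describe_dataset(dataset_iri, standard_iri, distribution_iri, schema_iri):
    """Return a JSON-LD dict with dct:conformsTo on Dataset and Distribution."""
    return {
        "@context": CONTEXT,
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        # Dataset level: the standard or specification the dataset follows.
        "dct:conformsTo": {"@id": standard_iri},
        "dcat:distribution": {
            "@id": distribution_iri,
            "@type": "dcat:Distribution",
            # Distribution level: the schema describing this file's structure.
            "dct:conformsTo": {"@id": schema_iri},
        },
    }


record = describe_dataset(
    dataset_iri="https://example.org/dataset/geodata-sample",            # hypothetical
    standard_iri="https://www.geocat.admin.ch/en/dokumentation/gm03.html",
    distribution_iri="https://example.org/dataset/geodata-sample/csv",   # hypothetical
    schema_iri="https://example.org/schemas/geodata-sample.json",        # hypothetical
)
print(json.dumps(record, indent=2))
```

The distribution-level link is where a column description, such as a table schema file, could be referenced without DCAT itself having to define a vocabulary for attributes.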