opendata-swiss / dcat_ap_ch

Examples for geocat and DCAT data-catalogs are given here

Clarify requirements for data receivers: they should always be able to pass through DCAT-AP classes and properties correctly #119

Open sabinem opened 3 years ago

sabinem commented 3 years ago

Property: all properties of DCAT-AP that are not included in DCAT-AP CH

Class: all classes of DCAT-AP that are not included in DCAT-AP CH

Conformance Problem: Data receivers (e.g. opendata.swiss) that use DCAT-AP-CH as a reference should still be able to pass through well-formed DCAT-AP catalogues, even if some properties or classes are not directly defined in DCAT-AP CH. In particular, data exported to data.europe.eu should match what was imported from a data provider who chooses to use further DCAT-AP classes or properties. This issue has been discussed and clarified here: SEMICeu/DCAT-AP#198

Proposal:

Possible consequences on opendata.swiss:

AFoletti commented 2 years ago

I am not sure I correctly understand the proposal. Are we going toward a parallel management of "proper DCAT-AP" metadata versus "special DCAT-AP CH" metadata?

takohaller commented 2 years ago

Is this an issue regarding the specification of DCAT-AP-CH, or "just" a recommendation how the standard needs to be implemented on opendata.swiss?

Juan-Juan-1 commented 2 years ago

@AFoletti sorry I am not sure I understand your question... Could you please help me there? Thanks!

Juan-Juan-1 commented 2 years ago

@takohaller while it's a recommendation on how to implement DCAT-AP-CH on opendata.swiss, it's also important general information for data publishers and even for other portals. I think we could generalize the issue a little bit and transform it into something like "better communication of the requirements for compatibility".

AFoletti commented 2 years ago

> @AFoletti sorry I am not sure I understand your question... Could you please help me there? Thanks!

Never mind my question, it was poorly formulated 😄. But I need clarification on the proposal because, as it is written right now, I am unsure what we are aiming at.

metaodi commented 2 years ago

I think the only way to do this properly with CKAN is to store the original graph when importing (e.g. the RDF as a string), and then combine the imported (and possibly altered) values with the original graph to pass them on.

Otherwise you would need a triple store as a backend, that fully represents the imported data.

I think the ability to pass through DCAT catalogues is very important and I support all efforts in this direction. Although I think this is not an issue with DCAT-AP but only with its implementation, it is a good idea to state the expectation in the standard.
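The store-and-merge approach described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions: triples are modelled as plain Python tuples instead of a real RDF library, and the function name `merge_for_export`, the `ex:` identifiers and the example values are all hypothetical and not taken from the opendata.swiss or CKAN code base.

```python
from typing import Iterable, Set, Tuple

# A triple is modelled as (subject, predicate, object) using plain strings.
Triple = Tuple[str, str, str]


def merge_for_export(original: Iterable[Triple],
                     harmonized: Iterable[Triple]) -> Set[Triple]:
    """Combine the stored original graph with the portal's harmonized values.

    For every (subject, predicate) pair the portal holds a value for, the
    portal's triples replace the original ones. All other original triples
    are passed through unchanged.
    """
    harmonized = set(harmonized)
    overridden = {(s, p) for s, p, _ in harmonized}
    passed_through = {t for t in original if (t[0], t[1]) not in overridden}
    return passed_through | harmonized


# Hypothetical example: the portal harmonizes the title, while a property it
# does not model itself (ex:extraProperty) survives the round trip.
original = {
    ("ex:dataset1", "dct:title", "Old title"),
    ("ex:dataset1", "ex:extraProperty", "kept as-is"),
}
harmonized = {("ex:dataset1", "dct:title", "Harmonized title")}
exported = merge_for_export(original, harmonized)
```

The design choice here is that the portal only ever overrides what it explicitly manages, so anything it does not know about is exported exactly as it was imported.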

sabinem commented 2 years ago

@AFoletti it is exactly as @metaodi has explained: a data portal such as opendata.swiss is not meant to alter the data, but to harmonize it and otherwise pass it through as it is. This is one of its tasks. The other task is of course to make the metadata discoverable at its level of aggregation. But overall it is more a concept of pass-through and use than a concept of altering the structure of the metadata.

In order to put that into practice, the imported data needs to be stored somewhere. CKAN makes that a little hard, and a triple store would be best for that. For example, the German opendata portal already has a triple store and they are very happy with it. Until opendata.swiss has a triple store itself, the only other way to do this is, as @metaodi explained, to store the original graph somewhere and then use this stored graph for the export to the European data portal.

@metaodi I agree that this expectation should be written up in the standard. This is the chapter in which to do it: https://dcat-ap.ch/releases/2.0/dcat-ap-ch.html#receiver-requirements.

To me there is also the question of which catalog should be exported: currently we import catalogs and store the datasets and distributions. Then on export we take all the datasets and distributions and build a new catalog with them. So we are not passing through catalogs, but rather datasets. Passing through several catalogs instead of exporting just one catalog would be a major change. On the other hand, if we only pass through datasets, structures such as a catalog record and properties stored on the catalog would get lost.

DCAT-AP says on this:

"In order to conform to this Application Profile, an application that receives metadata MUST be able to:

• Process information for all classes specified in section 3.
• Process information for all properties specified in section 4.
• Process information for all controlled vocabularies specified in section 5.2.

As stated in section 3, "processing" means that receivers must accept incoming data and transparently provide these data to applications and services. It does neither imply nor prescribe what applications and services finally do with the data (parse, convert, store, make searchable, display to users, etc.)."

So I think it still needs to be discussed further what exactly "transparently provide" would mean in the above case.

sabinem commented 2 years ago

I found my own answer in this regard: when a receiver such as opendata.swiss exports its metadata, it shouldn't just export the datasets and make a new catalog of these datasets. It should rather use the hasPart property of dcat:Catalog to bundle the imported catalogs together into one bigger catalog. See the properties of DCAT-AP below:

[Image: Screenshot from 2021-11-23 08-59-09]
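As a rough illustration of the bundling idea from this comment, the sketch below builds a small Turtle document in which one portal catalog references the imported catalogs via `dct:hasPart` (the DCAT-AP "has part" property of a catalog). The function name `bundle_catalogs` and all `example.org` IRIs are hypothetical. A real export would also carry along each imported catalog's own graph, which this sketch leaves out.

```python
from typing import List

PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)


def bundle_catalogs(portal_iri: str, imported_iris: List[str]) -> str:
    """Return Turtle for a portal catalog that bundles imported catalogs."""
    if not imported_iris:
        # Nothing to bundle: emit only the portal catalog itself.
        return f"{PREFIXES}\n<{portal_iri}> a dcat:Catalog .\n"
    # One object per imported catalog, separated by commas (Turtle object list).
    parts = " ,\n        ".join(f"<{iri}>" for iri in imported_iris)
    return (
        f"{PREFIXES}\n"
        f"<{portal_iri}> a dcat:Catalog ;\n"
        f"    dct:hasPart {parts} .\n"
    )


# Hypothetical IRIs, for illustration only.
turtle = bundle_catalogs(
    "https://example.org/catalog",
    ["https://example.org/catalog/a", "https://example.org/catalog/b"],
)
print(turtle)
```

With this shape, properties stored on each imported catalog stay attached to that catalog instead of being lost when the datasets are regrouped.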