netwerk-digitaal-erfgoed / dataset-register

Components (API and crawler) for the NDE Dataset Register
https://datasetregister.netwerkdigitaalerfgoed.nl/api/
European Union Public License 1.2

Store schema:includedInDataCatalog (when omitted in dataset description) #564

Open coret opened 2 years ago

coret commented 2 years ago

The Dataset Register handles Datasets and DataCatalogs. When handling a DataCatalog all the descriptions of datasets are "extracted" and stored. The fact that a Dataset was part of a DataCatalog is only stored in the Dataset Register if the (recommended, but not required) schema:includedInDataCatalog property was provided in the dataset description.

In cases where dataset descriptions in a DataCatalog do not provide the (reverse) schema:includedInDataCatalog property, the Dataset Register could add this property to the dataset description, as this can be valuable information.
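
A minimal sketch of what this could look like, assuming the catalog arrives as schema.org JSON-LD that has already been parsed into Python dicts. The function name `add_catalog_backlinks` and the example URLs are hypothetical and do not come from the Dataset Register code base; the sketch only fills in the property where the publisher omitted it.

```python
from copy import deepcopy

# schema:includedInDataCatalog, as it appears under a schema.org @context.
PROP = "includedInDataCatalog"


def add_catalog_backlinks(catalog: dict) -> list[dict]:
    """Return copies of the catalog's datasets, each pointing back to a catalog."""
    datasets = []
    for dataset in catalog.get("dataset", []):
        dataset = deepcopy(dataset)  # leave the publisher's description untouched
        # Only add the property when the dataset description does not provide it.
        dataset.setdefault(PROP, {"@type": "DataCatalog", "@id": catalog["@id"]})
        datasets.append(dataset)
    return datasets


# Hypothetical catalog: the first dataset omits the property, the second sets it.
catalog = {
    "@context": "https://schema.org/",
    "@type": "DataCatalog",
    "@id": "https://example.com/catalog",
    "dataset": [
        {"@type": "Dataset", "@id": "https://example.com/dataset/1"},
        {
            "@type": "Dataset",
            "@id": "https://example.com/dataset/2",
            "includedInDataCatalog": {"@id": "https://example.com/other-catalog"},
        },
    ],
}

enriched = add_catalog_backlinks(catalog)
```

A value supplied by the publisher is kept as-is, so the register would only ever add information, never overwrite it.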

ddeboer commented 2 years ago

To be clear: this will only work when a catalog is registered, not an individual dataset.

How should we model this relation in DCAT?

  1. <catalog> dcat:dataset <dataset>?
  2. Or do we want this on the dataset level? In that case, what is the inverse of dcat:dataset (in other words, the equivalent of schema:includedInDataCatalog)? dcat:catalog is weird because its domain is dcat:Catalog rather than dcat:Dataset.

Should we also store the catalog itself with any metadata provided by the user?

coret commented 2 years ago

> To be clear: this will only work when a catalog is registered, not an individual dataset.

Correct

> How should we model this relation in DCAT?

I'm inclined to say option 2 as the Dataset Register is about dataset(description)s. I think dcat:catalog (BTW: DCAT 2 or DCAT3?) is equivalent with schema:includedInDataCatalog (which has schema:DataCatalog as domain).

> Should we also store the catalog itself with any metadata provided by the user?

The catalog (if provided, or linked via schema:includedInDataCatalog in the dataset *) might contain interesting metadata, especially given that some organisations need to provide information about some kind of compound dataset.

Is crawling, validating, storing and querying datacatalogs straightforward?

*) I just realized that we could use schema:includedInDataCatalog as a discovery mechanism (in case a dataset is registered and this property is present). But maybe organizations deliberately provide only some datasets, because those are heritage-specific and the others (from the catalog) are not...
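
To make the discovery idea concrete, here is a rough sketch, assuming the registered dataset description is schema.org JSON-LD parsed into a Python dict. The helper `discover_catalogs` and the `registered` set of already-known catalog URLs are hypothetical names for illustration. The sketch only collects candidate catalog URLs; whether to crawl them is exactly the open policy question raised above.

```python
def discover_catalogs(dataset: dict, registered: set[str]) -> list[str]:
    """Collect catalog URLs referenced by a dataset that are not registered yet."""
    value = dataset.get("includedInDataCatalog", [])
    # JSON-LD allows a single value or a list of values for a property.
    refs = value if isinstance(value, list) else [value]
    urls = []
    for ref in refs:
        # A reference may be a node object ({"@id": ...}) or a bare URL string.
        url = ref.get("@id") if isinstance(ref, dict) else ref
        if url and url not in registered and url not in urls:
            urls.append(url)
    return urls


# Hypothetical registered dataset that points to two catalogs.
dataset = {
    "@type": "Dataset",
    "@id": "https://example.com/dataset/1",
    "includedInDataCatalog": [
        {"@id": "https://example.com/catalog"},
        "https://example.com/known-catalog",
    ],
}

candidates = discover_catalogs(dataset, registered={"https://example.com/known-catalog"})
```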

ddeboer commented 2 years ago

> I'm inclined to say option 2 as the Dataset Register is about dataset(description)s. I think dcat:catalog (BTW: DCAT 2 or DCAT3?) is equivalent with schema:includedInDataCatalog (which has schema:DataCatalog as domain).

In both DCAT 2 and 3, dcat:catalog has domain dcat:Catalog, so it can only be applied to a dcat:Catalog (which is a subclass of dcat:Dataset). So we’re still looking for the inverse of dcat:dataset. About inverse properties, the DCAT spec says:

> However, recognizing that inverses are needed for some use cases, DCAT supports them, but with the requirement that they MAY be used only in addition to those described in 6. Vocabulary specification, and that they MUST NOT be used to replace them.

It mentions dcat:inCatalog there, which seems to be what we’re looking for, although it should be used only in addition to dcat:dataset.
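
As an illustration of that requirement, a small sketch that always emits dcat:dataset and adds dcat:inCatalog alongside it, never instead of it. The function name and example IRIs are hypothetical, and serializing as N-Triples strings is just an assumption to keep the sketch dependency-free; it is not how the Dataset Register necessarily stores its data.

```python
DCAT = "http://www.w3.org/ns/dcat#"


def catalog_membership_triples(catalog_iri: str, dataset_iris: list[str]) -> list[str]:
    """Build N-Triples lines linking a catalog and its datasets in both directions."""
    lines = []
    for dataset_iri in dataset_iris:
        # The forward property from the vocabulary specification is always present.
        lines.append(f"<{catalog_iri}> <{DCAT}dataset> <{dataset_iri}> .")
        # The inverse is only added in addition to the forward property.
        lines.append(f"<{dataset_iri}> <{DCAT}inCatalog> <{catalog_iri}> .")
    return lines


triples = catalog_membership_triples(
    "https://example.com/catalog",
    ["https://example.com/dataset/1", "https://example.com/dataset/2"],
)
print("\n".join(triples))
```

With both directions stored, queries that start from a dataset can find its catalog without traversing dcat:dataset in reverse.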