netwerk-digitaal-erfgoed / requirements-datasets

Requirements for datasets
https://netwerk-digitaal-erfgoed.github.io/requirements-datasets/

Compound datasets #66

Open · coret opened this issue 1 year ago


Some dataset publishers want a means to indicate that a dataset is part of a larger whole. Examples are the Delpher datasets, which consist of metadata, scans and OCR, both public and non-public (each with a different license), and the AdamLink datasets, which consist of datasets about buildings, districts, persons, streets and addresses.
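One way to express the part/whole relation is with the schema.org `hasPart` and `isPartOf` properties, which `Dataset` inherits from `CreativeWork`. The sketch below is a minimal illustration under that assumption, not a modeling the requirements prescribe. The base URI, dataset names and license assignments are hypothetical, and the whole deliberately carries no license, since that is one of the open questions in this issue.

```python
import json

# Hypothetical base URI; real identifiers would come from the publisher.
BASE = "https://example.org/id/dataset/"

def compound_dataset(whole_id, whole_name, parts):
    """Build schema.org JSON-LD linking a 'whole' Dataset to its 'part' Datasets.

    parts is a list of (part_id, part_name, license_url) tuples. The whole
    points to each part with hasPart; each part points back with isPartOf.
    """
    whole_uri = BASE + whole_id
    graph = [{
        "@id": whole_uri,
        "@type": "Dataset",
        "name": whole_name,
        "hasPart": [{"@id": BASE + part_id} for part_id, _, _ in parts],
        # Deliberately no license here: which single license the whole
        # should carry is one of the open questions in this issue.
    }]
    for part_id, part_name, license_url in parts:
        graph.append({
            "@id": BASE + part_id,
            "@type": "Dataset",
            "name": part_name,
            "license": license_url,  # each part keeps its own, single license
            "isPartOf": {"@id": whole_uri},
        })
    return {"@context": "https://schema.org/", "@graph": graph}

# Hypothetical parts; the licenses are illustrative, not those of any real dataset.
doc = compound_dataset("example", "Example collection", [
    ("example-metadata", "Example collection metadata",
     "https://creativecommons.org/publicdomain/zero/1.0/"),
    ("example-ocr", "Example collection OCR",
     "https://creativecommons.org/licenses/by-sa/2.0/"),
])
print(json.dumps(doc, indent=2))
```

Linking in both directions means a consumer that finds only one part can still discover the whole, and from there the sibling parts.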

A dataset can have only one license. This tells users of the dataset which rights and obligations apply. For example, when a dataset has both a https://creativecommons.org/publicdomain/zero/1.0/ and a https://creativecommons.org/licenses/by-sa/2.0/ license, it is not clear to its users which license to abide by.
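The single-license rule can be checked mechanically on a JSON-LD graph. The following is a hypothetical plain-Python sketch, not the Dataset Register's actual validation; it assumes `@type` is a plain string and accounts for JSON-LD allowing either a single value or an array for `license`.

```python
def licenses_of(node):
    """Return the license values of a JSON-LD node as a list."""
    value = node.get("license", [])
    # JSON-LD allows a single value or an array for the same property.
    return value if isinstance(value, list) else [value]

def datasets_without_single_license(graph):
    """IDs of Dataset nodes that have zero licenses or more than one."""
    return [
        node.get("@id")
        for node in graph
        if node.get("@type") == "Dataset" and len(licenses_of(node)) != 1
    ]

# Hypothetical nodes: "b" has the ambiguous double license described above.
graph = [
    {"@id": "https://example.org/id/dataset/a", "@type": "Dataset",
     "license": "https://creativecommons.org/publicdomain/zero/1.0/"},
    {"@id": "https://example.org/id/dataset/b", "@type": "Dataset",
     "license": ["https://creativecommons.org/publicdomain/zero/1.0/",
                 "https://creativecommons.org/licenses/by-sa/2.0/"]},
]
print(datasets_without_single_license(graph))  # ['https://example.org/id/dataset/b']
```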

Does a "compound dataset" have one distribution, or a distribution for each of its parts? With the latter, would it be clear to users that all distributions need to be downloaded (because distributions usually differ only in encodingFormat, not in contents)?
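The difference between the two readings shows up in what a consumer has to fetch. In this hypothetical sketch (assuming the `hasPart` modeling and schema.org `DataDownload` nodes with `encodingFormat` and `contentUrl`), a plain dataset needs one distribution in the preferred format, while a compound dataset needs one per part.

```python
def pick_distribution(dataset, preferred_format):
    """Pick one distribution, assuming alternatives differ only in encodingFormat."""
    dists = dataset.get("distribution", [])
    for d in dists:
        if d.get("encodingFormat") == preferred_format:
            return d["contentUrl"]
    # Fall back to the first distribution, or None if there is none.
    return dists[0]["contentUrl"] if dists else None

def urls_to_download(dataset, index, preferred_format):
    """One URL for a plain dataset; one URL per part for a compound dataset.

    index maps part @id values to their Dataset nodes (hypothetical helper).
    """
    parts = dataset.get("hasPart", [])
    if not parts:
        return [pick_distribution(dataset, preferred_format)]
    return [pick_distribution(index[p["@id"]], preferred_format) for p in parts]

def dataset(node_id, stem):
    """Hypothetical part with the same contents in two encodings."""
    return {"@id": node_id, "@type": "Dataset", "distribution": [
        {"@type": "DataDownload", "encodingFormat": "text/turtle",
         "contentUrl": f"https://example.org/{stem}.ttl"},
        {"@type": "DataDownload", "encodingFormat": "application/n-triples",
         "contentUrl": f"https://example.org/{stem}.nt"},
    ]}

buildings = dataset("ex:buildings", "buildings")
streets = dataset("ex:streets", "streets")
whole = {"@id": "ex:whole", "@type": "Dataset",
         "hasPart": [{"@id": "ex:buildings"}, {"@id": "ex:streets"}]}
index = {d["@id"]: d for d in (buildings, streets)}

print(urls_to_download(buildings, index, "application/n-triples"))
print(urls_to_download(whole, index, "text/turtle"))
```

The first call yields a single URL; the second yields one Turtle file per part, which is exactly the "download everything" expectation a user might not infer from the metadata alone.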

An alternative to providing an explicit isPartOf relation is to include the name of the "whole" in the name of each "part", e.g. AdamLink Buildings, AdamLink Districts, etc.
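Without an explicit relation, a consumer has to reconstruct the grouping from the names themselves. The hypothetical sketch below groups dataset names under a known "whole" name followed by a space; it illustrates that this convention relies on string matching and a list of known wholes, not on anything in the metadata. The dataset names used are illustrative.

```python
def group_by_name_prefix(dataset_names, whole_names):
    """Group dataset names under the 'whole' name they start with.

    Names that match no whole are collected under the key None.
    """
    groups = {w: [] for w in whole_names}
    groups[None] = []
    for name in dataset_names:
        matches = [w for w in whole_names if name.startswith(w + " ")]
        # Prefer the longest matching whole name if several apply.
        key = max(matches, key=len) if matches else None
        groups[key].append(name)
    return groups

groups = group_by_name_prefix(
    ["AdamLink Buildings", "AdamLink Districts", "Delpher OCR", "Other dataset"],
    ["AdamLink", "Delpher"],
)
print(groups)
```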

Another alternative is to create separate DataCatalogs. For example, you could define an AdamLink data catalog alongside other Amsterdam Timemachine DataCatalogs (and likewise for Delpher).

Beware: you can't have DataCatalogs inside your DataCatalog, so each of them has to be "promoted" (and registered with the Dataset Register).
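The separate-catalog approach can be sketched with the schema.org `DataCatalog` type and its `dataset` property. This hypothetical Python sketch (catalog URIs and dataset names are illustrative) builds one catalog per "whole" and flags the nesting the issue warns against: a catalog whose `dataset` list contains another `DataCatalog`. It is not the Dataset Register's own check.

```python
def make_catalog(catalog_id, name, datasets):
    """Build a schema.org DataCatalog node listing its datasets."""
    return {
        "@context": "https://schema.org/",
        "@id": catalog_id,
        "@type": "DataCatalog",
        "name": name,
        "dataset": datasets,
    }

def nested_catalogs(catalog):
    """IDs of DataCatalog nodes listed inside another catalog's datasets."""
    return [d.get("@id") for d in catalog.get("dataset", [])
            if d.get("@type") == "DataCatalog"]

# One top-level catalog per "whole", each registered on its own.
adamlink = make_catalog("https://example.org/catalog/adamlink", "AdamLink", [
    {"@id": "https://example.org/id/dataset/buildings", "@type": "Dataset",
     "name": "AdamLink Buildings"},
    {"@id": "https://example.org/id/dataset/districts", "@type": "Dataset",
     "name": "AdamLink Districts"},
])

# Not allowed: a catalog that contains another catalog.
umbrella = make_catalog("https://example.org/catalog/umbrella",
                        "Umbrella catalog", [adamlink])

print(nested_catalogs(adamlink))  # []
print(nested_catalogs(umbrella))  # ['https://example.org/catalog/adamlink']
```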