Open ddeboer opened 2 years ago
Before that we have to decide which party makes the decision that datasets should be marked for harvest by CLARIAH or which scenarios we need to support. I see three possibilities:
- CLARIAH decides which datasets are of interest to them
And administer these markings in the CLARIAH infrastructure?
- institutes promote their dataset to be of interest for the CLARIAH user group
This would mean the source is changed, directed by our adjusted requirements (besides the includeInEuropeana?). Although I like this "at the source" option, I do wonder how many institutions will use an includeInClariah-like predicate. And, can a dataset supplier decide that it is a relevant CLARIAH dataset, or will the promotion be handled as a suggestion by CLARIAH?
- the Datasetregister team decides which group of datasets could be relevant for use within CLARIAH (in close coordination with CLARIAH)
I do not like the extra manual work these annotations would mean for the team. And bear in mind, you'd have to 'judge' every new (set of) datasets. NB: the same is of course true for the first option (CLARIAH).
@coret and I will think about a way to describe this in the dataset description RDF and requirements.
Proposal:
<dataset> schema:audience <https://www.europeana.eu> ,
                          <https://clariah.nl> ,
                          <https://www.collectienederland.nl> .
In DCAT/DCT that would be dct:audience.
We should document the enum of audiences in our requirements.
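To make the "enum of audiences" idea concrete, here is a minimal sketch of how a validator could check the audience values in a dataset description against a documented list. The three IRIs come from the proposal above; the `Audience` enum and the `unknown_audiences` helper are hypothetical names used only for illustration, not existing Datasetregister code.

```python
from enum import Enum


class Audience(Enum):
    """Audience IRIs from the proposal; the enum itself is an assumption."""
    EUROPEANA = "https://www.europeana.eu"
    CLARIAH = "https://clariah.nl"
    COLLECTIE_NEDERLAND = "https://www.collectienederland.nl"


def unknown_audiences(audience_iris):
    """Return the IRIs that are not in the documented enum, preserving order."""
    known = {audience.value for audience in Audience}
    return [iri for iri in audience_iris if iri not in known]


# Hypothetical input: one documented audience and one undocumented one.
print(unknown_audiences(["https://clariah.nl", "https://example.org/other"]))
```

Whether an undocumented audience should be rejected or only reported as a warning is still open; the sketch only reports it.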
Implementation strategy:
schema:audience
and phase out the "audience" graph (= shift of responsibility)
We have decided for now to publish only KB, B&G and IISG because these are CLARIAH partners.
How can we annotate dataset descriptions in the NDE Registry to make it clear they should be harvested by CLARIAH?
Currently CLARIAH harvests all datasets for a selection of publishers. The publisher selection is configured on the CLARIAH side. Is having all datasets harvested the desired behaviour?
See https://github.com/CLARIAH/clariah-plus/issues/97.
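The difference between the current publisher-based selection and an audience-based selection (as proposed in this thread with schema:audience) could look roughly like this. It is a sketch under assumptions: the dictionary layout, the function names and the example.org IRIs are all made up for illustration and do not reflect the actual CLARIAH harvester.

```python
CLARIAH = "https://clariah.nl"


def select_by_publisher(datasets, publishers):
    """Current behaviour: harvest every dataset of the configured publishers."""
    return [d["iri"] for d in datasets if d["publisher"] in publishers]


def select_by_audience(datasets, audience=CLARIAH):
    """Proposed behaviour: harvest only datasets that list the audience."""
    return [d["iri"] for d in datasets if audience in d.get("audience", ())]


# Hypothetical dataset descriptions, reduced to the fields used here.
datasets = [
    {"iri": "https://example.org/dataset/1",
     "publisher": "https://example.org/publisher/a",
     "audience": [CLARIAH]},
    {"iri": "https://example.org/dataset/2",
     "publisher": "https://example.org/publisher/a",
     "audience": []},
    {"iri": "https://example.org/dataset/3",
     "publisher": "https://example.org/publisher/b",
     "audience": [CLARIAH, "https://www.europeana.eu"]},
]

print(select_by_publisher(datasets, {"https://example.org/publisher/a"}))
print(select_by_audience(datasets))
```

With publisher-based selection, dataset 2 is harvested even though nothing marks it as relevant, and dataset 3 is missed because its publisher is not configured on the CLARIAH side.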