<http://data.bibliotheken.nl/id/dataset/rise-alba> a void:Dataset;
void:distinctSubjects 53434;
void:distinctObjects 32323;
void:properties 943;
void:entities 8493. # To be an entity in a dataset, a resource must have a URI, and the URI must match the dataset's void:uriRegexPattern, if any.
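The entity rule in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline's implementation: the names `is_entity` and `count_entities` are hypothetical, blank nodes are assumed to be serialized with a `_:` label, and the regex pattern in the usage note is made up for the example.

```python
import re


def is_entity(term, uri_regex_pattern=None):
    """Return True if `term` counts as an entity of the dataset."""
    # Assumption: blank nodes carry a "_:" label and therefore have no URI.
    if term.startswith("_:"):
        return False
    # Without a void:uriRegexPattern, every resource with a URI qualifies.
    if uri_regex_pattern is None:
        return True
    return re.search(uri_regex_pattern, term) is not None


def count_entities(subjects, uri_regex_pattern=None):
    """Count distinct subjects that qualify as entities (the void:entities value)."""
    return sum(1 for s in set(subjects) if is_entity(s, uri_regex_pattern))
```

For example, calling `count_entities(subjects, r"^http://data\.bibliotheken\.nl/doc/alba/")` with a hypothetical pattern would count only the subjects whose URIs fall under that namespace.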
<http://data.bibliotheken.nl/id/dataset/rise-alba> void:subset [ # The linkset is a subset of the dataset that contains the links.
a void:Linkset;
void:subjectsTarget <http://data.bibliotheken.nl/id/dataset/rise-alba>;
void:objectsTarget <http://data.bibliotheken.nl/id/dataset/persons>;
void:triples 434
].
[] a void:Linkset;
void:subjectsTarget <http://data.bibliotheken.nl/id/dataset/rise-alba>;
void:objectsTarget <https://data.cultureelerfgoed.nl/term/id/cht>;
void:triples 9402.
To detect outgoing links, match object URIs against a list of fixed URI prefixes: those from the Network of Terms, plus a custom list maintained in the pipeline itself.
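The prefix matching could be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `TARGET_PREFIXES`, `match_target`, and `count_outgoing_links` are hypothetical names, the prefix strings are assumptions chosen for the example (only the two target dataset URIs come from the Linksets in this proposal), and letting the longest matching prefix win is a design assumption.

```python
from collections import Counter

# Hypothetical mapping from URI prefix to the target dataset it identifies.
# In the proposal these prefixes would come from the Network of Terms plus
# a custom list kept in the pipeline; the entries here are illustrative only.
TARGET_PREFIXES = {
    "https://data.cultureelerfgoed.nl/term/id/cht/":
        "https://data.cultureelerfgoed.nl/term/id/cht",
    "http://data.bibliotheken.nl/id/thes/":
        "http://data.bibliotheken.nl/id/dataset/persons",
}


def match_target(object_uri, prefixes=TARGET_PREFIXES):
    """Return the target dataset for a URI, or None if no prefix matches."""
    matches = [p for p in prefixes if object_uri.startswith(p)]
    if not matches:
        return None
    # Prefer the longest matching prefix so nested prefixes resolve predictably.
    return prefixes[max(matches, key=len)]


def count_outgoing_links(object_uris, prefixes=TARGET_PREFIXES):
    """Count links per target dataset; each count maps to a void:triples value."""
    counts = Counter()
    for uri in object_uris:
        target = match_target(uri, prefixes)
        if target is not None:
            counts[target] += 1
    return counts
```

Each key of the returned counter would correspond to one void:Linkset, with the key as void:objectsTarget and the count as void:triples. Objects that match no prefix are ignored rather than grouped under a catch-all target.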
Vocabularies
<http://data.bibliotheken.nl/id/dataset/rise-alba> a void:Dataset;
void:vocabulary <https://schema.org/>, <http://www.w3.org/2000/01/rdf-schema#>, <http://xmlns.com/foaf/0.1/>.
Example resources
<http://data.bibliotheken.nl/id/dataset/rise-alba> a void:Dataset;
void:exampleResource <http://data.bibliotheken.nl/doc/alba/p418213178>,
<http://data.bibliotheken.nl/doc/alba/p416673600>.
Provenance
Where should we place provenance information about the analysis results? PROV-O suggests using prov:Entity for analyses. We can track provenance either:
at the level of void:Dataset by declaring each dataset to be a prov:Entity too, which is rather vague;
or, more precisely, at the level of each partition (itself a void:Dataset).
<http://data.bibliotheken.nl/id/dataset/rise-alba> a void:Dataset;
void:classPartition [
void:class schema:VisualArtwork;
void:entities 312000;
a void:Dataset, prov:Entity;
prov:wasGeneratedBy [
a prov:Activity ;
prov:used "SELECT DISTINCT ?type (COUNT(?type) as ?number) (…)";
];
prov:generatedAtTime "2022-05-03T13:35:23Z"^^xsd:dateTime;
].
Questions
[ ] Are there other relevant prov properties that we should add?
[ ] Are these data structures easy enough for clients to query?
[x] Does VoID make sense, or should we (also) use Schema.org where possible, e.g. schema:workExample?
[x] Are void:Linksets a good idea or should we have a simple list of source/count pairs?
This is a proposal for what the dataset summaries could look like, based on the statistics section of the VoID specification: https://www.w3.org/TR/void/#statistics.
Dataset summaries
Size
Classes
Properties
Property density per subject type
Nest a void:propertyPartition in a void:classPartition:
Outgoing links
We could model these as void:Linksets: