netwerk-digitaal-erfgoed / requirements-datasets

Requirements for datasets
https://netwerk-digitaal-erfgoed.github.io/requirements-datasets/

Add specific SPARQL distribution info #76

Open coret opened 1 year ago

coret commented 1 year ago

The current specification has no way to describe a SPARQL endpoint distribution in enough detail.

Some datasets, such as those of the Literatuurmuseum, all share the same schema:contentUrl (https://LIT.hosting.deventit.net/AtlantisSparql), but each dataset is stored in a separate named graph. A property (working name nde:graphUri) should be defined to provide the URI of that graph.
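To illustrate how a consumer might use such a graph URI, here is a minimal Python sketch that scopes a query to one named graph. The function name and the example graph URI are hypothetical; nde:graphUri itself is only a working name in this issue.

```python
def count_triples_query(graph_uri=None):
    """Build a triple-count query, scoped to a named graph when one is given."""
    pattern = "?s ?p ?o ."
    if graph_uri is not None:
        # Wrap the pattern in a GRAPH clause so only that named graph is matched.
        pattern = f"GRAPH <{graph_uri}> {{ {pattern} }}"
    return f"SELECT (COUNT(*) AS ?triples) WHERE {{ {pattern} }}"


# Hypothetical graph URI, standing in for a value taken from nde:graphUri.
print(count_triples_query("https://example.org/graph/dataset-1"))
```

Without a graph URI the same function falls back to querying the endpoint's default graph, which is what a consumer has to do today.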

Other datasets, such as those of the KB, require a filter on the subject URIs, for example ?subject schema:mainEntityOfPage/schema:isPartOf <http://data.bibliotheken.nl/id/dataset/dbnla>. A property (working name nde:subjectFilter) should be defined to provide this fragment of a SPARQL SELECT query (a graph pattern).
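A consumer could splice such a filter into its own query roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and the namespace bound to the schema: prefix is a guess that would have to be checked against the endpoint's actual data.

```python
def subjects_query(subject_filter, prefixes=None, limit=10):
    """Build a SELECT query that lists subjects matching a dataset-specific filter."""
    prefix_lines = "".join(
        f"PREFIX {prefix}: <{namespace}>\n"
        for prefix, namespace in (prefixes or {}).items()
    )
    # Normalise the filter so it ends with exactly one statement terminator.
    pattern = subject_filter.strip().rstrip(".").rstrip()
    return (
        f"{prefix_lines}"
        f"SELECT DISTINCT ?subject WHERE {{ {pattern} . }} LIMIT {limit}"
    )


# Filter taken from the KB example above; the schema: namespace is an assumption.
kb_filter = (
    "?subject schema:mainEntityOfPage/schema:isPartOf "
    "<http://data.bibliotheken.nl/id/dataset/dbnla>"
)
print(subjects_query(kb_filter, prefixes={"schema": "http://schema.org/"}))
```

The sketch also shows one open design question: a bare pattern like this only works if the prefixes it uses are either published alongside it or written out as full URIs.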

This information is needed for automated processing of SPARQL endpoints, for example in the Dataset Register knowledge graph (KG). See https://github.com/netwerk-digitaal-erfgoed/kg-prototype/blob/master/catalogs/sparql-endpoints.ttl for examples.

TODO: investigate whether appropriate properties are available in VoID or DCAT.

ddeboer commented 7 months ago

Another example is https://datasetregister.netwerkdigitaalerfgoed.nl/show.php?uri=http://data.beeldengeluid.nl/id/dataset/0028, which has as its distribution access URL: https://cat.apis.beeldengeluid.nl/sparql?query=PREFIX%20sdo%3A%20%3Chttps%3A//schema.org/%3E%20SELECT%20DISTINCT%20%3FprogramUri%20%3FprogramName%20WHERE%20%7B%3Fseries%20sdo%3Aname%20%22Muziekopnamen%20Zendgemachtigden%20%28MOZ%29%22%5E%5Exsd%3Astring%20.%20%3FprogramUri%20sdo%3ApartOfSeason/sdo%3ApartOfSeries%20%3Fseries%20%3B%20sdo%3Aname%20%3FprogramName%20.%20%7D

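The access URL embeds a complete URL-encoded SELECT query instead of pointing at the bare endpoint, which makes it very hard for us to work with this distribution from the Knowledge Graph.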
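As a stopgap, a consumer could split such an access URL into the bare endpoint and the embedded query. The sketch below uses only the Python standard library; the function name is hypothetical and the example URL carries a shortened query rather than the full one above.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_access_url(access_url):
    """Return (endpoint, embedded_query); embedded_query is None if absent."""
    parts = urlsplit(access_url)
    # parse_qs percent-decodes the values of the query-string parameters.
    params = parse_qs(parts.query)
    endpoint = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    embedded_query = params.get("query", [None])[0]
    return endpoint, embedded_query


# Shortened stand-in for the Beeld en Geluid access URL quoted above.
url = (
    "https://cat.apis.beeldengeluid.nl/sparql"
    "?query=SELECT%20%2A%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D"
)
endpoint, query = split_access_url(url)
print(endpoint)
print(query)
```

This recovers the endpoint, but the embedded SELECT query still has to be interpreted by hand, so it does not replace a dedicated property.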

ddeboer commented 7 months ago

TODO: investigate whether appropriate properties are available in VoID or DCAT.

From the VoID spec:

Note: In some SPARQL endpoints, named graphs are used to partition the data. Currently VoID doesn't provide a dedicated way of stating that a dataset is contained in a specific named graph. This kind of information can be provided in a SPARQL Service Description, as described below.
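For reference, a SPARQL Service Description that ties a dataset to a named graph could look roughly like the Turtle produced by this Python sketch. The sd: terms come from the W3C SPARQL 1.1 Service Description vocabulary; the function name and the graph and dataset URIs are hypothetical, and nothing here is a decided modelling for this specification.

```python
def service_description(endpoint, graph_uri, dataset_uri):
    """Render a Turtle service description linking a dataset to a named graph."""
    return f"""@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix void: <http://rdfs.org/ns/void#> .

[] a sd:Service ;
    sd:endpoint <{endpoint}> ;
    sd:defaultDataset [
        a sd:Dataset ;
        sd:namedGraph [
            a sd:NamedGraph ;
            sd:name <{graph_uri}> ;
            sd:graph <{dataset_uri}>
        ]
    ] .

<{dataset_uri}> a sd:Graph, void:Dataset .
"""


# Endpoint from the Literatuurmuseum example; the other two URIs are made up.
print(service_description(
    "https://LIT.hosting.deventit.net/AtlantisSparql",
    "https://example.org/graph/dataset-1",
    "https://example.org/id/dataset/1",
))
```

This covers the named-graph case only; the subject filter has no counterpart in the Service Description vocabulary as far as this sketch goes.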