netwerk-digitaal-erfgoed / dataset-register

Components (API and crawler) for the NDE Dataset Register
https://datasetregister.netwerkdigitaalerfgoed.nl/api/
European Union Public License 1.2

Discover datasets #36

Open ddeboer opened 3 years ago

ddeboer commented 3 years ago

Based on:

coret commented 3 years ago

Proposed well-known-URI for datacatalogs (inspired by https://www.w3.org/TR/void/#well-known) for inclusion in Requirements for Datasets:

Discovery with well-known URI

RFC 5785 defines a mechanism for reserving 'well-known' URIs on any Web server.

The URI /.well-known/datacatalog on any Web server is registered by this specification for a datacatalog with dataset descriptions of datasets hosted on that server. For example, on the host www.example.com, this URI would be http://www.example.com/.well-known/datacatalog.
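As an illustration of the rule above, here is a minimal Python sketch that derives the well-known URI from a host name or from any URL on that host. The function name `well_known_catalog_url` and the default `https` scheme for bare host names are assumptions made for this example; they are not part of the proposal.

```python
from urllib.parse import urlsplit, urlunsplit

# Path proposed in this issue; not (yet) an IANA-registered suffix.
WELL_KNOWN_PATH = "/.well-known/datacatalog"


def well_known_catalog_url(url_or_host: str, scheme: str = "https") -> str:
    """Return the well-known datacatalog URI for the host of the given URL.

    Hypothetical helper: accepts either a bare host name or a full URL.
    """
    # A bare host such as "www.example.com" has no scheme, so add one
    # (assumed default) before parsing.
    if "://" not in url_or_host:
        url_or_host = f"{scheme}://{url_or_host}"
    parts = urlsplit(url_or_host)
    if not parts.netloc:
        raise ValueError(f"No host found in {url_or_host!r}")
    # Keep scheme and host, replace path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


if __name__ == "__main__":
    print(well_known_catalog_url("http://www.example.com/some/dataset?page=2"))
```

Any path, query string or fragment in the input is discarded, because the well-known URI is defined per host rather than per dataset.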

This URI may be an HTTP redirect to the location of the actual datacatalog file. The most appropriate HTTP redirect code is 302. Clients accessing this well-known URI MUST handle HTTP redirects.

The datacatalog file accessible via the well-known URI should contain descriptions of all datasets hosted on the server. This includes any datasets that have resolvable URIs, a SPARQL endpoint, a data dump, or any other access mechanism whose URI is on the server's hostname. Datacatalogs can be described using http://www.w3.org/ns/dcat#Catalog or https://schema.org/DataCatalog.
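To make the last sentence concrete, a crawler could check whether a node in the retrieved file is typed as one of the two catalog classes. The sketch below assumes the file has already been parsed as JSON-LD in expanded form, so that `@type` holds full IRIs; the serialization, the helper name `is_catalog_node` and the expanded-form requirement are assumptions for this example, not something the proposal prescribes.

```python
# The two catalog classes named in the proposal text.
CATALOG_TYPES = {
    "http://www.w3.org/ns/dcat#Catalog",
    "https://schema.org/DataCatalog",
}


def is_catalog_node(node: dict) -> bool:
    """Return True if an expanded JSON-LD node is typed as a data catalog.

    Hypothetical helper: compact forms such as "dcat:Catalog" are NOT
    recognised; expand the document with a JSON-LD processor first.
    """
    types = node.get("@type", [])
    # JSON-LD allows "@type" to be a single string or a list of strings.
    if isinstance(types, str):
        types = [types]
    return any(t in CATALOG_TYPES for t in types)


if __name__ == "__main__":
    print(is_catalog_node({"@type": ["http://www.w3.org/ns/dcat#Catalog"]}))
```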

This document defines the “.well-known” URI datacatalog using the registration procedure and template from Section 5.1 of RFC 5785 as follows:


URI suffix:
    datacatalog
Change controller:
    W3C
Specification document(s):
    This document.

Example (for testing):

curl -I https://www.openarch.nl/.well-known/datacatalog
HTTP/2 302
location: https://www.openarch.nl/datasets/
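The same check can be sketched in Python for a client that has to handle the redirect. `urllib.request.urlopen` follows HTTP redirects such as 302 by default, and the response's `url` attribute holds the final location. The function name `resolve_catalog_location`, the injectable `opener` parameter (there so the function can be exercised without network access) and the 10-second timeout are assumptions for this example. Note that it sends a GET request rather than the HEAD request that `curl -I` sends.

```python
import urllib.request


def resolve_catalog_location(well_known_url, opener=urllib.request.urlopen, timeout=10):
    """Fetch the well-known URI and return the URL reached after redirects.

    Hypothetical helper: `opener` defaults to urllib's urlopen, which
    follows redirects automatically.
    """
    request = urllib.request.Request(well_known_url)
    with opener(request, timeout=timeout) as response:
        # After redirects, `url` is the location of the actual catalog file.
        return response.url


if __name__ == "__main__":
    # Requires network access; the host is the test example from this thread.
    print(resolve_catalog_location("https://www.openarch.nl/.well-known/datacatalog"))
```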

Impact on the Register function (Design):

EnnoMeijers commented 3 years ago

Please note that the .well-known suffix 'datacatalog' is not a registered suffix, see https://www.iana.org/assignments/well-known-uris/well-known-uris.xhtml. Do we know what consequences using an unregistered suffix might have? Are we breaking standards by using an unregistered suffix?

coret commented 3 years ago

The next step, if there are no comments on the text, is to include the proposed text in our requirements document so that we have a referenceable document. Then we can send a request to have 'datacatalog' included in the list via https://github.com/protocol-registries/well-known-uris.

EnnoMeijers commented 3 years ago

I think that for a formal registration of the 'datacatalog' suffix we should seek broader support in the (DCAT) community, as it makes no sense, and probably has little chance of succeeding, to do this only from the Dutch Digital Heritage perspective. Maybe we could consult Ruben Verborgh, Antoine Isaac or Herbert Van de Sompel, as they have been involved in the Dataset Exchange Working Group (see https://www.w3.org/2020/02/dx-wg-charter.html) in one way or another.

coret commented 3 years ago

I have posted the issue "Improve discovery of datacatalogs by registering well-known suffix 'datacatalog'" at https://github.com/w3c/dxwg/issues/1290 and https://github.com/schemaorg/schemaorg/issues/2827 to seek the support of these communities.