w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Support an indication of sharability status #1432

Open garfi303 opened 2 years ago

garfi303 commented 2 years ago

Sharability status

Status: open

Identifier:

Creator: @garfi303 (Simson Garfinkel)

Deliverable(s): DCATv3

Stakeholders

Users of DCAT within large organizations.

Problem statement

We are using DCAT to represent our organization's internal data catalog. Some users have data that is not sharable but needs to be indicated in the catalog. Others have data that is not ready for sharing but will be sharable one day. So we would like to have a DCATv3 field that indicates the sharability status, and this could include:

I'm not sure how to represent this and don't think that it can be represented at the moment.

Existing approaches

DCAT-US has a field called accessLevel that must be public, restricted public, or non-public, but this is not sufficiently rich.

Links

n/a

Requirements

We have a mandatory requirement for this, so we will either use what DCATv3 does, or we will develop our own.

Related use cases

''Optional references to related local (refer to anchor identifier [[#Id...]]) and remote use cases (e.g. POE-WG UCs)''.

Comments

''Optional section for editorial comments, suggestion and their interactive resolution''


andrea-perego commented 2 years ago

Thanks for sharing this use case, @garfi303 .

Currently, DCAT makes use of dcterms:accessRights for these purposes, but it does not mandate a specific code list (§8 License and rights statements).

For the discussion underlying this approach, see https://github.com/w3c/dxwg/issues?q=is%3Aissue+is%3Aclosed+label%3Adct%3AaccessRights

Could you review the above and indicate how DCAT could be revised to address your requirements? For example, is a specific code list all that is needed?

simsong commented 2 years ago

dcterms:accessRights does not appear to have sufficient fidelity for our needs. We may have data that is legally publicly sharable but that the organization is not yet prepared to share. It is that status of readiness that we need to document.

makxdekkers commented 2 years ago

@simsong In what way does dct:accessRights have insufficient 'fidelity'? Its definition is "Information about who can access the resource or an indication of its security status" which is fairly broad. Would it not be possible to satisfy your requirement by an appropriate controlled vocabulary for this property?

simsong commented 2 years ago

The issue is that there are two different kinds of information that we want to represent. The first is with whom the dataset may be shared; the second is whether or not the dataset is ready to be shared. My understanding is that when you have two different kinds of information, it's better to represent them in two different kinds of elements, rather than combining them into one. Is that not the case here?
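A minimal sketch of the two-element idea, assuming a JSON-LD-style description of a dataset. `dcterms:accessRights` is the existing property; `ex:sharingReadiness`, the `ex:` namespace, and every `example.org` IRI are hypothetical placeholders and not part of DCAT.

```python
import json

# dcat and dcterms are the real namespaces; "ex" is a hypothetical local one.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "ex": "https://example.org/ns#",
}


def describe_dataset(dataset_id, title, access_rights, readiness):
    """Return a JSON-LD-style dict that keeps the two kinds of information apart."""
    return {
        "@context": CONTEXT,
        "@id": dataset_id,
        "@type": "dcat:Dataset",
        "dcterms:title": title,
        # With whom the dataset may be shared (existing property).
        "dcterms:accessRights": {"@id": access_rights},
        # Whether the dataset is ready to be shared (hypothetical property).
        "ex:sharingReadiness": {"@id": readiness},
    }


record = describe_dataset(
    "https://example.org/dataset/42",
    "Survey responses",
    "https://example.org/access/public",
    "https://example.org/readiness/not-yet-ready",
)
print(json.dumps(record, indent=2))
```

Here a dataset can be legally public while still being marked as not ready, without overloading a single value.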

dr-shorthair commented 2 years ago

If you have very specific needs, then you may need to specify a 'profile' of DCAT for your community. i.e. a metadata standard that is conformant to DCAT but introduces additional constraints, such as use of a specific set of licenses or usage rights.
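As a rough illustration of what such a profile constraint could look like in practice, the sketch below checks a record against a mandated code list. The code list values and the `check_profile` helper are hypothetical; a real profile would more likely express this in a constraint language such as SHACL.

```python
# Hypothetical code list that a community profile might mandate
# for dcterms:accessRights. None of these IRIs are defined by DCAT.
ALLOWED_ACCESS_RIGHTS = {
    "https://example.org/access/public",
    "https://example.org/access/restricted",
    "https://example.org/access/non-public",
}


def check_profile(record):
    """Return a list of violations of the (hypothetical) profile constraints."""
    problems = []
    rights = record.get("dcterms:accessRights")
    if rights is None:
        problems.append("dcterms:accessRights is required by this profile")
    elif rights not in ALLOWED_ACCESS_RIGHTS:
        problems.append(f"dcterms:accessRights value not in code list: {rights}")
    return problems


conformant = {"dcterms:accessRights": "https://example.org/access/restricted"}
nonconformant = {"dcterms:accessRights": "https://example.org/access/secret"}
print(check_profile(conformant))
print(check_profile(nonconformant))
```

A record that passes this check is still plain DCAT; the profile only narrows which values are acceptable.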

simsong commented 2 years ago

> If you have very specific needs, then you may need to specify a 'profile' of DCAT for your community. i.e. a metadata standard that is conformant to DCAT but introduces additional constraints, such as use of a specific set of licenses or usage rights.

We intend to do that if there is no interest in adding this level of description to DCATv3. However, I think that these are things that many organizations need to document, and I believe that it's better to document everything in one place in a consistent, standards-based approach, rather than having different organizations develop their own approaches.

kcoyle commented 2 years ago

Would dct:available help for "Expected to be sharable by..."? It is defined as: "Date that the resource became or will become available." It can take a future date.

The others seem to be access rights, as per dct:accessRights, which can have a standard list of values, and of course you can create your own standard list.
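A small sketch of how a future `dcterms:available` date might be consumed, assuming the value is stored as an ISO 8601 date string. The record and the `is_available` helper are hypothetical, and treating a missing date as "not available" is an assumption of this example.

```python
from datetime import date


def is_available(record, today):
    """True if the record's dcterms:available date is on or before `today`."""
    available = record.get("dcterms:available")
    if available is None:
        # Assumption: without a date, availability is treated as not yet reached.
        return False
    return date.fromisoformat(available) <= today


# Hypothetical catalog entry expected to become sharable on 1 January 2030.
record = {
    "@id": "https://example.org/dataset/42",
    "dcterms:available": "2030-01-01",
}

print(is_available(record, date(2025, 6, 1)))
print(is_available(record, date(2030, 1, 1)))
```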

bertvannuffelen commented 2 years ago

In DCAT-AP, the property dcatap:availability is defined for distributions. A controlled vocabulary for it has just been published: https://op.europa.eu/en/web/eu-vocabularies/concept-scheme/-/resource?uri=http://publications.europa.eu/resource/authority/planned-availability

W.r.t. accessRights, the DCAT profile for Flanders has identified a difference between the access rights to the dataset and the access rights to the service providing access to the data. The first is a legal right: by law the data is public, or it has restricted access. In a government context, these access rights are connected with public information transparency (PSI Directive, etc.). But that right to be public is not the same for a service. A public API is a service for which no check is made to verify whether the user of that service has the right to access the data. There can still be security and access monitoring, but solely to ensure fair use of the service, e.g. to prevent DoS attacks. A non-public API is then a service where the security and access monitoring is connected with a granting system that checks conditions (age/gender/profession/kind of user/...).

Observe that these are different interpretations of access per kind of resource: the first is bound to an inherent legal basis, while the second is more about the measures taken to make the service operational. Note that for file-based access this distinction is also present in accessURL / downloadURL: the first is used when the second (no limitations) cannot be fulfilled.

andrea-perego commented 2 years ago

@simsong , unless you have any additional comments, we will consider this issue as closed.

simsong commented 2 years ago

Respectfully, this is not about accessRights. The issue is not whether the data is open or restricted. The issue is not classification level. The issue is not how the data is being made available over the internet (accessURL/downloadURL). The issue is not if the data is available or not.

The issue is whether the file, stored on a disk, is encrypted with AES-128 or AES-256.

This is a very specific technical issue that is involved with the transition from current encryption algorithms to quantum-resistant algorithms. It is of broad interest to any organization that stores data encrypted. It may be covered under the current taxonomy, but it does not seem to be covered by the ones you cite above.

riccardoAlbertoni commented 1 year ago

After some discussion in tonight's call, we decided to move this issue to the next standardization round. See resolution https://www.w3.org/2023/01/24-dxwgdcat-minutes.html#r03

davebrowning commented 1 year ago

Project/Milestone modified.

Explanation: As DCAT v3 moves through review and, hopefully, ratification, we want to make sure that open issues and feedback that have yet to be completely addressed are properly recorded and tagged/assigned in GitHub, both to clarify their status and to help review and prioritise them as a source of improvements and new requirements in future DCAT versions.