w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

Represent dataset encryption algorithm in DCAT #1457

Open garfi303 opened 2 years ago

garfi303 commented 2 years ago

If you are submitting a new USE CASE, please use the template below. Otherwise, delete the use case template before submitting your contribution.


Add encryption Algorithm to DCAT.v3

Status:

Identifier:

Creator: Simson Garfinkel

Deliverable(s): DCAT3

Tags

Stakeholders

Data producer and data curator.

Problem statement

With the anticipated move to post-quantum encryption algorithms in the coming years, it is necessary to capture the encryption algorithm used for every dataset, so that an organization can determine whether that algorithm meets its requirements for being quantum-secure or is instead quantum-vulnerable. For example, AES-128 is considered to be quantum-vulnerable, while AES-256 is considered to be quantum-secure. All RSA algorithms are considered to be quantum-vulnerable.

Currently DCAT has no way of representing this information.
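To illustrate the kind of check this metadata would enable, here is a minimal Python sketch. The function name `classify` and the policy table are hypothetical; the table only encodes the three examples given above (AES-128, AES-256, RSA) and is not an authoritative security assessment.

```python
import re

# Hypothetical policy table: it encodes only the examples from the
# problem statement. Any other key size is reported as "unknown".
AES_POLICY = {128: "quantum-vulnerable", 256: "quantum-secure"}


def classify(algorithm: str) -> str:
    """Classify an algorithm label such as 'AES-256' or 'AES-128-OFB'."""
    name = algorithm.strip().upper()
    if name.startswith("RSA"):
        # The problem statement treats all RSA algorithms as vulnerable.
        return "quantum-vulnerable"
    match = re.fullmatch(r"AES-(\d+)(?:-[A-Z]+)?", name)
    if match:
        return AES_POLICY.get(int(match.group(1)), "unknown")
    return "unknown"
```

A catalog consumer could run such a check over every distribution once the algorithm is recorded; without the metadata there is nothing to feed into it.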

Existing approaches

None known by @garfi303.

Links

Requirements

There should at least be a text field for specifying encryption algorithm, but there should probably be a taxonomy of them.

Related use cases

Comments

I'm new here, so it's likely I didn't put this issue in the correct form. Apologies.


andrea-perego commented 2 years ago

@garfi303 , thanks for submitting this use case.

DCAT relies on the SPDX vocabulary to specify checksums, as well as the algorithms used:

https://www.w3.org/TR/vocab-dcat-3/#Property:checksum_algorithm

Indicating the security level of a hash algorithm is something that we have not considered in the scope of DCAT. This should rather be addressed by SPDX or a dedicated vocabulary.
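For reference, a small Python sketch of the existing checksum pattern, emitting Turtle as plain text. The helper name `checksum_turtle` and the example IRI are made up for illustration; the `spdx:checksum`, `spdx:algorithm` and `spdx:checksumValue` terms follow the DCAT 3 section linked above.

```python
import hashlib


def checksum_turtle(distribution_iri: str, payload: bytes) -> str:
    """Return a Turtle snippet attaching a SHA-256 checksum to a distribution."""
    digest = hashlib.sha256(payload).hexdigest()
    return (
        "@prefix spdx: <http://spdx.org/rdf/terms#> .\n"
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
        f"<{distribution_iri}> spdx:checksum [\n"
        "    a spdx:Checksum ;\n"
        "    spdx:algorithm spdx:checksumAlgorithm_sha256 ;\n"
        f'    spdx:checksumValue "{digest}"^^xsd:hexBinary\n'
        "] .\n"
    )


if __name__ == "__main__":
    # The IRI below is a placeholder, not a real distribution.
    print(checksum_turtle("https://example.org/dist/1", b"example bytes"))
```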

simsong commented 2 years ago

Hi @andrea-perego . Thank you for responding to my query. I reviewed SPDX and see that it has checksum algorithms, but not encryption algorithms. We need to be able to indicate that data at rest is encrypted with AES-128 or AES-256, for example. We are not trying to indicate the 'security level.'

andrea-perego commented 2 years ago

Sorry for the misunderstanding, @simsong .

To better understand your requirements, could you please provide an example?

E.g., besides the encryption algorithm, do you need to specify any other information? Also, is encryption applied on both data and checksums?

simsong commented 2 years ago

I can refer you to the NIST Post-Quantum Cryptography FAQ.

SQLite3 supports four encryption modes:

So it would be useful to know whether an SQLite3 database has its data encrypted in AES-256 OFB mode, which would be quantum-resistant, or in AES-128 OFB mode, which is not quantum-resistant. It would be nice to be able to document this in DCAT.

There are other examples.

For example, S3 allows AES-256 for server-side encryption, but for client-side encryption it can be AES-128 or AES-256: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-aws-nar/1.11.4/org.apache.nifi.processors.aws.s3.encryption.StandardS3EncryptionService/index.html

It would be useful to be able to document in DCAT which encryption mode is used.

Checksums are encrypted, but they are encrypted with asymmetric algorithms (public key algorithms) with private keys so that they can be decrypted with public keys. This is how digital signatures work.

Checksums may also include a nonce. This is how HMACs work.
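As a rough illustration of a keyed checksum, here is a sketch using Python's standard `hmac` module. Note that in this API the extra input is a secret key rather than a nonce; the function names and the sample key are made up for the example.

```python
import hashlib
import hmac


def keyed_checksum(key: bytes, data: bytes) -> str:
    """Return the HMAC-SHA256 tag of data under key, as a hex string."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, expected_hex: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(keyed_checksum(key, data), expected_hex)


if __name__ == "__main__":
    tag = keyed_checksum(b"illustrative-key", b"dataset bytes")
    print(verify(b"illustrative-key", b"dataset bytes", tag))
```

Unlike a plain SHA-256 checksum, the tag can only be recomputed by someone who holds the key.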

If you wish, I can provide some references on public key cryptography. However, that would be out of scope for this request.

andrea-perego commented 2 years ago

Thanks, @simsong .

So, if I got it right, your use case requires:

  1. A property to specify the encryption algorithm used on a distribution or on a checksum
  2. A code list / taxonomy of encryption algorithms

simsong commented 2 years ago

Close. This is better:

  1. A property to specify the encryption algorithm used on a distribution.
  2. A code list / taxonomy of encryption algorithms

We do not need to be able to represent encryption of checksums. We should be able to represent digital signatures; I am assuming (hopefully correctly) that there is already a way to do that, although I do not know for certain.
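The two requirements could be sketched as follows in Python. Everything under the `ex:` prefix is hypothetical: neither the `ex:encryptionAlgorithm` property nor the code list exists in DCAT today, and the namespace is a placeholder.

```python
from enum import Enum

# Placeholder namespace for the hypothetical property and code list.
EX = "https://example.org/ns/encryption#"


class EncryptionAlgorithm(Enum):
    """Requirement 2: a tiny code list (only the two examples from this thread)."""
    AES_128 = "AES-128"
    AES_256 = "AES-256"


def describe_distribution(distribution_iri: str, algorithm: EncryptionAlgorithm) -> str:
    """Requirement 1: state the algorithm used on a distribution, as Turtle."""
    return (
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
        f"@prefix ex: <{EX}> .\n\n"
        f"<{distribution_iri}> a dcat:Distribution ;\n"
        f"    ex:encryptionAlgorithm ex:{algorithm.name} .\n"
    )


if __name__ == "__main__":
    print(describe_distribution("https://example.org/dist/1", EncryptionAlgorithm.AES_256))
```

Pointing the property at a code-list entry rather than a free-text string keeps the values comparable across catalogs, which is the reason for asking for a taxonomy.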

Thanks again.

Simson

bertvannuffelen commented 2 years ago

@simsong

would a codelist corresponding to https://www.rfc-editor.org/rfc/rfc7518 fit your objectives?

kr,

Bert

simsong commented 2 years ago

> @simsong
>
> would a codelist corresponding to https://www.rfc-editor.org/rfc/rfc7518 fit your objectives?
>
> kr,
>
> Bert

Yes! It would be great if we could just use RFC 7518, but I guess it would be necessary to do some sort of JSON-to-RDF conversion?
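One possible shape for such a conversion, sketched in Python: turn a JSON list of algorithm entries into SKOS concepts written as Turtle. The JSON layout, the base IRI and the function name are assumptions made for this sketch; the three sample entries are "enc" values defined in RFC 7518.

```python
import json

# Placeholder base IRI for the generated concepts.
BASE = "https://example.org/jose-enc/"

# Assumed input layout; the entries are "enc" values from RFC 7518.
SAMPLE = """[
  {"name": "A128GCM", "description": "AES GCM using 128-bit key"},
  {"name": "A256GCM", "description": "AES GCM using 256-bit key"},
  {"name": "A256CBC-HS512",
   "description": "AES_256_CBC_HMAC_SHA_512 authenticated encryption algorithm"}
]"""


def to_skos_turtle(registry_json: str, base: str = BASE) -> str:
    """Convert a JSON list of {name, description} entries to SKOS Turtle."""
    scheme = f"<{base}scheme>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme .",
    ]
    for entry in json.loads(registry_json):
        name, label = entry["name"], entry["description"]
        lines.append(
            f"<{base}{name}> a skos:Concept ;\n"
            f'    skos:notation "{name}" ;\n'
            f'    skos:prefLabel "{label}"@en ;\n'
            f"    skos:inScheme {scheme} ."
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_skos_turtle(SAMPLE))
```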

bertvannuffelen commented 2 years ago

In an ideal world I would love for them to have that ;-)

This RFC also mentions an IANA registry: https://www.iana.org/assignments/jose/jose.xhtml#web-signature-encryption-algorithms. That registry in turn mentions this W3C specification: https://www.w3.org/TR/WebCryptoAPI/#algorithm-concepts

Could you investigate how the two could be connected? Let's connect the worlds instead of reinventing things. It would be better to create, together with them, a simple extension of DCAT that covers security concerns.

davebrowning commented 1 year ago

Project/Milestone modified.

Explanation: As DCAT v3 moves through review and, hopefully, ratification, we want to make sure that open issues and feedback that have yet to be completely addressed are properly recorded and tagged/assigned in GitHub, both to clarify their status and to help review and prioritise them as a source of improvements and new requirements in future DCAT versions.