w3c / rch-wg-charter

Charter proposal for an “RDF Dataset Canonicalization and Hash Working Group”
https://w3c.github.io/rch-wg-charter/

Consider defining a canonical serialization to bytes, rather than a hash #95

Closed (jyasskin closed this 2 years ago)

jyasskin commented 2 years ago

Different systems that need to hash or sign an RDF dataset are likely to have different requirements on the security properties of those hashes and signatures. Picking a single hash function (the draft charter mentions BLAKE3 or SHA-3) will nail down a particular set of security properties rather than letting the embedding system decide. I think the RDF-specific part is about defining the series of bytes that represents the dataset, and then those bytes can be fed into any hash or signature algorithm.

I don't feel strongly about changing this, but it might reduce the amount of arguing you have to do later.
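To illustrate the separation suggested above, here is a minimal sketch in which one fixed byte sequence is fed to several hash functions chosen by the caller. The `canonical_bytes` value and the `digest` helper are hypothetical placeholders, not anything defined by the charter; BLAKE3 is not in Python's standard library, so BLAKE2b stands in for it here.

```python
import hashlib

# Hypothetical stand-in for whatever byte serialization the Working Group
# ends up defining; the content here is illustrative only.
canonical_bytes = (
    b'<http://example.org/a> <http://example.org/p> "1" .\n'
    b'<http://example.org/b> <http://example.org/p> "2" .\n'
)


def digest(data: bytes, algorithm: str) -> str:
    """Hash the same bytes with whichever algorithm the caller picks."""
    return hashlib.new(algorithm, data).hexdigest()


# The embedding system, not the serialization, decides the security properties.
digests = {
    name: digest(canonical_bytes, name)
    for name in ("sha256", "sha3_256", "blake2b")
}

for name, value in digests.items():
    print(f"{name}: {value}")
```

The bytes never change; only the algorithm name does, which is the part an embedding system would choose.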

iherman commented 2 years ago

@jyasskin I do not think we have any disagreement on this, but I wonder whether there is a need for any change to the charter text.

The only specific reference to hash functions is in the following sentence:

> This Working Group will only define the usage of algorithms such as BLAKE3 or SHA-3.

(Emphasis is mine). Actually, this item appears in the "Out of Scope" section of the charter.

The deliverable description of hashing carefully avoids making any kind of commitment to any hash functions or even whether it is on characters or bytes or anything else. (We had this type of discussion a few weeks ago in #92 and in #93, which led to the current formulation.)

The section on hashing in the explainer text also uses the word "may" in its description (which only describes the approach taken by the community at this time), and it also says:

> That being said, there may be other approaches to define a hash that do not necessarily involve a sorted N-Quads representation: the Working Group will have to determine the best approach.

In light of these, do you think there is still a need for improvement somewhere?

jyasskin commented 2 years ago

The charter appears to commit to producing a hash function. I agree it's good that the charter doesn't commit to which underlying hash function, but defining "a hash function" for datasets appears to require picking out just one underlying hash function.

Changing "This specification details the processing steps of a hash function for an arbitrary RDF Dataset" to "This specification defines how to apply a hash function [or signature algorithm?] to an arbitrary RDF Dataset" might fix the problem by implying that the choice of hash function is up to the caller?

pchampin commented 2 years ago

I like @jyasskin's proposed phrasing; it is actually well aligned with what this deliverable was about in my (and, I believe, others') view.

Now, about the bracketed "[or signature algorithm?]"... :smiling_imp: Clearly, we do not want to mention "signature", as this is what ignited a heated debate with the previous incarnation of this charter.

However, maybe we can leave a door ajar here, e.g. with "or another cryptographic primitive". Maybe this could pave the way for anyone planning to add a signature mechanism on top of this WG's work (e.g. the VC WG)?

msporny commented 2 years ago

> I think the RDF-specific part is about defining the series of bytes that represents the dataset, and then those bytes can be fed into any hash or signature algorithm.

I understand @jyasskin's concern, and I don't think we ever intended the text to be interpreted in the way that he is (even though I understand why he's interpreting it in that way).

The output of RDF Dataset Canonicalization is a series of bytes: a serialized list of N-Quads (again, bytes). That list of N-Quads might then be passed through any arbitrary hash function. It was never the intent to identify which hash function to use, as that is, as @jyasskin rightly puts it, a decision that should be made higher up in the stack (e.g., by a signature cryptosuite defined by the VCWG, or by an implementer that has a specific hashing requirement).

Just writing this down to make sure we're all on the same page here.
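The pipeline described above (serialize to N-Quads bytes, then let the caller pick the hash) could be sketched roughly as follows. This is not the canonicalization algorithm itself: it assumes a dataset without blank nodes, so that deduplicating and sorting the N-Quads lines is enough to get a stable byte sequence. The function names `serialize_sorted_nquads` and `hash_dataset` are hypothetical.

```python
import hashlib
from typing import Callable, Iterable


def serialize_sorted_nquads(quads: Iterable[str]) -> bytes:
    """Turn N-Quads statements into one order-independent byte sequence.

    Assumption: no blank nodes, so no relabeling step is needed here.
    """
    # Drop empty lines and duplicates, then sort by Unicode code point.
    lines = sorted({q.strip() for q in quads if q.strip()})
    # One statement per line, newline-terminated, encoded as UTF-8.
    return "".join(line + "\n" for line in lines).encode("utf-8")


def hash_dataset(quads: Iterable[str],
                 hash_factory: Callable = hashlib.sha256) -> str:
    """Hash the serialized dataset with a caller-supplied hash constructor."""
    hasher = hash_factory()
    hasher.update(serialize_sorted_nquads(quads))
    return hasher.hexdigest()


quads = [
    '<http://example.org/b> <http://example.org/p> "2" .',
    '<http://example.org/a> <http://example.org/p> "1" .',
]

# Same dataset, different statement order: the digest does not change.
print(hash_dataset(quads) == hash_dataset(list(reversed(quads))))
# A different hash function is a decision made by the caller.
print(hash_dataset(quads, hashlib.sha3_256))
```

Passing the hash constructor as a parameter keeps the choice of algorithm higher up in the stack, which is the division of responsibility described in this comment.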

pchampin commented 2 years ago

@jyasskin I believe that we can close this, since your suggested change is implemented by PR #104. Do you agree?