w3c / vc-data-model

W3C Verifiable Credentials v2.0 Specification

Clarify hashing algorithm used for relatedResource examples #1570

Open lemoustachiste opened 2 weeks ago

lemoustachiste commented 2 weeks ago

In this section: https://www.w3.org/TR/vc-data-model-2.0/#integrity-of-related-resources, the vc v2 context https://www.w3.org/ns/credentials/v2 has this corresponding digestMultibase value: uEres1usWcWCmW7uolIW2uA0CjQ8iRV14eGaZStJL73Vz

I am trying to implement logic in my verifier to check the assertion, but using js-multiformats (https://www.npmjs.com/package/multiformats) I cannot find the correct steps to reproduce the same result.

Is the hashing algorithm for contexts and other resources detailed somewhere (https://www.w3.org/TR/controller-document/#algorithms maybe)? Or better yet, is there an available library to do these conversions?

All in all, I think adding a link to the expected algorithm in the vc-model spec would greatly help implementers know how to get back to the expected values.


lemoustachiste commented 2 weeks ago

FWIW, I tried running the algorithm here: https://www.w3.org/TR/controller-document/#example-an-implementation-of-the-general-base-encoding-algorithm-above-in-javascript, but while I get the same result as with this naïve script:

  const bytes = json.encode(vcV2Context);
  const sha384Digest = await sha384.hash(bytes);
  const computedDigest = base64url.encode(sha384Digest);

  // https://www.w3.org/TR/vc-data-model-2.0/#integrity-of-related-resources
  const targetDigest = 'uEres1usWcWCmW7uolIW2uA0CjQ8iRV14eGaZStJL73Vz';

That result does not match the targetDigest.

msporny commented 1 week ago

> All in all, I think adding a link to the expected algorithm in the vc-model spec would greatly help implementers know how to get back to the expected values.

We recently moved the definition of Multihash from Data Integrity to the Controller Document, specifically, this section:


Would referring to that section help?

lemoustachiste commented 1 week ago

Hi @msporny,

I'm sorry, I must be missing something in the trail of information.

The digestMultibase of the VC v2 context is defined here: https://www.w3.org/TR/vc-data-model-2.0/#example-use-of-the-relatedresource-and-digestmultibase-properties as uEres1usWcWCmW7uolIW2uA0CjQ8iRV14eGaZStJL73Vz.

As it starts with a u I assume the encoding is base64-url as defined here https://www.w3.org/TR/controller-document/#multibase-0. I checked the expected alphabet and it matches that of Digital Bazaar's library base64url-universal (https://github.com/digitalbazaar/base64url-universal/blob/main/lib/base64url.js#L13), so I believe I am using the correct tool for the last step.

So I guess my issue is with the hashing part. In this section https://www.w3.org/TR/vc-data-model-2.0/#defn-relatedResource, it says that SHA-384 should be preferred, so that's what I'm using (SHA2).

I get the following byte array when hashing VC V2 context with SHA-384, consistently with 2 different libraries:

Uint8Array(48) [
  229,  13, 164, 166, 212, 164,  82, 226, 235,
   28,  65, 158, 115, 116, 131, 118, 235, 106,
   86,  82, 124,  58, 252,  21, 169,  79,  98,
  246, 242, 196, 175,  45, 183,  96, 153, 169,
   41, 122, 101, 102,  10,  90, 114, 126,  98,
   59, 242, 119
]

But using the DB's library to encode this hash to base64url, I get the following result: 5Q2kptSkUuLrHEGec3SDdutqVlJ8OvwVqU9i9vLEry23YJmpKXplZgpacn5iO_J3 which does not match the value and header specified in the spec doc.
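For anyone following along, the encoding step above can be reproduced without extra dependencies. This is a minimal sketch that assumes Node.js and its built-in `Buffer` (not the base64url-universal library itself); the variable names are illustrative only. It shows that plain base64url output carries neither the Multibase `u` prefix nor any Multihash header bytes, so it could not match a `digestMultibase` value even if the hash function were the right one.

```javascript
// Sketch only: Node.js built-in Buffer, not the base64url-universal library.
// The 48 bytes are the SHA-384 digest reported in this comment.
const sha384Digest = Uint8Array.from([
  229,  13, 164, 166, 212, 164,  82, 226, 235,
   28,  65, 158, 115, 116, 131, 118, 235, 106,
   86,  82, 124,  58, 252,  21, 169,  79,  98,
  246, 242, 196, 175,  45, 183,  96, 153, 169,
   41, 122, 101, 102,  10,  90, 114, 126,  98,
   59, 242, 119
]);

// Plain base64url without padding: no Multibase prefix, no Multihash header.
const rawBase64url = Buffer.from(sha384Digest).toString('base64url');

// Multibase adds a one-character prefix; "u" denotes base64url without padding.
// Note that this still lacks the Multihash header in front of the digest.
const multibaseOnly = 'u' + rawBase64url;

console.log(rawBase64url);
console.log(multibaseOnly);
```

The first line printed is the 64-character string quoted above; the second differs only by the leading `u`.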

I'm missing a piece of the puzzle and I can't figure out what it is. Can you guide me to your result?


msporny commented 1 week ago

> I'm missing a piece of the puzzle and I can't figure out what it is. Can you guide me to your result?

I expect that the Multihash values are base64-url encoded SHA-256 values, not SHA-384. I'll try to check here in a bit... it might also be that I messed up generating the example since this part of the spec is fairly new and hasn't gotten much review.
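One way to probe that expectation is to decode the published value and look at its leading bytes. The sketch below assumes Node.js; `inspectDigestMultibase` is a hypothetical helper name, not something from the spec or from any library. For reference, a well-formed sha2-256 Multihash is 34 bytes long: the code byte `0x12`, the length byte `0x20`, then the 32-byte digest.

```javascript
// Sketch only: strips the Multibase prefix and reports the leading bytes.
// It does not parse varints; it simply exposes the first two bytes as-is.
function inspectDigestMultibase(value) {
  if (value[0] !== 'u') {
    throw new Error('expected the Multibase base64url-no-pad prefix "u"');
  }
  const bytes = Buffer.from(value.slice(1), 'base64url');
  return {
    totalBytes: bytes.length, // 34 for a well-formed sha2-256 Multihash
    firstByte: bytes[0],      // Multihash code position (0x12 = sha2-256)
    secondByte: bytes[1]      // Multihash length position (0x20 for sha2-256)
  };
}

const info = inspectDigestMultibase('uEres1usWcWCmW7uolIW2uA0CjQ8iRV14eGaZStJL73Vz');
console.log(info);
```

For the value from the spec example this reports 33 bytes with a first byte of `0x12`, which is one byte short of the 34-byte layout described above. The sketch only surfaces that mismatch; it does not establish how the example value was generated.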

There is a disagreement in the WG as to whether SHA-384 really should be the suggested target hash value. The digestSRI folks seem to want 384, but many of the others in the WG don't feel like that's necessary given that the energy required to break SHA-256 is the equivalent of repurposing every computer on the planet for billions of years or boiling every ocean on the planet 16,000+ times over. That said, I don't think the VCWG has it in them to continue to debate the point more than it already has.

msporny commented 1 week ago

> I'll try to check here in a bit...

Yeah, the test vectors were wrong/old. :(

I updated the spec to auto-generate the values instead of depending on humans to do it. The Multibase and Multihash encodings are also now explicitly specified in each example title. The commit that implemented that is here: 04576fa3c0c102f967f1bf053a2982c9251ca435

Does that address your concerns @lemoustachiste ?

lemoustachiste commented 1 day ago

@msporny yes I think that's way clearer like this, thanks...

...but I am still unable to find the correct algorithm to produce the same result as what's displayed... :/ Do you have a link to the script that handles the hashing and the encoding?

msporny commented 1 day ago

> Do you have a link to the script that handles the hashing and the encoding?

Yes, here's the code that's running in the spec:


There is a chance that there is a bug in my code, so I'm very interested to hear if you're able to reproduce the output. Ideally, we'd need to figure this out by the end of this year to ensure that the correct test vectors are in the specification. Thank you for looking into this; independent verification of these test vectors is very important!
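For independent verification, the hash, Multihash, and Multibase steps discussed in this thread can be sketched as follows. This is not the script that runs in the spec; it is a minimal sketch assuming Node.js, with `digestMultibase` and `MULTIHASH_CODES` as hypothetical names. The Multihash codes used are `0x12` for sha2-256 and `0x20` for sha2-384. Which exact input bytes get hashed is also an assumption here: hashing a re-serialized JSON object can give a different digest than hashing the bytes of the document as served.

```javascript
const { createHash } = require('node:crypto');

// Multihash codes for the two algorithms mentioned in this thread.
const MULTIHASH_CODES = { sha256: 0x12, sha384: 0x20 };

// Sketch only: hash the input, prepend the Multihash header, then
// Multibase-encode the result as base64url without padding (prefix "u").
function digestMultibase(data, algorithm = 'sha256') {
  const code = MULTIHASH_CODES[algorithm];
  if (code === undefined) {
    throw new Error(`unsupported algorithm: ${algorithm}`);
  }
  const digest = createHash(algorithm).update(data).digest();
  // The code and the digest length are both below 128 here, so each
  // fits in a single varint byte.
  const multihash = Buffer.concat([Buffer.from([code, digest.length]), digest]);
  return 'u' + multihash.toString('base64url');
}

// Usage: pass the raw bytes (or string) of the resource being checked.
console.log(digestMultibase('hello world'));
console.log(digestMultibase('hello world', 'sha384'));
```

Comparing the output for the raw bytes of the context document against the value shown in the updated spec example would be one way to cross-check the test vectors.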