w3c / vc-data-integrity

W3C Data Integrity Specification
https://w3c.github.io/vc-data-integrity/

Determine how cryptographic suites are named and versioned #38

Closed msporny closed 1 year ago

msporny commented 2 years ago

There is a section in the specification named "Versioning Cryptography Suites" that attempts to specify how cryptographic suites are versioned. The options that are being considered are: parameters (RSASSA-PKCS1-v1_5-SHA1), numbers (ecdsa-v1), and dates (ecdsa-2022).

There is disagreement on which mechanism to use. This issue tracks that debate so that we can decide which type of identifier we'd like to use for the cryptography suites.

Wind4Greg commented 1 year ago

Cryptosuite Naming Approach?

The cryptosuite property (DI Section 3.1, DataIntegrityProof) contains a string specifying the name of the cryptosuite. This is an important option and impacts all the test vectors that I've generated for the EdDSA and ECDSA specifications. Cryptosuite naming is partially discussed in Section 5.1 (versioning cryptographic suites).

We currently have a cryptosuite naming convention aimed at indicating the essence of the algorithm and its rough timeliness (in years). We assume a transformation approach in that naming and add information in case of a variant, i.e., "eddsa-2022" uses RDF canonicalization while "jcs-eddsa-2022" uses JCS canonicalization. Details of hashes used are not part of the naming convention. Nor, in the case of VC-DI-ECDSA, do we indicate the choice of curves (P-256 or P-384). In the ECDSA case the curve used can be inferred from the type of the public key.
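The inference described above (curve taken from the public key, hash following from the curve) can be sketched roughly as follows. This is a minimal illustration, not the specification's algorithm: the `CURVE_TO_HASH` table and the `hash_for_key` helper are hypothetical names, and the pairing of P-256 with SHA-256 and P-384 with SHA-384 is assumed from the discussion of SHA-384 and P-384 later in this thread.

```python
import hashlib

# Hypothetical table: the curve named by the public key decides the hash,
# so neither the curve nor the hash has to appear in the cryptosuite name.
CURVE_TO_HASH = {
    "P-256": "sha256",
    "P-384": "sha384",
}


def hash_for_key(curve: str, data: bytes) -> bytes:
    """Hash `data` with the algorithm implied by the key's curve."""
    try:
        algorithm = CURVE_TO_HASH[curve]
    except KeyError:
        raise ValueError(f"unsupported curve: {curve}") from None
    return hashlib.new(algorithm, data).digest()
```

With this shape, a developer only ever supplies a key; there is no separate knob that could be set to a mismatched value.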

For internal use in the creation and identification of cryptographic algorithms at a level below VCs, "ciphersuite ids" are used. These can be quite long and include almost all relevant technical details. Below I give examples from the BBS DIF/IETF draft. This doesn't seem appropriate for VC-DI suites.

Decision points:

Current draft cryptosuite strings:

Notes:

msporny commented 1 year ago

One of the design goals for naming the cryptosuites was to reduce the possibility of developers picking the wrong/weird options. The ideal that we wanted to get to (which is probably not achievable) was something like: "ezsign-2023" -- where the details of what's happening under the hood shouldn't matter to >80% of all developers. I imagine most typically just plug in values that they find on Stack Overflow (or an equivalent site), anyway.

The fact that we're using acronyms at all could be viewed as a failure mode... developers are probably going to have a hard time differentiating "ecdsa" from "eddsa" -- one letter difference... but that's what the cryptographers decided to name them (and cryptographers tend to have a rich history of naming things badly). :P

Cryptosuite names are supposed to simplify the decision on what to use. That is, they're supposed to "package up" combinations of options that developers will have a hard time getting wrong (they're supposed to be "packaged things that you pick that won't easily blow up on you"). For example, allowing a developer to pick SHA-1 at this point in time would be a bad thing, or helping them understand that they should pick SHA-384 when paired with P-384... we want to avoid them having to know what the "right values" are supposed to be (or think they know, when they don't understand the details). An example of what /not/ to do is shown here:

BBS_BLS12381G1_XOF:SHAKE-256_SSWU_ROH2G or BBS_BLS12381G1_XMD:SHA-256_SSWU_ROH2G

Which of the above should a developer pick (given that most of them won't have any idea of what those options mean), and when should they pick one over the other, and can they use SHA-384 instead of SHA-256? When we expose developers to these options, we create the possibility of them not understanding the options and picking the wrong options (especially as an ecosystem grows over time). For any given year, there is a much smaller subset of "things that make sense".

The biggest push back on not including the canonicalization property has been "favoring one cryptosuite over another". For example, "favoring" RDF Dataset Canonicalization over JSON Canonicalization Scheme. We could change the names to "ecdsa-jcs-2019" and "ecdsa-rdc-2019" to "put them on the same playing field", but that feels unnecessary (at least, to me). I find the argument "that three characters creates favoritism" tenuous at best... is the usage of three extra characters "jcs" or "rdc" really going to deter someone from picking one of the options over the other, given that the options are significantly different? So, one option is to come to the conclusion that the "level playing field" argument is not compelling.

Another option is to introduce "aliasing" where "ecdsa-2019" is a "synonym" for "ecdsa-rdc-2019" (but that'll invoke the "not a level playing field" argument). We could try incorporating the canonicalization scheme into something shorter like "ecj-2019" and "ecr-2019", but those feel like names that developers will have a hard time telling apart. One thing we could do is say that "aliases" will be based on how many implementations of a particular cryptosuite there are. For example, the two options for ecdsa are: "ecdsa-jcs-2019" and "ecdsa-rdc-2019"... and there will be an alias, "ecdsa-2019", that will default to whatever gets the most implementations during the CR phase?
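As a rough illustration of the aliasing idea, a verifier could normalize an alias to its full name before looking up a cryptosuite. The `ALIASES` table and `resolve_cryptosuite` function are hypothetical, and the alias target shown is only one of the outcomes floated above, not a decision.

```python
# Hypothetical alias table. Which full name "ecdsa-2019" would point to
# is exactly the open question in this comment; "ecdsa-rdc-2019" is used
# here purely as a placeholder.
ALIASES = {
    "ecdsa-2019": "ecdsa-rdc-2019",
}


def resolve_cryptosuite(name: str) -> str:
    """Return the full cryptosuite name, following an alias if one exists."""
    return ALIASES.get(name, name)
```

Names that are not aliases pass through unchanged, so `resolve_cryptosuite("ecdsa-jcs-2019")` returns its input.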

The other decision to make here is to see if we can get "ecdsa-rdc-2019" to "compile" down to a single byte value in data carriers like CBOR-LD. While encoding a text string that is 14 bytes in size doesn't sound like a lot, it really matters when your byte budget is below 400 bytes... like when presenting VCs as QR Codes or over NFC. So, we might want to consider a type such as "sec:cryptosuiteIdentifier" so that we can create encoders/decoders to compress that 14 byte value down to a single byte or two like we do for @context values in CBOR-LD today. That would make longer cryptosuite names palatable:

https://digitalbazaar.github.io/cbor-ld-spec/#term-codec-registry
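To make the byte-budget point concrete, here is a minimal sketch of a name-to-code table. It is not the CBOR-LD wire format or its actual registry: `SUITE_CODES`, the code values, and both helper functions are assumptions made up for illustration.

```python
# Hypothetical registry mapping cryptosuite names to one-byte codes.
# The numeric values are invented; a real registry would assign them.
SUITE_CODES = {
    "ecdsa-rdc-2019": 0x01,
    "ecdsa-jcs-2019": 0x02,
    "ecdsa-sd-2023": 0x03,
}
CODE_TO_SUITE = {code: name for name, code in SUITE_CODES.items()}


def compress_suite(name: str) -> bytes:
    """Encode a registered cryptosuite name as a single byte."""
    if name not in SUITE_CODES:
        raise ValueError(f"unregistered cryptosuite: {name}")
    return bytes([SUITE_CODES[name]])


def decompress_suite(encoded: bytes) -> str:
    """Recover the cryptosuite name from its one-byte code."""
    if len(encoded) != 1 or encoded[0] not in CODE_TO_SUITE:
        raise ValueError("unknown cryptosuite code")
    return CODE_TO_SUITE[encoded[0]]
```

Under these assumptions the 14-byte UTF-8 string "ecdsa-rdc-2019" travels as one byte and round-trips through `decompress_suite` back to the same name.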

Include curve information where there are options. This would give something like: “ecdsa-P256-2019”, “ecdsa-P384-2019”, “jcs-ecdsa-P256-2019”, “jcs-ecdsa-P384-2019” in the ECDSA case and no change in the EdDSA case. But this is somewhat redundant with the public key parameter.

Yes, agreed that stating key type and hash algorithm is redundant, and arguably, adds an attack vector (where you specify one key type, but use another key type to sign). A variation of this led to key confusion attacks (HMAC vs. public key) in JOSE.

Another question that we might consider, to help make a decision, is the ordering in the naming pattern. Perhaps our naming pattern is something like this:

TRANSFORM-HASH-SIGN

That would suggest:

rdc-sha256-ecdsa-2019 (where specifying the hash is not necessary in almost every cryptosuite)

So, maybe it reduces to:

TRANSFORM-SIGN

rdc-ecdsa-2019

Or, maybe the most significant choice is the cryptographic signature algorithm, so we should reverse it:

SIGN-TRANSFORM

Which would suggest:

ecdsa-rdc-2019

... which would kinda suggest alignment with ECDSA-SD:

ecdsa-sd-2023 (which only supports one canonicalization scheme for now)

... and if we added another canonicalization scheme to ECDSA-SD in time, we could do:

ecdsa-sd-xyz-2025

“bbs-signature-2023”: A BBS signature. Most likely corresponding to a BBS ciphersuite id of “BBS_BLS12381G1_XOF:SHAKE-256_SSWU_ROH2G” or “BBS_BLS12381G1_XMD:SHA-256_SSWU_ROH2G”. Note that BBS has flexibility in “hash to curve” and creation of group G1 generators. Note that BBS has a mechanism for coming up with ciphersuite ids (BBS). “bbs-proof-2023”: A BBS “proof”, i.e., what we might call a derived signature, since it can be created by a holder based on a “BBS signature and message set”.

We should really try to make sure the BBS spec doesn't make this mistake. Whether something is a base proof or a disclosure proof should be encoded in the signature, not the cryptosuite name. Ideally, the cryptosuite name should just be "bbs-2023", and the first bytes of the proofValue should identify whether the signature is a base proof or a disclosure proof. That's what ecdsa-sd-2023 does and it was the right design choice, IMHO.
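A sketch of that design, where one cryptosuite name covers both proof types and the leading bytes of the decoded proofValue say which one it is. The one-byte header values below are placeholders and are not taken from any specification; `ProofKind` and `classify_proof` are hypothetical names.

```python
from enum import Enum


class ProofKind(Enum):
    BASE = "base"
    DISCLOSURE = "disclosure"


# Placeholder header bytes, invented for illustration only.
PROOF_HEADERS = {
    b"\x00": ProofKind.BASE,
    b"\x01": ProofKind.DISCLOSURE,
}


def classify_proof(proof_bytes: bytes) -> ProofKind:
    """Decide base vs. disclosure proof from the first byte of the proof."""
    kind = PROOF_HEADERS.get(proof_bytes[:1])
    if kind is None:
        raise ValueError("unrecognized proof header")
    return kind
```

The verifier dispatches on the bytes it actually received, so the proof type cannot disagree with a separately stated name.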

All this to say, if we want to specify all the things, these are the sorts of names we're looking at:

ecdsa-rdc-2019, ecdsa-jcs-2019, ecdsa-sd-2023, eddsa-rdc-2022, eddsa-jcs-2022, eddsa-sd-2023, bbs-2023
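For what it's worth, names following the SIGN-TRANSFORM pattern split mechanically into parts. The parser below is a hypothetical sketch (the `SuiteName` type and `parse_suite_name` function are not from any spec); it assumes the signing algorithm comes first, a four-digit year comes last, and anything in between is a qualifier such as "rdc", "jcs", or "sd".

```python
from typing import NamedTuple, Tuple


class SuiteName(NamedTuple):
    sign: str                    # signing algorithm, e.g. "ecdsa"
    qualifiers: Tuple[str, ...]  # e.g. ("rdc",) or ("sd", "xyz"); may be empty
    year: int


def parse_suite_name(name: str) -> SuiteName:
    """Split a SIGN[-QUALIFIER...]-YEAR cryptosuite name into its parts."""
    parts = name.split("-")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"malformed cryptosuite name: {name!r}")
    year = parts[-1]
    if len(year) != 4 or not year.isdigit():
        raise ValueError(f"missing four-digit year: {name!r}")
    return SuiteName(parts[0], tuple(parts[1:-1]), int(year))
```

Because the signing algorithm is always the first component, code that only cares about ECDSA vs. EdDSA can look at `sign` and ignore the rest.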

Thoughts? Interested in @silverpill and @HelgeKrueger thoughts, specifically.

silverpill commented 1 year ago

@Wind4Greg I think "only include when there are options" is a sensible approach. @msporny As for ordering, I would prefer TRANSFORM-HASH-SIGN because this is the order in which operations are performed in practice (and specs are structured accordingly).

Wind4Greg commented 1 year ago

I was somewhat leaning towards the SIGN-TRANSFORM pattern since it lets me know the signing algorithm right away. My EdDSA code is distinct from my ECDSA code. However, I'm flexible.

For the transform names are jcs and rdc okay with folks? I don't think we can do much about the issue that EdDSA and ECDSA only differ by a letter ;-).

msporny commented 1 year ago

@Wind4Greg wrote:

I was somewhat leaning towards the SIGN-TRANSFORM pattern

Yes, same, mostly because developers are probably looking for the signing algorithm first... and then the details associated with that algorithm next. I don't expect many developers to be as familiar with the order of operations as most of us are... and frankly, I don't think many of them will care about the canonicalization algorithm in play (even though they probably should ... in the same way that most folks are completely oblivious to the TLS settings used by the web servers they deploy... they just depend on the defaults that ship with the system and then leave those defaults... even when the defaults become compromised... which the software then takes care of for them -- typically without them knowing).

All this to say, +1 for "SIGN-TRANSFORM" vs. "TRANSFORM-SIGN" -- but, won't block either way, we just need to decide on one way and go with it.

HelgeKrueger commented 1 year ago

For the transform names are jcs and rdc okay with folks?

+1

After reading the arguments, I also think that "SIGN-TRANSFORM" is more intuitive for a developer to choose. The choice affecting other parts of the software is what key to use. The transformation / hashing algorithms are technical details.

Wind4Greg commented 1 year ago

Thanks for the comments, Helge. I'll try to take some of the text from the above discussion and create a PR to document this "cryptosuite naming convention" along with its intent.

Cheers Greg

msporny commented 1 year ago

PR #126 has been merged to address this issue, closing.