multiformats / multicodec

Compact self-describing codecs. Save space by using predefined multicodec tables.
MIT License
341 stars 204 forks source link

Compact RSA public key representation #235

Closed clehner closed 2 years ago

clehner commented 3 years ago

RSA public keys can be represented using ASN.1/DER RSAPublicKey in #233. However, there is a concern that potential implementers may consider this a legacy format: https://github.com/w3c-ccg/did-method-key/pull/41#issuecomment-931622784. Addressing this could be an opportunity to find a more compact encoding, as there is some overhead with the RSAPublicKey DER encoding, as it encodes a data structure format that is not really needed here, as the data structure may be understood from the context (multiformats/multicodec and the particular multicodec entry): we just need to represent two positive integers.

In Example 1 in https://github.com/multiformats/multicodec/pull/233#issue-1010247881, a RSA public key is represented in 270 bytes, that is 256 bytes for the 2048-bit modulus, 3 bytes for the (public) exponent, 1 byte for the (sometimes needed) leading zero in the modulus, and 10 remaining bytes for the data structure. @expede and @b5, how would you feel about approaching a more compact representation?

I think this may be more work for some implementers, do to wide support of RSAPublicKey in ASN.1/DER; but I think @dlongley is right that new implementations might not in fact have such support.

Is anyone aware of an appropriate encoding here? As we are in the context multiformats/multicodec, I would think to reuse varint. However some implementations of multicodec (and did:key) might not actually have varint implemented, as they may rely on the static encoded types. The readme lists only two implementations: https://github.com/multiformats/unsigned-varint#implementations. Perhaps a simpler format may be preferable? On the other hand, as an implementer I believe I could work with that, using the Rust implementation. If a simpler format is preferred, unless there is some existing standard compact and canonical representation for this key type, I would propose the following: one byte for the length of the exponent, followed by the exponent, followed by the modulus. As integers, the exponent and modulus would be encoded the same as in ASN.1 (after dropping the leading zero where appropriate) and as in JWK (before Base64Url encoding). I think it is the same in OpenSSH public key format as well.

expede commented 3 years ago

I'm all for exploring more compact representations. The smaller we can get these, the better. Agreed that 10 bytes of structure seems unnecessary. I'm happy with an unsigned varint, though it's certainly harder to debug looking directly at the raw output. Given that the key is going to be one of a few fixed lengths, do we want a varint specifically? Is the MSB overhead worthwhile in this context?

Further tricks like omitting the exponent (and assuming it's 65537) feel too far 😱

rvagg commented 3 years ago

Also keep in mind that the more novel and custom things are in crypto the more difficulty you'll have with standards bodies and implementers actually adopting your thing. Making something new to shave off a few bytes might get points docked when trying to get broad adoption.

b5 commented 3 years ago

I'm biased toward the view @rvagg is cautioning us on.

While I don't sleep as well knowing that there's a data structure that isn't actually needed, I do find this kind of thing relying a little too heavily on multicodec. I interpret the design intent behind multicodec to be a multiplexing prefix on existing serialization formats, not a source of serialization logic itself. If we can't point to an existing RSA key representation that matches this compactness, then I think we've found the right balance of adoption potential and wire efficiency.

With that said, @clehner, you mentioned that OpenSSH uses a similar approach, so I personally could be won over if there were folks outside of OpenSSH using OpenSSH-style public key representations. In that case we're just adding momentum to an existing adoption effort.

expede commented 3 years ago

I interpret the design intent behind multicodec to be a multiplexing prefix on existing serialization formats, not a source of serialization logic itself.

Oh yeah, that's a good way to view it 👍

standards bodies and implementers actually adopting your thing

On the flip side, over in the W3C did:key repo folks are raising concerns about the existing common RSA formats: https://github.com/w3c-ccg/did-method-key/pull/41#issuecomment-931622784

I can see a lot of did:key resolver libraries not wanting to take on the legacy code required to parse DER / ASN.1 to validate RSA keys ... so enabling that to be avoided should be a consideration as well.

At the end of the day, an RSA public key is 3 numbers. We can tightly pack them, or use a different serialization format. RSA keys are pretty big at baseline, but agreed that usability is a factor.

FWIW I just want interop and not a huge amount of overhead (some is fine). I'm happy with any well-specified encoding, as long as we don't end up with massive headers that come with some formats.

clehner commented 3 years ago

Hi, sorry to leave this hanging.

I put together a comparison of some possible RSA public key representations, here: https://celehner.com/2021/11/rsapk/

Here is a copy of the table from that page:

Representation Bytes for Reference Key
RSAPublicKey (ASN.1) 270
COSE Key 268
ssh-rsa (SSH Public Key) 279
ssh-rsa without prefix string 268
ssh-rsa without prefix string or modulus length 264
4-byte exponent length 263
1-byte exponent length 260

The "reference key" is the 2048-bit key "Example 1" from https://github.com/multiformats/multicodec/pull/233#issue-1010247881. RSAPublicKey (ASN.1) is the representation we have in multicodec currently. The representation called "1-byte exponent length" is the one mentioned in https://github.com/multiformats/multicodec/issues/235#issue-1012628576 as "one byte for the length of the exponent, followed by the exponent, followed by the modulus".

If we want to stick with a standard format, COSE Key (in Canonical CBOR) is slightly more compact than RSAPublicKey/ASN.1. If we want to compact things further, the "1-byte exponent length" representation is the best I can see right now.

I see it would be unusual to apparently define a format in multicodec, rather than reference an existing format. If we were to use a new format, would it be better to define it somewhere else - maybe even in the did:key specification itself? That could be proposed alongside the test vectors, or in an appendix, etc., and referred to like "RSA public key, according to the did:key method v0.7".

Or if RSAPublicKey/ASN.1 is good enough, I could mark https://github.com/w3c-ccg/did-method-key/pull/41 as ready for review as it is now.

OR13 commented 3 years ago

Don't invent a new key format for RSA to save a few bytes, it will cause lots of translation bugs, and prevent standards adoption.

Pick a standard key format, JWK, CWK, ASN.1, and map that to the registered prefix.

Pop the prefix off and use off the shelf libraries.

The space saving is not worth the translation bugs, especially for RSA.

did:key is not the place to define new key formats, it's the place to use existing / standard ones with DIDs.

The less innovation we do on key formats, the better IMO (both in this repo and in did:key).

I'm in favor of keeping things as is (ASN.1) or using (CWK).

expede commented 3 years ago

Or if RSAPublicKey/ASN.1 is good enough, I could mark w3c-ccg/did-method-key#41 as ready for review as it is now.

I'm in favor of keeping things as is (ASN.1) or using (CWK).

I just want interop

@clehner I think the general consensus is to keep with ASN.1, and we can add other versions (with their own prefixes) if and when there's a strong use case for something different

clehner commented 3 years ago

Thanks @OR13 and @expede for the feedback.

I think CWK might make more sense as its own multicodec type, as it is not RSA-specific but rather is a more general key representation (like JWK or SPKI) - and maybe for its own DID method rather than in did:key which uses multicodec for key type.

I am seeing the consensus to keep the current entry as is (ASN.1/DER RSAPublicKey). I think this issue can be closed unless there is reason to discuss it further.