multiformats / multicodec

Compact self-describing codecs. Save space by using predefined multicodec tables.
MIT License

Add jwk-jcs #307

Closed mtaimela closed 1 year ago

mtaimela commented 1 year ago

JSON Web Key (JWK) has required members and optional members, and the members' order is undefined because a JWK is represented as a JSON object. This PR proposes a codec for all JWK types that yields a deterministic output.
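For illustration, a minimal sketch of what "required members plus canonicalization" could look like. This is not the PR's implementation. It assumes the required public members are the ones listed in RFC 7638 (EC, RSA) and RFC 8037 (OKP), and that every value is a plain ASCII string, in which case sorting keys and dropping whitespace matches the JCS (RFC 8785) output. The names `REQUIRED_MEMBERS` and `canonical_jwk` are hypothetical, and the `x`/`y` values are placeholders, not real key material.

```python
import json

# Required public members per key type (RFC 7638 for EC and RSA, RFC 8037 for OKP).
REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
    "OKP": ("crv", "kty", "x"),
}


def canonical_jwk(jwk: dict) -> bytes:
    """Keep only the required members and serialize them deterministically.

    Assumption: all values are ASCII strings, so sorted keys plus compact
    separators give the same bytes as JCS. General JCS also covers number
    formatting and string escaping, which this sketch does not handle.
    """
    members = REQUIRED_MEMBERS[jwk["kty"]]
    reduced = {name: jwk[name] for name in members}
    text = json.dumps(reduced, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


# Same key, different member order, one with an optional "kid" member.
a = {"kty": "EC", "crv": "P-256", "x": "AAAA", "y": "BBBB", "kid": "k1"}
b = {"y": "BBBB", "x": "AAAA", "crv": "P-256", "kty": "EC"}
print(canonical_jwk(a) == canonical_jwk(b))  # True
```

Both inputs serialize to `{"crv":"P-256","kty":"EC","x":"AAAA","y":"BBBB"}`, which is the one-to-one property the proposal relies on.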

rvagg commented 1 year ago

hmm, I'm not sure this really fits as is since it's not a key itself but rather a way of representing a key, and internally it has its own way of identifying the algorithm used (which match approximately to what we have in here for key), and we already have json, or dag-json if you want to identify the representation form. So what this is really asking for is a special signalling mechanism that differentiates a schema'd version of a JSON document that's not just json, which is not what this table is for. That level of signalling should be done elsewhere in the application stack—a multicodec identifier can say what format the bytes are in, but what's in the decoded thing is up to the application, or some other layer, to take further.
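To make the point about "what format the bytes are in" concrete, here is a small sketch of reading a multicodec prefix. It assumes the prefix is an unsigned varint, as in the multiformats specs, and uses three codes from the multicodec table (`raw` 0x55, `dag-json` 0x0129, `json` 0x0200). The helper names `decode_varint` and `split_multicodec` are hypothetical. The prefix only identifies the byte format; whether the decoded JSON happens to be a JWK is left to the application.

```python
# A few entries from the multicodec table, keyed by code.
KNOWN_CODECS = {0x55: "raw", 0x0129: "dag-json", 0x0200: "json"}


def decode_varint(data: bytes) -> tuple[int, int]:
    """Decode an unsigned varint; return (value, number of bytes consumed)."""
    value = 0
    shift = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, index + 1
        shift += 7
    raise ValueError("truncated varint")


def split_multicodec(data: bytes) -> tuple[str, bytes]:
    """Return (codec name or hex code, payload bytes after the prefix)."""
    code, consumed = decode_varint(data)
    return KNOWN_CODECS.get(code, hex(code)), data[consumed:]


# 0x0200 ("json") encodes as the two varint bytes 0x80 0x04.
prefixed = bytes([0x80, 0x04]) + b'{"kty":"EC"}'
print(split_multicodec(prefixed))  # ('json', b'{"kty":"EC"}')
```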

mtaimela commented 1 year ago

Hi @rvagg, would any other type be qualified here? It is a structural codec like json-jcs, but specific to JWK. I see there's some use of serialization, but I couldn't find supporting documentation. Would that be fitting for the purpose?

rvagg commented 1 year ago

It comes down to what you're wanting to use a multicodec table entry for; as it stands, we don't add new "serialization" or "ipld" entries where the decoding scheme is the same as an existing one, in this case, it's just JSON. I can see a case for adding a new json-jcs entry here since there are additional rules that make it strictly incompatible with plain json. But a serialization format + schema isn't something the multicodec table is intended for.

mtaimela commented 1 year ago

My intention is to use a multicodec table entry for a simple representation of public keys (mainly elliptic curve keys). JWK just happens to be a JSON representation because it belongs to JOSE (JSON Object Signing and Encryption), but it is a well-formed representation of public or private keys and usable as such. The "required members" plus JCS make it a deterministic representation.

The JOSE ecosystem's high-level interfaces are mainly based on JWK, while the deeper implementation layers handle the cryptographic algorithms and depend on the libraries. If I were to use es256-pub or other defined key types, the client would have to convert the key into a JWK, and the client would also need to understand the deeper layers well enough to build that specific key into a JWK. This hinders adoption and adds an unnecessary conversion.

Would it be possible to reconsider JWK as a key representation for the JOSE ecosystem?

rvagg commented 1 year ago

I'd like @vmx' input on this too so I'm going to put it in the IPLD community call agenda to talk through cause I don't quite have the brainspace this week: https://hackmd.io/PjKSfch8QNOY4uNrnrRbDA (call details: https://github.com/ipld/team-mgmt#every-two-weeks-call)

alenhorvat commented 1 year ago

Hi.

Multicodec concept is great. If we take the current definitions (focusing on JSON), we have:

  • JSON: UTF-8-encoded JSON
  • JSON-JCS: The result of canonicalizing an input according to JCS
  • DAG-JSON: is a codec that implements the IPLD Data Model as JSON

Note that every JSON-JCS document is also JSON. The same holds for DAG-JSON -- is the outcome a UTF-8 encoded string?

First, the naming convention is not standardised. We see two patterns: JSON-JCS and DAG-JSON. For the codec itself, it doesn't matter, hence this point is not critical. Second, DAG-JSON defines a codec for a specific data model. Are there additional processing rules?

What we're proposing here, would effectively be equivalent to the DAG-JSON:

JWK-JCS-JSON or JSON-JCS-JWK: is a codec that implements UTF-8 encoded JSON that is a JCS-serialised JWK data model (maybe the JSON-JCS-JWK notation is better)

JWK is a self-contained private and public key data model.

vmx commented 1 year ago

If we take the current definitions (focusing on JSON), we have:

  • JSON: UTF-8-encoded JSON
  • JSON-JCS: The result of canonicalizing an input according to JCS
  • DAG-JSON: is a codec that implements the IPLD Data Model as JSON

This sounds like you would want to use this as part of a CID, correct? I know the current CID spec isn't really clear about that, but over the years it has crystallized that the "codec" in a CID specifies how to extract links (CIDs) from the given data blob. Each way of extracting links should have one codec. For JSON containing links, that's DAG-JSON.

You are right that "JSON" also exists. I consider it to be there for historic/legacy reasons. This should be "raw", as you cannot extract links from it. There surely is a use case for attaching some semantics to the codec, but that should be on a layer above CIDs. Some ideas are currently floating around as "CIDv2". I'll try to find time to write up the most promising proposal properly.

And there's the grey area of "it's JSON with links, but it's not DAG-JSON". I don't have a good answer for that currently, but I hope we'll find one.

alenhorvat commented 1 year ago

Hi @vmx. Thank you for the reply.

We want to use it for Decentralised Identifiers, more specifically did:key (https://w3c-ccg.github.io/did-method-key/#bib-multicodec) -- note that there are some inconsistencies within the current did:key specs -- ignore those as they are not relevant for this discussion.

did:key should contain the public key in a self-contained form, meaning that the information is resolved from the identifier itself. So did:key is a "link" that points only to itself. The design is similar to the peer-id. To avoid custom key serialisation, and since the size overhead is not an issue, we want to use keys in JWK format. However, it is crucial that there's a one-to-one mapping between a key and an identifier, hence we're applying the JCS.

I understand the CIDs and their design, but we see the multicodec as something generic that can be reused for deriving/computing other identifiers (or encoding the information in a self-described way in general).
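A rough sketch of how such a self-contained identifier could be derived from canonical JWK bytes: multicodec prefix (unsigned varint), then the key bytes, then multibase base58btc (prefix `z`). The code value `0xeb51` is an assumption to verify against the multicodec table before relying on it, the JWK bytes are placeholder data rather than a real key, and the helper names are hypothetical.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Assumed code for a JCS-serialized public JWK; check the multicodec table.
ASSUMED_JWK_JCS_PUB = 0xEB51


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet (leading zero bytes become '1')."""
    number = int.from_bytes(data, "big")
    digits = []
    while number:
        number, remainder = divmod(number, 58)
        digits.append(BASE58_ALPHABET[remainder])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def did_key(code: int, key_bytes: bytes) -> str:
    """Build a did:key-style identifier: 'z' + base58btc(varint(code) + key bytes)."""
    return "did:key:z" + base58btc(encode_varint(code) + key_bytes)


# Placeholder canonical JWK bytes (already reduced and JCS-serialized).
jwk_bytes = b'{"crv":"P-256","kty":"EC","x":"AAAA","y":"BBBB"}'
identifier = did_key(ASSUMED_JWK_JCS_PUB, jwk_bytes)
print(identifier == did_key(ASSUMED_JWK_JCS_PUB, jwk_bytes))  # True
```

Because the input bytes are deterministic, the same key always maps to the same identifier, and the identifier can be decoded back to the key without any lookup.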

vmx commented 1 year ago

I understand the CIDs and their design, but we see the multicodec as something generic that can be reused for deriving/computing other identifiers (or encoding the information in a self-described way in general).

Yes, multicodec should be regarded as something generic. I just added the whole blurb about CIDs to save some back and forth in case you had planned to use it within a CID :)

I think your use case makes sense.

rvagg commented 1 year ago

Sorry for the hold-up on this, really my fault for not having the mental space to dive into this. The concerns I've been expressing here are really about us being cautious about what gets to be in the multicodec field in a CID, but outside of that concern we ought not be too concerned with what people are doing with these entries, mainly that the categorisation is appropriate, they are documented and named appropriately and that the code is not misplaced (typically that means being careful of what gets to be in the single-byte range).

This does appear to be signalling a codec for decoding bytes, but not as a CID, and the key label in here makes that pretty clear so there shouldn't have been a hold-up. Apologies. (This discussion was had at length in our IPLD meeting just now, notes will go into https://github.com/ipld/team-mgmt and the recording is on YouTube, the IPFS channel, if you're interested in the gory details about why we care about some of these things).

mtaimela commented 1 year ago

Thank you very much for devoting time for this. It is important to protect the multicodec purpose, and I fully understand that this will take time and requires lengthy consideration to fully understand the context.

As a time-saving proposal, it might be beneficial to require a "template to fill" when submitting PRs. It could cover the purpose, an example use case, and other details that help reviewers understand the use case and why the codec would be useful.

rvagg commented 1 year ago

yep, I agree, @vmx and I should probably work on that! We have some fairly standard interaction patterns here these days so it would probably be worth codifying it in an issue template.

also /cc @darobin for interest