multiformats / multicodec

Compact self-describing codecs. Save space by using predefined multicodec tables.
MIT License

add Cryptid codecs #345

Open dhuseby opened 3 months ago

dhuseby commented 3 months ago

This adds a number of multicodec values for various Cryptid projects ahead of the public release of the code. Most of the code is already available:

dhuseby commented 3 months ago

The new Multisig and Multikey specs and implementations strictly follow:

This cannot be said for the existing Varsig spec, which relies heavily on context to know how to decode the bytes of the digital signature. More importantly, if your implementation doesn't support a specific algorithm, there is no way to know how many bytes to skip in the stream to get past the Varsig. Multisig fixes this as an explicit goal.
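To illustrate the skipping problem, here is a minimal sketch of a reader that can step over values it does not understand because each value carries an explicit length. The layout used here (varint algorithm code, varint payload length, payload) and the names `KNOWN_ALGS` and `read_or_skip` are assumptions made for the example. This is not the Multisig or Varsig wire format.

```python
from typing import Optional, Tuple

# Hypothetical set of algorithm codes this reader knows how to decode.
KNOWN_ALGS = {0x01}


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def read_or_skip(buf: bytes, pos: int) -> Tuple[int, Optional[bytes], int]:
    """Read one (assumed) length-prefixed value.

    Returns (algorithm code, payload or None if unsupported, next position).
    The length prefix lets the reader advance even for unknown algorithms.
    """
    alg, pos = read_varint(buf, pos)
    length, pos = read_varint(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("truncated payload")
    payload = buf[pos:end] if alg in KNOWN_ALGS else None
    return alg, payload, end
```

Without the length field, a reader that hits an unknown algorithm code has no way to find where the next value starts, which is the situation described above.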

rvagg commented 2 months ago

Apologies for our slowness on this @dhuseby. Both Volker and I are a bit stretched, you've given us a lot to review with your PRs, and the cost of context switching to get our heads around all the new stuff in here isn't small.

Firstly, the one we tend to be most concerned about is anything that adds ipld - i.e. anything that might qualify for going into the "codec" portion of a CID.

  1. If it's not intended to live in a CID then ipld is probably the wrong choice.
  2. If it follows the same decoding and encoding process as an existing codec, then adding a new entry is probably not appropriate. It should be a new encoding format that's not already covered. We discourage using codec codes as a signalling mechanism for anything other than "how do I turn these bytes into the IPLD data model and back again".
  3. If decoding bytes using this codec doesn't yield any "links" (concretely: CIDs), such that it can contribute to a larger graph, then serialization may be a better choice.

The new multi* entries will also require some additional thought and consideration. And due to the changes of the tags for existing entries we may need to pull in some folks who rely on those.
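The three numbered checks above can be read as a small decision procedure. The sketch below is one possible interpretation, not project policy. The function name `suggest_tag` and its boolean parameters are hypothetical, and the final call in any real case rests with the maintainers.

```python
def suggest_tag(lives_in_cid: bool,
                same_as_existing_codec: bool,
                yields_links: bool) -> str:
    """Return a rough suggestion based on the three checks in the thread."""
    # Check 2: same encode/decode process as an existing codec.
    if same_as_existing_codec:
        return "no new entry"
    # Check 1: not meant for the codec portion of a CID.
    # Check 3: decoding yields no links (CIDs) into a larger graph.
    if not lives_in_cid or not yields_links:
        return "not ipld (consider serialization)"
    return "ipld"
```

For example, a format that is new but never produces CIDs when decoded would come out as "not ipld (consider serialization)" under this reading.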

dhuseby commented 2 months ago

@rvagg

> If decoding bytes using this codec doesn't yield any "links" (concretely: CIDs), such that it can contribute to a larger graph, then serialization may be a better choice.

Ah...ok. These are new IPLD data structures, but I think you're right: serialization is probably correct here.

> The new multi* entries will also require some additional thought and consideration. And due to the changes of the tags for existing entries we may need to pull in some folks who rely on those.

This is a fair concern. I think to be safe, I'll just duplicate the overlap and assign new numbers.

Let me fix this.

dhuseby commented 2 months ago

Thank you for taking time to review this. I changed the provenance-log and provenance-log-entry to be serialization. I changed a VLAD from cid to vlad since it is an identifier but not a CID. I restored the varsig entries to their original state and moved all of the multisig signature codecs to unique names and values to avoid collision. This should streamline the review and eliminate any push-back from repurposing.

dhuseby commented 2 months ago

I also just corrected the es384-msig name error. I filed an issue for the same name error for varsig: #348

dhuseby commented 2 months ago

Also just cleaned up the es521-msig name error. I filed an issue for the same name error for varsig: #349

vmx commented 2 months ago

I only had a quick look. We are generally pretty cautious about adding anything to the number space that is encoded in a single byte, as it is so limited (some things that are already there shouldn't really be, but that's hard to change). So if you would move your additions to a 2 bytes or higher section, that would be great.
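For context on why the single-byte range is so scarce: multicodec codes are written as unsigned varints, which carry 7 bits of value per byte, so only codes below 0x80 fit in one byte. The helper below is a small sketch for checking which range a code falls into. The name `varint_len` is made up for this example.

```python
def varint_len(code: int) -> int:
    """Number of bytes needed to encode code as an unsigned varint."""
    if code < 0:
        raise ValueError("codes are unsigned")
    length = 1
    # Each varint byte holds 7 bits; the high bit marks continuation.
    while code >= 0x80:
        code >>= 7
        length += 1
    return length


# Existing table entries used as examples.
for name, code in [("sha2-256", 0x12), ("dag-cbor", 0x71), ("dag-json", 0x0129)]:
    print(f"{name}: 0x{code:x} takes {varint_len(code)} byte(s)")
```

Under this encoding there are only 128 single-byte codes in total, while the two-byte range (0x80 through 0x3fff) has room for more than sixteen thousand.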

dhuseby commented 1 month ago

> So if you would move your additions to a 2 bytes or higher section, that would be great.

I can do that, but what about the multikey, multisig, nonce, and the chacha20-poly1305? I put all of these in the single byte space because they are fundamental types (or soon will be). I placed chacha20-poly1305 next to the existing chacha codecs.

I'm happy to move things elsewhere but I had reasons for putting these in the single byte space.

vmx commented 1 month ago

> (or soon will be).

That's the point. At the moment we hardly add anything to the single-byte range; the idea is to put things there that have wide adoption or are standardized. I know it's annoying that, once a code really gets used, you can't really change to a different one later on, but that space is just so limited.

burdiyan commented 1 month ago

FWIW, the word Multikey is currently being used in some of the upcoming W3C specs related to DID and Verifiable Credentials: https://www.w3.org/TR/vc-data-integrity/#multikey.