multiformats / multicodec

Compact self-describing codecs. Save space by using predefined multicodec tables.
MIT License

add Cryptid codecs #345

Closed dhuseby closed 1 month ago

dhuseby commented 7 months ago

This adds a number of multicodec values for various Cryptid projects ahead of the public release of the code. Most of the code is already available:

dhuseby commented 7 months ago

The new Multisig and Multikey specs and implementations strictly follow:

This cannot be said for the existing Varsig spec. It relies heavily on context to know how to decode the bytes of the digital signature. More importantly, if your implementation doesn't support a specific algorithm, there is no way to know how many bytes to skip in the stream to get past the Varsig. Multisig fixes this as an explicit goal.
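To illustrate the skipping problem, here is a minimal sketch of a length-prefixed frame. It is not the Multisig or Varsig wire format: the `<varint code><varint length><payload>` layout, the `frame` and `read_signatures` helpers, and the algorithm codes are all hypothetical, chosen only to show why an explicit length lets a reader step over an algorithm it does not support.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("varint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at buf[pos]."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


# Hypothetical algorithm codes, chosen only for this example.
KNOWN_ALGORITHMS = {0x1001: "known-sig"}


def frame(code: int, payload: bytes) -> bytes:
    """Hypothetical framing: <varint code><varint length><payload>."""
    return encode_varint(code) + encode_varint(len(payload)) + payload


def read_signatures(stream: bytes) -> list[tuple[str, bytes]]:
    """Collect known signatures, skipping frames whose code is unsupported."""
    found = []
    pos = 0
    while pos < len(stream):
        code, pos = decode_varint(stream, pos)
        length, pos = decode_varint(stream, pos)
        end = pos + length
        if end > len(stream):
            raise ValueError("truncated payload")
        if code in KNOWN_ALGORITHMS:
            found.append((KNOWN_ALGORITHMS[code], stream[pos:end]))
        pos = end  # the length prefix lets us skip a payload we cannot interpret
    return found
```

Without the length field, a reader that hits an unknown code has no way to find where the next frame starts, which is the failure mode described above.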

rvagg commented 6 months ago

Apologies for our slowness on this, @dhuseby. Both Volker and I are a bit stretched, you've given us a lot to review with your PRs, and the cost of context switching to get our heads around all the new stuff in here isn't small.

Firstly, the one we tend to be most concerned about is anything that adds ipld - i.e. anything that might qualify for going into the "codec" portion of a CID.

  1. If it's not intended to live in a CID then ipld is probably the wrong choice.
  2. If it follows the same decoding and encoding process as an existing codec, then adding a new entry is probably not appropriate. It should be a new encoding format that's not already covered. We discourage using codec codes as a signalling mechanism for anything other than "how do I turn these bytes into the IPLD data model and back again".
  3. If decoding bytes using this codec doesn't yield any "links" (concretely: CIDs), such that it can contribute to a larger graph, then serialization may be a better choice.
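For readers unfamiliar with "the codec portion of a CID", the following sketch builds a binary CIDv1 as `<version><codec><multihash>` and reads the codec back out. The codes used (`raw` = 0x55, `dag-cbor` = 0x71, `sha2-256` = 0x12) are existing table entries; the helper names `make_cid_v1` and `codec_of` are made up for this example.

```python
import hashlib

RAW = 0x55        # "raw" codec
DAG_CBOR = 0x71   # "dag-cbor" codec
SHA2_256 = 0x12   # "sha2-256" multihash code


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at buf[pos]."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def make_cid_v1(codec: int, block: bytes) -> bytes:
    """Binary CIDv1: <version=1><codec><multihash of the block>."""
    digest = hashlib.sha256(block).digest()
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    return encode_varint(1) + encode_varint(codec) + multihash


def codec_of(cid: bytes) -> int:
    """Read the codec code, i.e. 'how do I decode the block this CID names'."""
    version, pos = decode_varint(cid)
    if version != 1:
        raise ValueError("only CIDv1 is handled in this sketch")
    codec, _ = decode_varint(cid, pos)
    return codec
```

The codec slot only tells a consumer how to decode the addressed block, which is why entries that never appear in that slot are steered away from the ipld tag.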

The new multi* entries will also require some additional thought and consideration. And due to the changes of the tags for existing entries we may need to pull in some folks who rely on those.

dhuseby commented 6 months ago

@rvagg

If decoding bytes using this codec doesn't yield any "links" (concretely: CIDs), such that it can contribute to a larger graph, then serialization may be a better choice.

Ah...ok. These are new IPLD data structures but I think you're right, serialization is probably correct here.

The new multi* entries will also require some additional thought and consideration. And due to the changes of the tags for existing entries we may need to pull in some folks who rely on those.

This is a fair concern. I think to be safe, I'll just duplicate the overlap and assign new numbers.

Let me fix this.

dhuseby commented 6 months ago

Thank you for taking time to review this. I changed the provenance-log and provenance-log-entry to be serialization. I changed a VLAD from cid to vlad since it is an identifier but not a CID. I restored the varsig entries to their original state and moved all of the multisig signature codecs to unique names and values to avoid collision. This should streamline the review and eliminate any push-back from repurposing.
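The kind of collision being avoided here can be checked mechanically. A minimal sketch, assuming the table keeps a comma-separated `name, tag, code, status, description` layout: the sample rows are abbreviated, and the `example-msig` row (including its tag and code) is hypothetical, added only to trigger the check.

```python
import csv
import io

# Abbreviated sample; the last row is hypothetical and deliberately reuses 0x55.
SAMPLE = """\
name, tag, code, status, description
identity, multihash, 0x00, permanent, raw binary
sha2-256, multihash, 0x12, permanent,
raw, ipld, 0x55, permanent, raw binary
example-msig, multisig, 0x55, draft, hypothetical row that reuses an existing code
"""


def find_collisions(table_text: str) -> list[str]:
    """Report names or codes that appear more than once in the table text."""
    reader = csv.reader(io.StringIO(table_text), skipinitialspace=True)
    next(reader)  # skip the header row
    seen_names = set()
    code_owner = {}
    problems = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, _tag, code_text = (field.strip() for field in row[:3])
        code = int(code_text, 16)  # accepts the "0x" prefix
        if name in seen_names:
            problems.append(f"name {name} appears more than once")
        seen_names.add(name)
        if code in code_owner:
            problems.append(
                f"code {code:#04x} used by both {code_owner[code]} and {name}"
            )
        else:
            code_owner[code] = name
    return problems
```

Running `find_collisions(SAMPLE)` flags the hypothetical row because its code is already assigned to `raw`.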

dhuseby commented 6 months ago

I also just corrected the es384-msig name error. I filed an issue for the same name error for varsig: #348

dhuseby commented 6 months ago

Also just cleaned up the es521-msig name error. I filed an issue for the same name error for varsig: #349

vmx commented 6 months ago

I only had a quick look. We are generally pretty cautious about adding anything to the number space that is encoded in a single byte, as it is so limited (also, some things that are already there shouldn't really be, but that's hard to change). So if you would move your additions to a 2-byte or higher section, that would be great.
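The size of that space follows from the unsigned varint encoding used for codes: each byte carries 7 bits of value, so only codes below 0x80 fit in one byte. A small sketch (the `varint_len` helper is written for this example, not taken from any library):

```python
def varint_len(code: int) -> int:
    """Number of bytes the unsigned varint encoding of `code` occupies."""
    if code < 0:
        raise ValueError("codes are non-negative")
    length = 1
    while code >= 0x80:  # 7 payload bits per byte
        code >>= 7
        length += 1
    return length


# Codes 0x00..0x7f take one byte, 0x80..0x3fff take two, and so on.
ONE_BYTE_SLOTS = sum(1 for code in range(0x200) if varint_len(code) == 1)
```

So there are only 128 one-byte codes in total, and an entry such as `ed25519-pub` (0xed) already needs two bytes on the wire.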

dhuseby commented 5 months ago

So if you would move your additions to a 2-byte or higher section, that would be great.

I can do that, but what about the multikey, multisig, nonce, and the chacha20-poly1305? I put all of these in the single byte space because they are fundamental types (or soon will be). I put chacha20-poly1305 to be next to the existing chacha codecs.

I'm happy to move things elsewhere but I had reasons for putting these in the single byte space.

vmx commented 5 months ago

(or soon will be).

That's the point. At the moment we hardly add anything to the single-byte range; the idea is to put things there that have wide adoption or are standardized. I know it's annoying that, once something really gets used, you can't easily change to a different code later on, but that space is just so limited.

burdiyan commented 5 months ago

FWIW, the word Multikey is currently being used in some of the upcoming W3C specs related to DID and Verifiable Credentials: https://www.w3.org/TR/vc-data-integrity/#multikey.

dhuseby commented 2 months ago

My multikey predates the W3C use. There's no mention of multikey in this file yet. Plus the verifiable credentials spec has failed to gain adoption in key political and business areas.

rvagg commented 2 months ago

I think I'm OK with this, it's just a lot of new tags being added, including single-entry ones: vlad, nonce, multikey. I'd be more comfortable if we didn't encourage more expansion of tags. But I don't have a hard reason to object.

dhuseby commented 2 months ago

including single-entry ones: vlad, nonce, multikey

At the risk of bikeshedding... vlad and cid tags should just be "identifier" or something generic like that. multikey and key could be combined into "key-like" or something generic like that. But that would require re-tagging a bunch of values, which I'm sure is a way worse outcome. The tags are all fundamental types that fit with the established pattern IMO.

Thanks for the time you spent reviewing these.

rvagg commented 2 months ago

I see no reason to block with further bikeshedding

dhuseby commented 2 months ago

Thanks! Can we land this?

dhuseby commented 2 months ago

I think all concerns have been addressed.

rvagg commented 1 month ago

Sorry for the delays Dave, we're not well set up for dealing with complex requests like this.

vmx commented 1 month ago

@dhuseby Thanks for your patience and all the changes compared to the original PR!

bumblefudge commented 1 month ago

@rvagg @vmx Sorry for the late appearance, this crossed my inbox last week but I was at TPAC on a borrowed laptop and couldn't get here in time.

My multikey predates the W3C use

This kinda feels like a "trust me bro" assertion. The "W3C use" was based on early versions of Juan Benet's eternally-unfinished/unpublished spec from 2016, which seems consistent with references in the original commits in this org. In fact, I myself removed multikey from the top-level readme in a 2023 PR exactly because I found discrepancies in various proposed multikey variants in the history of this org but no spec more canonical than JB's original issue (which matches earlier versions of W3C usage, fwiw) to discriminate between the variants. The new spec seems perfectly defensible but I would consider giving it a suffix or renaming it to avoid confusion with the older multikey and its variants. If it doesn't get renamed here in this org, it might just cause headaches if this ever goes to IANA.

FWIW W3C DIDWG and VCWG do not use the term "multikey" in surface-level schemas but do under the hood, for RDF-expansion purposes but not as a CID-like multiformat/composite containing keytypes; what they actually use (and what recent drafts specify internally rather than normrefing) is a multibased raw-codec ULEB128 of public key bytes with the w3c-namespaced RDF/JSON-LD semantic publicKeyMultibase, with key type information passed separately rather than inline. The Data Integrity vocabulary, on the other hand, groups publicKeyMultibase and other terms under the umbrella property of multikey, tho, so maybe there is some kind of confusion potential here, I'm not really an ontology expert. AFAIK none of the representatives from Maersk, GS1, or the Singaporean Trade Commission were lying about using publicKeyMultibase in prod last week at W3C TPAC? Also the DSNP social media ecosystem uses the older, simpler "multikey" family of terms for its signature verification stuff, and technically UCAN does as well, since it uses did:key throughout...
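For reference, the did:key form mentioned above can be sketched for an Ed25519 key as the multibase prefix `z` (base58btc) followed by the base58btc encoding of the varint-encoded `ed25519-pub` code (0xed) and the 32 raw key bytes. This is an illustrative sketch only: `base58btc_encode` and `did_key_from_ed25519` are names made up here, the key bytes are dummy data, and nothing below is taken from the Cryptid multikey spec.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Unsigned-varint encoding of the ed25519-pub code 0xed (needs two bytes).
ED25519_PUB_PREFIX = bytes([0xED, 0x01])


def base58btc_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 58)
        digits.append(B58_ALPHABET[rem])
    # Each leading zero byte is represented by a leading "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def did_key_from_ed25519(raw_key: bytes) -> str:
    """Build a did:key identifier from 32 raw Ed25519 public key bytes."""
    if len(raw_key) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes")
    return "did:key:z" + base58btc_encode(ED25519_PUB_PREFIX + raw_key)


# Dummy key material, for illustration only.
example = did_key_from_ed25519(bytes(range(32)))
```

Because the key type travels inline as a code prefix, every Ed25519 identifier built this way starts with `did:key:z6Mk`.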

Similarly, I believe there is some multisig work ongoing elsewhere in the PL cinematic universe that I referred OP to when they asked about this a year ago, and never heard back about. That one goes by varsig rather than multisig and I'm not sure whether it's going to production, but it could also be a source of avoidable confusion. Probably nothing, but odd it wasn't mentioned in the PR.

No particular action needed at this time but I figured this thread should have more information in it for future reference.

dhuseby commented 1 month ago

@bumblefudge I don't care what it is called. If "multikey" is a hotly contested name then let's go with "polykey" or "unikey" or "whateverkey" or "skeletonkey". I officially do not care. I just need these changes to get landed. So in my best effort to end the yak shaving and to celebrate Halloween/Diwali and our triumph of light over darkness/enlightenment over ignorance, I propose the use of "skeletonkey" to alleviate the concern.

dhuseby commented 1 month ago

If Rust allowed for unicode type names, I'd even go so far as to call it "💀 key"

dhuseby commented 1 month ago

@dhuseby Thanks for your patience and all the changes compared to the original PR!

No worries! This is Thunderdome open source. 🪓

bumblefudge commented 1 month ago

oh totally the sigil is yours! sorry it didn't land sooner.

dhuseby commented 4 weeks ago

Thank you!