multiformats / multicodec

Compact self-describing codecs. Save space by using predefined multicodec tables.
MIT License

split tag=ipld into tag=cid and tag=ipldcodec #242

Closed: mvdan closed this issue 2 years ago

mvdan commented 2 years ago

Right now tag=ipld is somewhat overloaded. It covers CID versions such as cidv1 and cidv2 as well as block codecs such as raw and dag-json. Those are clearly separate things: a CID itself includes a CID version (such as cidv1) as well as a codec for the target encoded block (such as dag-json).
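To make the distinction concrete, here is a minimal Go sketch that reads the two leading fields of a binary CIDv1: first the CID version, then the codec of the block it points at, with the multihash following. The codes used (0x01 for cidv1, 0x0129 for dag-json, 0x12 for sha2-256) come from the multicodec table. It assumes the standard library's `encoding/binary.Uvarint` is an adequate unsigned-varint decoder; it accepts some non-minimal encodings that a strict multiformats decoder would reject. The `cidPrefix` type and `parseCIDPrefix` function are hypothetical names, not part of go-cid or go-multicodec.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// cidPrefix holds the two fields this sketch pulls out of a binary CIDv1:
// the CID version and the codec of the block the CID points at.
type cidPrefix struct {
	Version uint64
	Codec   uint64
}

// parseCIDPrefix reads the two leading unsigned varints of a binary CIDv1
// and returns them along with the remaining bytes (the multihash).
// It does not validate the multihash, nor does it enforce the
// minimal-encoding rules of the unsigned-varint spec.
func parseCIDPrefix(b []byte) (cidPrefix, []byte, error) {
	version, n := binary.Uvarint(b)
	if n <= 0 {
		return cidPrefix{}, nil, errors.New("bad CID version varint")
	}
	b = b[n:]
	codec, n := binary.Uvarint(b)
	if n <= 0 {
		return cidPrefix{}, nil, errors.New("bad codec varint")
	}
	return cidPrefix{Version: version, Codec: codec}, b[n:], nil
}

func main() {
	// cidv1 (0x01), dag-json (0x0129), then a sha2-256 multihash header
	// (code 0x12, digest length 0x20) followed by a placeholder zero digest.
	var buf []byte
	tmp := make([]byte, binary.MaxVarintLen64)
	for _, v := range []uint64{0x01, 0x0129, 0x12, 0x20} {
		n := binary.PutUvarint(tmp, v)
		buf = append(buf, tmp[:n]...)
	}
	buf = append(buf, make([]byte, 32)...)

	p, rest, err := parseCIDPrefix(buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("version=%#x codec=%#x multihash=%d bytes\n", p.Version, p.Codec, len(rest))
}
```

The version and the codec occupy different positions and answer different questions ("how is this CID laid out?" versus "how is the target block encoded?"), which is the argument for giving them different tags.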

Since these are separate concepts, I think it would be a good idea to split the tag so that the CID versions get their own class. tag=cid and tag=ipldcodec seem like reasonable names to me.

warpfork commented 2 years ago

💯

mvdan commented 2 years ago

@vmx mentions that this shouldn't break any Rust code, as they neither publish nor generate a Rust library representing the multicodec table.

Perhaps @rvagg can chime in on whether this tag split and rename could break any JS code.

cc @schomatis @aschmahmann as a follow-up from Slack

mvdan commented 2 years ago

Oh, and I forgot to mention: go-multicodec doesn't expose tag strings/values just yet, so no code should break. I want to expose that soon (https://github.com/multiformats/go-multicodec/issues/58) so I'd rather do the split+rename before the tags surface in the API.

rvagg commented 2 years ago

See https://github.com/multiformats/multicodec/pull/243 for the cid change.

I'm not super keen on the ipldcodec change. I think I'd prefer to keep ipld and move a few others into it, including json and cbor. There's a blurry line between "serialization" and "ipld", but I propose we unblur it a little by tagging as ipld anything we have real IPLD codecs for. Some entries don't fit, like protobuf and maybe rlp, and the CAR index formats are probably best left as serialization too.
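A rough Go sketch of how a slice of the table could look under this proposal: the CID versions in their own cid tag, json and cbor moved into ipld alongside the DAG codecs, and protobuf left as serialization. The numeric codes are the ones in the multicodec table; the `entry` type and `namesByTag` helper are hypothetical, only a handful of rows are shown, and the tags reflect the proposal rather than any particular revision of the table.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical, trimmed-down row of the multicodec table:
// just a name, a tag and a numeric code.
type entry struct {
	Name string
	Tag  string
	Code uint64
}

// A few rows tagged as proposed: CID versions under "cid", anything with a
// real IPLD codec under "ipld", and protobuf left as "serialization".
var table = []entry{
	{"cidv1", "cid", 0x01},
	{"cidv2", "cid", 0x02},
	{"protobuf", "serialization", 0x50},
	{"cbor", "ipld", 0x51},
	{"raw", "ipld", 0x55},
	{"dag-pb", "ipld", 0x70},
	{"dag-cbor", "ipld", 0x71},
	{"dag-json", "ipld", 0x0129},
	{"json", "ipld", 0x0200},
}

// namesByTag groups entry names under their tag, each group sorted by name.
func namesByTag(rows []entry) map[string][]string {
	out := make(map[string][]string)
	for _, r := range rows {
		out[r.Tag] = append(out[r.Tag], r.Name)
	}
	for _, names := range out {
		sort.Strings(names) // sorts the slice stored in the map in place
	}
	return out
}

func main() {
	groups := namesByTag(table)
	for _, tag := range []string{"cid", "ipld", "serialization"} {
		fmt.Println(tag, groups[tag])
	}
}
```

Keeping the tag as a plain string column means a consumer such as a generated library can filter or group by it without knowing the full list of tags in advance.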

vmx commented 2 years ago

Having both ipld and serialization is fine with me. What I'd like to get at, though, is that only a specific set of categories makes sense as the codec within a CID. Those categories could then be named in the CID spec, which I'd hope would reduce some of the confusion.
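If the CID spec did name such a set of categories, a tool could check a codec's tag before accepting it in a CID. The Go sketch below assumes, purely for illustration, that the allowed tags are ipld and serialization; the thread does not define the set. The `cidCodecTags` and `tagOf` maps and the `checkCIDCodec` function are hypothetical, and the tag assignments are a small hand-written sample.

```go
package main

import "fmt"

// cidCodecTags is a hypothetical allow-list of table tags whose entries
// make sense in the codec position of a CID. The exact set is an assumption.
var cidCodecTags = map[string]bool{
	"ipld":          true,
	"serialization": true,
}

// tagOf is a hypothetical lookup from codec name to table tag.
var tagOf = map[string]string{
	"dag-json": "ipld",
	"raw":      "ipld",
	"protobuf": "serialization",
	"cidv1":    "cid",
	"sha2-256": "multihash",
}

// checkCIDCodec returns an error if name is unknown or its tag is not in
// the allow-list for the codec position of a CID.
func checkCIDCodec(name string) error {
	tag, ok := tagOf[name]
	if !ok {
		return fmt.Errorf("unknown codec %q", name)
	}
	if !cidCodecTags[tag] {
		return fmt.Errorf("codec %q has tag %q, which is not expected as a CID codec", name, tag)
	}
	return nil
}

func main() {
	for _, name := range []string{"dag-json", "protobuf", "cidv1", "sha2-256"} {
		if err := checkCIDCodec(name); err != nil {
			fmt.Println("reject:", err)
		} else {
			fmt.Println("ok:", name)
		}
	}
}
```

Because CIDs already see novel uses, a real tool might prefer to warn rather than reject; returning an error here just keeps the sketch small.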

rvagg commented 2 years ago

Fair enough, although there's already some boundary-pushing with CIDs being used for novel things: CommP/D/R in Filecoin, the actor codes in the Filecoin chain, some novel networking uses of CIDs I've noticed in libp2p and related projects, and identity CIDs in general. So we probably can't be too strict, but we can further clarify the intended purpose of CIDs and encourage users to stick to those boundaries.

mvdan commented 2 years ago

This is now done in https://github.com/multiformats/multicodec/pull/243. I believe we've agreed to not rename tag == "ipld", and I don't feel strongly about that rename either.