ipld / specs

Content-addressed, authenticated, immutable data structures

Clarify what a Multicodec is #288

Open vmx opened 3 years ago

vmx commented 3 years ago

A discussion on Rust IPLD about naming things spawned this issue.

I'm opening this issue on the IPLD repository and not on the Multicodec one, as I first want to clarify the naming within the IPLD space (which is broader) before we move on to Multiformats.

Currently, Multicodec describes a table with codes. It used to describe "Code + bytes"; I made this change to reflect how Multicodec is actually used. Looking at it now, I think a better name would be "Multicode". A "Multicodec" would then again be "Multicode + bytes". And this is also what Multicodecs are in my mental model.

This means that to me a Multihash Code is also a Multicodec, or that Multiaddrs are also Multicodecs.
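To make the "Multicode + bytes" model concrete, here is a minimal sketch in Python. It assumes the code is written as an unsigned varint prefix in front of the payload; the helper names (`encode_uvarint`, `decode_uvarint`, `wrap`, `unwrap`) are hypothetical and not taken from any Multiformats library. `0x71` is the table entry for DagCBOR.

```python
from typing import Tuple


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte, low group first)."""
    if value < 0:
        raise ValueError("unsigned varint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow: set the continuation bit
        else:
            out.append(byte)  # last group: continuation bit stays clear
            return bytes(out)


def decode_uvarint(data: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed) for the varint at the start of data."""
    value = 0
    shift = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, index + 1
        shift += 7
    raise ValueError("truncated varint")


DAG_CBOR = 0x71  # the "Multicode": just an entry in the table


def wrap(code: int, payload: bytes) -> bytes:
    """Hypothetical helper: build "Multicode + bytes" by prefixing the payload with its code."""
    return encode_uvarint(code) + payload


def unwrap(data: bytes) -> Tuple[int, bytes]:
    """Hypothetical helper: split prefixed data back into (code, payload)."""
    code, consumed = decode_uvarint(data)
    return code, data[consumed:]
```

In this sketch, `wrap(DAG_CBOR, payload)` is the thing the mental model above would call a "Multicodec", while `DAG_CBOR` on its own is only the "Multicode".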

Coming to IPLD: There are codecs like DagCBOR; I call them "IPLD Codecs". I would never use the word "Multicodec" for them.

I thought that our specs do the same, but they don't. Even the CID spec is using the word Multicodec in contrast to Multihash or Multibase. You see a similar picture if you grep this repo for multicodec. You'll find it used in ways where I would use IPLD Codec.

I opened this issue in the hope of finding out whether my view differs from others', or whether the team thinks similarly. The goal for me here is to hear about different views and then discuss how we move forward with naming things. The Multiformats stuff was renamed a lot in the past; perhaps it's time to do that again.

mikeal commented 3 years ago

As painful as it would be to rename, I will admit multicode is a better name.

vmx commented 3 years ago

More context regarding how the IPLD specs are using "Multicodec": https://github.com/ipld/specs/blob/d74ce9221a0ff64b79b6fad87aa6196499b45a4e/FOUNDATIONS.md#multicodecs-are-not-meant-to-act-as-types

> Multicodecs are used to indicate the format of data in a Block

~This reads strange with my mental model outlined above.~

Update: Actually not. It says that a Multicodec indicates the format and not that a Multicodec is the format.

warpfork commented 3 years ago

I tend to say "Multicodec indicator" as a phrase to make this clearer.

(Or sometimes even "Multicodec indicator byte", to locally make the emphasis that it's quite short -- nevermind that it might be two bytes, etc, due to being a varint; that's often not the salient level of detail in many discussions, vs simply making the point that it's short.)
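As a rough illustration of the "short, but not always one byte" point, the sketch below computes how many bytes an indicator takes as an unsigned varint. The function name `uvarint_len` is hypothetical; the three codes are the table values for raw, DagCBOR and DagJSON as I understand them.

```python
def uvarint_len(code: int) -> int:
    """Number of bytes needed to write a non-negative code as an unsigned varint."""
    if code < 0:
        raise ValueError("unsigned varint cannot encode negative numbers")
    # Each varint byte carries 7 payload bits; zero still needs one byte.
    return max(1, -(-code.bit_length() // 7))


# Codes assumed from the multicodec table.
INDICATOR_CODES = {"raw": 0x55, "dag-cbor": 0x71, "dag-json": 0x0129}

for name, code in INDICATOR_CODES.items():
    print(f"{name}: code {code:#x} takes {uvarint_len(code)} byte(s)")
```

Any code below `0x80` fits in a single byte, so the commonly used indicators really are one byte; codes from `0x80` upward need two or more.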

I dunno about renaming. I tend to think that'll muddy the waters (and not really stick; we'd have to run around pedantically reminding people not to use the trailing "c" for... far too long).

How about some sort of document that approaches "branding guidelines", and recommends how to best talk about this? That would seem good.