riptl closed 2 years ago
As far as I know the current general consensus regarding codecs is that if the payload doesn't contain content-addressable elements, then the correct way to encode is using the `raw` codec, like `0155<multihash>`, indicating "this is an opaque, non-traversable structure".
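To make the `0155<multihash>` layout concrete, here is a minimal sketch of building a binary CIDv1 with the `raw` codec. The choice of sha2-256 as the hash function and the helper name `raw_cid_v1` are assumptions for illustration, not something specified in this thread.

```python
import hashlib

RAW_CODEC = 0x55  # multicodec code for "raw"
SHA2_256 = 0x12   # multihash code for sha2-256 (assumed hash function)


def raw_cid_v1(payload: bytes) -> bytes:
    """Return the binary CIDv1 for payload, tagged with the raw codec."""
    digest = hashlib.sha256(payload).digest()
    # multihash = <hash code><digest length><digest>
    multihash = bytes([SHA2_256, len(digest)]) + digest
    # CIDv1 = <version><codec><multihash>; every value here fits in one varint byte
    return bytes([0x01, RAW_CODEC]) + multihash


cid = raw_cid_v1(b"opaque transaction bytes")
print(cid.hex())  # starts with 0155, followed by the multihash
```

The leading `01` is the CID version and `55` is the codec, so a consumer reading this prefix knows not to look for links inside the block.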
There have been various past discussions regarding (ab)using the codec as a form of "freeform label". I am not sure where the current thinking is on this; last I knew, the stance of the stewards was still "strongly discouraged".
@rvagg @vmx can you chime in?
> then the correct way to encode is using the `raw` codec, like `0155<multihash>`, indicating "this is an opaque, non-traversable structure"
That is correct. The "codec" of a CID is really meant to be an IPLD codec, i.e. it contains the information on how to decode the data, so that links can be extracted. If the data doesn't contain content addressed links, it should use `raw`.
There might be cases in the multicodec table that indicate that things are different, but those usually pre-date IPLD as we know it today.
As such identifiers are a common request, there is currently work on a proposal at https://hackmd.io/@vmx/HkoYAr64o#Application-context-proposal (the "application context" one is the most promising), to solve that problem. It would be great to hear if that would solve your problem.
@terorie can you expand a bit on this and what you're trying to achieve?
> IPLD ID for Merkle-DAGs that carry Solana transactions as leaves
This:
> If the data doesn't contain content addressed links, it should use raw.

is not strictly true. We do have "serialization" type codecs that will never yield native link types, but the "codec" here still tells you how to decode the bytes you've found once you look up the blob corresponding to the hash. `cbor` and `json` are like this: they're effectively terminal (as is `raw`!) but it is still useful to know how to turn them into data model form.
I'd like to formalise "IPLD" codecs being codecs that can potentially natively yield data model links, and "serialisation" codecs being those that have a self-describing encoding format (e.g. not generic protobuf) that will never yield native data model links.
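A rough sketch of how a consumer might apply that distinction: read the codec varint from a binary CIDv1 and look it up in two small tables. The codec numbers are taken from the multicodec table; the grouping into "link-capable" and "terminal", and the names `classify`, `LINK_CAPABLE` and `TERMINAL`, are hypothetical and only mirror the wording of this comment, not any formalised spec.

```python
def read_varint(buf: bytes, pos: int = 0):
    """Decode an unsigned varint (LEB128); return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


# Hypothetical grouping; only the numeric codes come from the multicodec table.
LINK_CAPABLE = {0x70: "dag-pb", 0x71: "dag-cbor", 0x0129: "dag-json"}
TERMINAL = {0x55: "raw", 0x51: "cbor", 0x0200: "json"}


def classify(cid: bytes):
    """Return (codec name, kind) for a binary CIDv1."""
    version, pos = read_varint(cid)
    if version != 1:
        raise ValueError("this sketch only handles binary CIDv1")
    codec, _ = read_varint(cid, pos)
    if codec in LINK_CAPABLE:
        return LINK_CAPABLE[codec], "link-capable"
    if codec in TERMINAL:
        return TERMINAL[codec], "terminal"
    return hex(codec), "unknown"


print(classify(bytes.fromhex("0155")))    # raw
print(classify(bytes.fromhex("018004")))  # json (0x0200 as a two-byte varint)
print(classify(bytes.fromhex("0171")))    # dag-cbor
```

Only the version and codec prefix is needed for the lookup, so the multihash that follows is left out of the example inputs.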
I'm not sure what this one is but it doesn't seem unreasonable that you could have a CID pointing to it?
Thank you for the thorough reviews. Closing this as this PR is clearly the wrong approach. Indicating the data type in a CIDv2 would be useful indeed.
The main motivation is to be able to iterate IPLD blocks in a CAR, so that each CAR/IPLD block is self-describing.
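For context, a minimal sketch of what iterating blocks in a CAR looks like, assuming the CARv1 layout (a varint-prefixed header followed by varint-prefixed sections, each holding a CID and then the block data). The header is skipped rather than decoded, and the function names are illustrative. The codec in each CID is the only per-block type hint available here, which is why the codec choice matters for this use case.

```python
import hashlib


def read_varint(buf: bytes, pos: int):
    """Decode an unsigned varint (LEB128); return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def write_varint(n: int) -> bytes:
    """Encode an unsigned integer as a varint (LEB128)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def iter_car_blocks(car: bytes):
    """Yield (codec, cid_bytes, data) for each section of a CARv1 byte string."""
    header_len, pos = read_varint(car, 0)
    pos += header_len  # skip the header without decoding it
    while pos < len(car):
        section_len, pos = read_varint(car, pos)
        end = pos + section_len
        if car[pos:pos + 2] == b"\x12\x20":
            # CIDv0: a bare sha2-256 multihash, codec is implicitly dag-pb
            codec, cid_end = 0x70, pos + 34
        else:
            _version, p = read_varint(car, pos)
            codec, p = read_varint(car, p)
            _hash_code, p = read_varint(car, p)
            digest_len, p = read_varint(car, p)
            cid_end = p + digest_len
        yield codec, car[pos:cid_end], car[cid_end:end]
        pos = end


def section(codec: int, data: bytes) -> bytes:
    """Build one section with a CIDv1 (sha2-256) for the demo below."""
    digest = hashlib.sha256(data).digest()
    cid = write_varint(1) + write_varint(codec) + bytes([0x12, len(digest)]) + digest
    body = cid + data
    return write_varint(len(body)) + body


# Demo input: the header is a placeholder, not a valid dag-cbor CAR header.
header = b"placeholder header"
car = write_varint(len(header)) + header + section(0x55, b"tx-1") + section(0x71, b"\xa0")
blocks = [(codec, data) for codec, _cid, data in iter_car_blocks(car)]
print(blocks)
```

With `raw` leaves, the loop above sees only codec `0x55` for every transaction block, so any finer-grained type information has to come from somewhere other than the CID.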
Solana transactions use a compact binary format (bincode). They don't contain content-addressable elements, but I am still requesting an IPLD ID for Merkle-DAGs that carry Solana transactions as leaves.
The `0x5B` prefix is arbitrary ("Solana Beach"). Happy to pick any other range.