ipld / libipld

Rust IPLD library
Apache License 2.0

Question: Why are the macros for deriving the Codec instead of the Data Model? #160

Open ProofOfKeags opened 1 year ago

ProofOfKeags commented 1 year ago

As far as I understand, the purpose of IPLD is to largely be codec agnostic, and correspondingly it seems like the derivations being tied to the Codec defeats this property. It is perhaps the case that I am misunderstanding how things are supposed to work but a cursory glance at the examples directory of dag-cbor-derive suggests that it is a proc macro language that is trying to match the schema DSL of IPLD. If that is the case, what is codec specific about these macros?

vmx commented 1 year ago

> As far as I understand, the purpose of IPLD is to largely be codec agnostic,

That's correct. The reason dag-cbor-derive only exists for CBOR is that one had to start implementing something schema-like somewhere.

Now that https://github.com/ipld/serde_ipld_dagcbor exists, I'd expect most of those transformations to live at the Serde level, which would make them codec agnostic (if more codecs get implemented on top of Serde).

ProofOfKeags commented 1 year ago

I guess where I'm tripping up is where is the transformation from the concrete data type to the IPLD DM taking place? It doesn't seem to occur anywhere. The above referenced serde_ipld_dagcbor doesn't seem to answer this because it is simply taking the serde data model and serializing it via a dagcbor codec. This seems like the opposite of what you'd want. I'd expect that you'd want to derive some IpldDm trait (or handroll it), and then anything that had that implemented could be serialized via any of the codecs. Ideally this wouldn't touch serde at all since to use serde would be to cede control of the data model to serde's. What am I missing?
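To make the idea above concrete, here is a minimal sketch of what such a trait could look like. Everything in it is an assumption for illustration: `IpldDm`, `encode_with`, the two toy "codecs", and the cut-down `Ipld` enum are hypothetical stand-ins, not the libipld API, and the JSON-like output is not DAG-JSON.

```rust
use std::collections::BTreeMap;

/// Cut-down stand-in for an IPLD Data Model value (not libipld's `Ipld`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ipld {
    Null,
    Bool(bool),
    Integer(i128),
    String(String),
    List(Vec<Ipld>),
    Map(BTreeMap<String, Ipld>),
}

/// Hypothetical trait: a type describes itself in Data Model terms once.
trait IpldDm {
    fn to_ipld(&self) -> Ipld;
}

/// Toy codec 1: JSON-like text (no real string escaping rules, sketch only).
fn to_json_like(ipld: &Ipld) -> String {
    match ipld {
        Ipld::Null => "null".to_string(),
        Ipld::Bool(b) => b.to_string(),
        Ipld::Integer(i) => i.to_string(),
        Ipld::String(s) => format!("{:?}", s),
        Ipld::List(items) => {
            let parts: Vec<String> = items.iter().map(to_json_like).collect();
            format!("[{}]", parts.join(","))
        }
        Ipld::Map(map) => {
            let parts: Vec<String> = map
                .iter()
                .map(|(k, v)| format!("{:?}:{}", k, to_json_like(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Toy codec 2: Rust debug formatting of the Data Model value.
fn to_debug(ipld: &Ipld) -> String {
    format!("{:?}", ipld)
}

/// Any type implementing `IpldDm` works with any codec over `Ipld`.
fn encode_with<T: IpldDm>(value: &T, codec: fn(&Ipld) -> String) -> String {
    codec(&value.to_ipld())
}

struct Commit {
    message: String,
    revision: u64,
}

impl IpldDm for Commit {
    fn to_ipld(&self) -> Ipld {
        let mut map = BTreeMap::new();
        map.insert("message".to_string(), Ipld::String(self.message.clone()));
        map.insert("revision".to_string(), Ipld::Integer(self.revision as i128));
        Ipld::Map(map)
    }
}
```

With this shape, `encode_with(&commit, to_json_like)` and `encode_with(&commit, to_debug)` both work from the single `IpldDm` impl, which is the codec-agnostic property being asked about.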

EDIT: Maybe it seems like the right way to use this stuff would be to implement Into<Ipld> and TryFrom<Ipld> for the data types I want here and ignore the actual DagCbor derivation macros?

vmx commented 1 year ago

> I guess where I'm tripping up is where is the transformation from the concrete data type to the IPLD DM taking place? It doesn't seem to occur anywhere.

It goes directly from concrete data type to the serialized version. There is no intermediate IPLD Data Model step. You'd then implement similar derives for other codecs.
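As a rough illustration of "directly from concrete data type to the serialized version", the sketch below hand-writes the kind of code a codec-specific derive might generate. The trait `EncodeDagCbor`, the helper `write_head`, and the `Person` type are hypothetical names for this example and are not taken from dag-cbor-derive. Only the CBOR byte layout itself follows the CBOR/DAG-CBOR rules.

```rust
/// Hypothetical trait: each type writes its own DAG-CBOR bytes,
/// with no intermediate Data Model value in between.
trait EncodeDagCbor {
    fn encode(&self, out: &mut Vec<u8>);
}

/// Write a CBOR head: major type in the top 3 bits, then the argument
/// in the shortest form (required for DAG-CBOR).
fn write_head(major: u8, arg: u64, out: &mut Vec<u8>) {
    let m = major << 5;
    if arg < 24 {
        out.push(m | arg as u8);
    } else if arg <= u8::MAX as u64 {
        out.push(m | 24);
        out.push(arg as u8);
    } else if arg <= u16::MAX as u64 {
        out.push(m | 25);
        out.extend_from_slice(&(arg as u16).to_be_bytes());
    } else if arg <= u32::MAX as u64 {
        out.push(m | 26);
        out.extend_from_slice(&(arg as u32).to_be_bytes());
    } else {
        out.push(m | 27);
        out.extend_from_slice(&arg.to_be_bytes());
    }
}

impl EncodeDagCbor for u64 {
    fn encode(&self, out: &mut Vec<u8>) {
        write_head(0, *self, out); // major type 0: unsigned integer
    }
}

impl EncodeDagCbor for str {
    fn encode(&self, out: &mut Vec<u8>) {
        write_head(3, self.len() as u64, out); // major type 3: text string
        out.extend_from_slice(self.as_bytes());
    }
}

struct Person {
    name: String,
    age: u64,
}

impl EncodeDagCbor for Person {
    fn encode(&self, out: &mut Vec<u8>) {
        write_head(5, 2, out); // major type 5: map with two entries
        // DAG-CBOR sorts map keys by length, then bytewise: "age" < "name".
        "age".encode(out);
        self.age.encode(out);
        "name".encode(out);
        self.name.encode(out);
    }
}
```

Note that all CBOR knowledge lives inside the `Person` impl here, which is why a second codec would need its own, similar derive.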

> The above referenced serde_ipld_dagcbor doesn't seem to answer this because it is simply taking the serde data model and serializing it via a dagcbor codec. This seems like the opposite of what you'd want. I'd expect that you'd want to derive some IpldDm trait (or handroll it), and then anything that had that implemented could be serialized via any of the codecs. Ideally this wouldn't touch serde at all since to use serde would be to cede control of the data model to serde's. What am I missing?

I'd hope that something like IpldDm would exist, but it would use Serde under the hood, so that it works with every IPLD codec that is implemented on Serde the same way serde_ipld_dagcbor is. Please note that so far these are only hopes and thoughts; I haven't tried it yet.

> EDIT: Maybe it seems like the right way to use this stuff would be to implement Into<Ipld> and TryFrom<Ipld> for the data types I want here and ignore the actual DagCbor derivation macros?

You can do that as well.
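A minimal sketch of that approach follows. To keep it self-contained it uses a cut-down, hypothetical `Ipld` enum rather than the one from libipld, and the `Person` type and its field names are made up for the example. Implementing `From<Person> for Ipld` provides `Into<Ipld>` for free.

```rust
use std::collections::BTreeMap;
use std::convert::TryFrom;

/// Cut-down stand-in for an IPLD Data Model value (not libipld's `Ipld`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ipld {
    Null,
    Bool(bool),
    Integer(i128),
    String(String),
    Bytes(Vec<u8>),
    List(Vec<Ipld>),
    Map(BTreeMap<String, Ipld>),
}

#[derive(Debug, Clone, PartialEq)]
struct Person {
    name: String,
    age: u64,
}

/// Concrete type -> Data Model. This also gives `Into<Ipld> for Person`.
impl From<Person> for Ipld {
    fn from(p: Person) -> Self {
        let mut map = BTreeMap::new();
        map.insert("name".to_string(), Ipld::String(p.name));
        map.insert("age".to_string(), Ipld::Integer(p.age as i128));
        Ipld::Map(map)
    }
}

/// Data Model -> concrete type, failing on a shape mismatch.
impl TryFrom<Ipld> for Person {
    type Error = String;

    fn try_from(ipld: Ipld) -> Result<Self, Self::Error> {
        let mut map = match ipld {
            Ipld::Map(map) => map,
            other => return Err(format!("expected a map, got {:?}", other)),
        };
        let name = match map.remove("name") {
            Some(Ipld::String(s)) => s,
            _ => return Err("missing or non-string `name`".to_string()),
        };
        let age = match map.remove("age") {
            // Reject integers that do not fit into a u64.
            Some(Ipld::Integer(i)) => u64::try_from(i).map_err(|e| e.to_string())?,
            _ => return Err("missing or non-integer `age`".to_string()),
        };
        Ok(Person { name, age })
    }
}
```

A value converted with `.into()` can then be handed to any codec that operates on the Data Model value, and `Person::try_from(ipld)` recovers the original.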

ProofOfKeags commented 1 year ago

I just find it rather confusing that the documentation talks a lot about the IPLD data model but that the implementation just uses serde. If we're just gonna use serde, what's the point of IPLD in the first place?

vmx commented 1 year ago

The IPLD and the Serde data models are very similar. At one point I thought they were incompatible enough that using Serde wouldn't make sense. I was proven wrong. The "CID hack" needed to make Serde understand links isn't really that nice, but it seems to work. This way you can represent the missing piece from the IPLD Data Model in the Serde data model.

Serde is indeed used, but for the IPLD use case, you can see it as a helper library to do (de)serialization, which is only part of Serde's power in the Rust ecosystem. If we didn't use Serde and implemented the IPLD Data Model directly instead, it would largely be a re-implementation of Serde itself. So instead, we save that development time and use Serde for the pieces where it makes sense. We still need custom codec implementations (like serde_ipld_dagcbor), so it's not a case of "we use Serde and suddenly we can generate IPLD compatible data with all existing Serde codecs".

What we gain from this is that you can generate IPLD compatible data directly from your native Rust types. Serde is widely used in the Rust world, so as soon as some type implements the Serde traits, you can easily serialize it into IPLD. If we didn't use Serde, each type would need a custom implementation for IPLD.

nathan-at-least commented 1 year ago

This ticket is really useful for me.

My journey here: prototyping a decentralized revision control thingy with a custom data model, rediscovering IPLD and reading over a bunch of docs. It seems like the right fit for my need. It's in Rust, so I ended up here from the IPLD library docs.

However, the API docs are absent, so I've been reading over tickets and PRs to get a feel for the API here. Next I will check out examples.