ipld / go-ipld-prime

Golang interfaces for the IPLD Data Model, with core Codecs included, IPLD Schemas support, and some handy functional transforms tools.
MIT License

Autogenerating Node -> Schema type "parsing" would be nice #408

Closed. Ericson2314 closed this 2 years ago.

Ericson2314 commented 2 years ago

For context, I am working on https://github.com/obsidiansystems/go-ipld-swh, which like https://github.com/ipfs/go-ipld-git is a "schema and a codec". I cribbed that second repo as much as possible :).

On the decoding side, in https://github.com/obsidiansystems/go-ipld-swh/blob/12c9fb5306f53da12aafe22492acdda662b0f821/snapshot.go#L68-L158, I can go straight from raw bytes to my domain-specific schema types. That's nice! Yes, tagged unions are horrendous in Go, but at least the type system is helping me. To make the actual codec.Decoder, I can just wrap this function in one that calls AssignNode.

On the encoding side, however, I am stuck manually parsing the untyped/heterogeneous Node: https://github.com/obsidiansystems/go-ipld-swh/blob/12c9fb5306f53da12aafe22492acdda662b0f821/snapshot.go#L186-L273. This is complete boilerplate, and the type system doesn't help me do it correctly in the slightest. I would hope the generated code could do this for me, so I could parse an arbitrary Node into a Snapshot (my schema type), and then encode from there back into raw bytes.

I suppose I could just downcast the Node? But this doesn't seem correct given the Node's heterogeneous construction. E.g., if one runs ipfs dag put --input-codec dag-json --store-codec swhid-1-snp, the data will start as an untyped Node, so something will need to parse it.

rvagg commented 2 years ago

@Ericson2314 I hope I'm not misinterpreting you here but you should be able to leverage your codegen'd types for this by using AssignNode on your TypedNode builder to do the heavy lifting for you: https://github.com/ipld/go-codec-dagpb/blob/master/marshal.go#L47-L51 - a failure will indicate that the incoming data doesn't match the schema.

It should also be efficient enough that you can run it against multiple types and just use the one that matches--that was the intention behind the schema system, that you could perform these kinds of checks on multiple schemas to find the one that fits.
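The "run it against multiple types and use the one that matches" idea can be sketched without the library. This is a stdlib-only illustration, not go-ipld-prime code: `Node`, `Commit`, `Tag`, `Candidate`, `stringFields`, and `MatchFirst` are all hypothetical names, and each `Assign` function merely stands in for calling AssignNode on a typed builder, where an error would mean the data does not fit that schema.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for an untyped map node.
type Node map[string]any

// Commit and Tag are hypothetical typed targets, one per "schema".
type Commit struct {
	Tree    string
	Message string
}

type Tag struct {
	Name   string
	Target string
}

// Candidate names a type and a function that tries to build it from a Node.
// The function plays the role AssignNode plays on a typed builder:
// a non-nil error means the data does not fit.
type Candidate struct {
	Name   string
	Assign func(Node) (any, error)
}

// stringFields returns the named fields of n as strings. It fails when a
// field is missing, is not a string, or n carries extra fields.
func stringFields(n Node, names ...string) ([]string, error) {
	if len(n) != len(names) {
		return nil, fmt.Errorf("expected %d fields, got %d", len(names), len(n))
	}
	out := make([]string, len(names))
	for i, name := range names {
		s, ok := n[name].(string)
		if !ok {
			return nil, fmt.Errorf("field %q is missing or not a string", name)
		}
		out[i] = s
	}
	return out, nil
}

func assignCommit(n Node) (any, error) {
	f, err := stringFields(n, "tree", "message")
	if err != nil {
		return nil, fmt.Errorf("commit: %w", err)
	}
	return Commit{Tree: f[0], Message: f[1]}, nil
}

func assignTag(n Node) (any, error) {
	f, err := stringFields(n, "name", "target")
	if err != nil {
		return nil, fmt.Errorf("tag: %w", err)
	}
	return Tag{Name: f[0], Target: f[1]}, nil
}

// MatchFirst tries each candidate in order and returns the first that accepts n.
func MatchFirst(n Node, cands []Candidate) (string, any, error) {
	for _, c := range cands {
		if v, err := c.Assign(n); err == nil {
			return c.Name, v, nil
		}
	}
	return "", nil, errors.New("no candidate type matched the node")
}

func main() {
	cands := []Candidate{{"Commit", assignCommit}, {"Tag", assignTag}}
	n := Node{"name": "v1.0", "target": "abc123"}
	name, v, err := MatchFirst(n, cands)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %+v\n", name, v)
}
```

Order matters in this sketch: if two candidate shapes could both accept the same data, the first one listed wins, so more specific types would presumably go first.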

Also, I'm currently fiddling with a new version of the Bitcoin codec and have been weighing up whether codegen or bindnode is the way to go. Given that we're investing more into bindnode rather than codegen at the moment, and it's a much more natural API (since it's built around Go types), it might be worth considering using it rather than the codegen'd TypedNode. bindnode also has support for some things that codegen doesn't (the reverse is also true, but arguably the things that bindnode has that codegen doesn't are more useful). Have a look @ https://pkg.go.dev/github.com/ipld/go-ipld-prime/node/bindnode, some example usage @ https://github.com/ipfs/go-graphsync/blob/main/message/v2/ipld_roundtrip_test.go

Ericson2314 commented 2 years ago

Oh, it didn't occur to me to use AssignNode in the other direction! Thanks! That would solve the problem.

> it might be worth considering using it rather than the codegen'd TypedNode.

I will keep that in mind for the future, but in this specific case, due to the overlap between SWH and git, I will continue to do what go-ipld-git does. But if that is changed, this can be too!