ipld / go-ipld-prime

Golang interfaces for the IPLD Data Model, with core Codecs included, IPLD Schemas support, and some handy functional transforms tools.
MIT License
133 stars 50 forks source link

Union keyed representation don't not match documentation #475

Closed delaneyj closed 1 year ago

delaneyj commented 2 years ago

Per docs

I would expect

{"foo":{"a":1,"b":2,"c":"three"}}

to be valid but only

{"Foo":{"a":1,"b":2,"c":"three"}}

works

package main

import (
    "log"
    "os"

    "github.com/ipld/go-ipld-prime"
    "github.com/ipld/go-ipld-prime/codec/dagjson"
    "github.com/ipld/go-ipld-prime/datamodel"
    "github.com/ipld/go-ipld-prime/fluent/qp"
    "github.com/ipld/go-ipld-prime/node/bindnode"
)

func main() {
    ts, err := ipld.LoadSchemaBytes([]byte(`

type Foo struct {
    a Int
    b Int
    c String
} 

type Bar struct {
    d Bytes
    e Bool
}

type Baz union {
    | Foo "foo"
    | Bar "bar"
} representation keyed
    `))

    if err != nil {
        log.Fatal(err)
    }

    bazType := ts.TypeByName("Baz")
    proto := bindnode.Prototype(nil, bazType)

    workingNode, err := qp.BuildMap(proto, -1, func(ma datamodel.MapAssembler) {
        qp.MapEntry(ma, "Foo", qp.Map(-1, func(mz datamodel.MapAssembler) {
            qp.MapEntry(mz, "a", qp.Int(1))
            qp.MapEntry(mz, "b", qp.Int(2))
            qp.MapEntry(mz, "c", qp.String("three"))
        }))
    })
    if err != nil {
        panic(err)
    }
    dagjson.Encode(workingNode, os.Stdout)

    notWorkingNode, err := qp.BuildMap(proto, -1, func(ma datamodel.MapAssembler) {
        qp.MapEntry(ma, "foo", qp.Map(-1, func(mz datamodel.MapAssembler) {
            qp.MapEntry(mz, "a", qp.Int(1))
            qp.MapEntry(mz, "b", qp.Int(2))
            qp.MapEntry(mz, "c", qp.String("three"))
        }))
    })
    if err != nil {
        panic(err)
    }
    dagjson.Encode(notWorkingNode, os.Stdout)
}

Available in playground.

Am I misunderstand what schema provides?

rvagg commented 2 years ago

Your expectation is correct, so something's fishy here! I'll look into it and let you know. Thanks for reporting!

delaneyj commented 2 years ago

Your expectation is correct, so something's fishy here! I'll look into it and let you know. Thanks for reporting!

Thank for all the effort and fast response time. I had used your work in the node levelup ecosystem and really great to see your involvement in IPLD!

I will also just mention that union envelope representation errors with a Todo error. It's it best to add issues as I see them or is there a slack/discord that's better?

rvagg commented 2 years ago

thanks @delaneyj

As you bump in to TODO errors, particularly for features you're wanting to use, then feel free to open up an issue in this repo describing the missing functionality. We have a project board in this org where we collect these things and try and do some basic prioritisation. Depending on people-availability in the near future we can try and get to it (or you could have a go yourself and we can guide!). Feedback about the utility of the missing feature is especially helpful in priorisation—if we know that the feature would unlock some new potential for an active user then it's much more likely to be shuffled up in priority and not left with the long list of general things to get around to when we have time.

delaneyj commented 1 year ago

@rvagg did you find 🐟?

rvagg commented 1 year ago

no, sorry! been out of action for a little while but thanks for the ping. I've put it on the project board to remind me about it--I need to do a bit of work in bindnode soon so I'll try and look into this at the same time.

delaneyj commented 1 year ago

bump @rvagg

rvagg commented 1 year ago

Gee, I'm really sorry about this, I should have seen it earlier because it's relatively straightforward and a common foot-gun. >6 months to give this answer. But now that I've had time to spend looking at this in detail I can tell you that the behaviour is as expected, you just have to modify your usage a bit to get the outcome you expect.

First, here's a diff of your code to demonstrate how to make this work:

--- a/main.go
+++ b/main.go
@@ -1,6 +1,7 @@
 package main

 import (
+       "fmt"
        "log"
        "os"

@@ -9,6 +10,7 @@ import (
        "github.com/ipld/go-ipld-prime/datamodel"
        "github.com/ipld/go-ipld-prime/fluent/qp"
        "github.com/ipld/go-ipld-prime/node/bindnode"
+       "github.com/ipld/go-ipld-prime/schema"
 )

 func main() {
@@ -48,9 +50,13 @@ type Baz union {
        if err != nil {
                panic(err)
        }
+       fmt.Println("Typed:")
        dagjson.Encode(workingNode, os.Stdout)
+       fmt.Println("\nRepresentation:")
+       dagjson.Encode(workingNode.(schema.TypedNode).Representation(), os.Stdout)
+       fmt.Println()

-       notWorkingNode, err := qp.BuildMap(proto, -1, func(ma datamodel.MapAssembler) {
+       notWorkingNode, err := qp.BuildMap(proto.Representation(), -1, func(ma datamodel.MapAssembler) {
                qp.MapEntry(ma, "foo", qp.Map(-1, func(mz datamodel.MapAssembler) {
                        qp.MapEntry(mz, "a", qp.Int(1))
                        qp.MapEntry(mz, "b", qp.Int(2))
@@ -60,5 +66,9 @@ type Baz union {
        if err != nil {
                panic(err)
        }
+       fmt.Println("Typed:")
        dagjson.Encode(notWorkingNode, os.Stdout)
+       fmt.Println("\nRepresentation:")
+       dagjson.Encode(notWorkingNode.(schema.TypedNode).Representation(), os.Stdout)
+       fmt.Println()
 }

What you're hitting here is the "typed" vs "representation" form of a Node. When you have a schema and work with it through the Node interface, you're mostly dealing with TypedNode - have a look through the schema package because that's where we get to move beyond the standard data model and do things like structs, unions, etc. This is a layer above the basicnode / datamodel Node forms, and then when we get to ADLs we step up another layer again.

Through the typed interface, your Baz has two fields, Foo and Bar. If we were to make a codegen form of this, it'd be essentially type Baz struct { Foo *Foo; Bar *Bar; }. When we interact with it as a Node, we're dealing with that layer of interface. But below the typed layer, we have the datamodel layer where what you actually have is a map with two possible keys, { "foo": {...} } and { "bar": {...} }. The type interface here introduces the concept of a union over the basic map and enforces membership and key naming. It strictly says that the map must have one key and those keys can only be "foo" or "bar". Then in addition, the values of each of those keys must further conform to additional schema rules (both also be maps, with the various named fields you have).

When you do a keyed union in a schema, you're saying something like: when you serialise this data, it should be as a single-key map, where the key is either one of these values and the value in that map should conform to the matching type.

In my diff above you'll see two things - firstly, the big footgun that when serialising data manually, you should use the Representation() form if the node is of type schema.TypedNode, something like this can help in your serialization path:

if tn, ok := node.(schema.TypedNode); ok {
  node = tn
}
encode(node)

(e.g. https://github.com/ipld/go-ipld-prime/blob/master/node/bindnode/registry/registry.go#L130-L132)

That way, you get access to the bare datamodel form of the node you want to serialize.

The second thing is, when you're deserialising (including manually building like you are in the second block with qp), use the representation form of the typed prototype. Your proto is a TypedPrototype, and its Representation() will give you access to the datamodel layer NodePrototype.

If you have a look through this code and search for Representation(), you'll see both of these things happening: https://github.com/ipld/go-ipld-adl-hamt/blob/master/node.go


The main footgun that we've been concerned about with this is what happens when you serialise and deserialise. We've debated putting a if tn, ok := node.(schema.TypedNode); ok { ... in the codecs themselves, so if you try to pass one through a codec you'll always get it encoding the representation form. The flip side of that argument is that it's not illegitimate to want to serialise in a typed form, but that's mainly advanced usage and usually for debugging, so that kind of functionality could be hidden behind a more advanced interface.

Your second qp block however is quite explicit, you're either building a typed form or a representation form and you should choose which prototype you're about to launch in to, as you can see in my diff.

Does that explain it well enough? I hope my delay here didn't hold you up too much!

delaneyj commented 1 year ago

Thank for your effort and explanation!