ipld / specs

Content-addressed, authenticated, immutable data structures
Other
592 stars 108 forks source link

Confusion about envelope (and inline) representation of unions #343

Closed patrsc closed 3 years ago

patrsc commented 3 years ago

I am confused about validation of envelope unions in IPLD schemas: The authoring guide (schemas/authoring-guide.md) suggests that

{
  "msg": "All good",
  "tag": "progress",
  "payload": {
    "percent": 0.6,
    "last": "61626378797a"
  }
}

would pass validation in the example given there (https://github.com/ipld/specs/blob/master/schemas/authoring-guide.md#envelope), while the specification (https://github.com/ipld/specs/blob/master/schemas/representations.md#union-envelope-representation) looks (to me) like

{
  "msg": "All good",
  "payload": {
    "tag": "progress",
    "payload": {
      "percent": 0.6,
      "last": "61626378797a"
    }
  }
}

would be the correct representation. Which representation is the correct one? If the first one is correct, I think it would be good to give such an example also in the specs in order to avoid confusion. This IPLD schema validator treats the second representation as the correct one and refuses the first one.

Also the inline representation suffers from the same confusion.

rvagg commented 3 years ago

hmm, I think @vmx pointed out some problems with the inline union docs that were certainly wrong but I don't recall whether we fixed that or not, or even noted it in this repo (I can't find it with a quick search and commits don't show anything!).

The envelope union is meant to have the contents within a "content" key in a map, with the "discriminant" key telling you which type of the union it is. So the first example you've shown matches one with a contentKey of tag and discriminantKey of payloa, with the "progress" union having a form that being type Progress struct { percent Float; last String }. The second example would only work if you're focusing on the top level payload as a node to be a union. I don't think I see that example arising out of the docs you've pointed to but it sounds like we have ambiguity about it that we should fix. Perhaps you can help with some wording changes or point out the bits that are problematic?

The inline union is similar but pulls everything up into the current map. So it ditches contentKey because the map itself is the holder of the content. It just has one special key that is used as the discriminantKey to tell you what to interpret the map itself as. So the payload content is just pulled up into the parent map and the tag tells us how to interpret that map.

If it helps, the source of truth I use for all the code around this are mirrored here for JS and Go:

They're also re-used in the tests for js-schema-validator, but it pulls them up into code and out of the yaml https://github.com/rvagg/js-ipld-schema-validator/tree/master/test

But that's not to say that these are strictly correct either! We don't have quite enough of the pieces together to validate that the DSL, as implemented by js-ipld-schema, go-ipld-schema, (and its use within js-schema-validator) connects with what go-ipld-prime expects. Currently go-ipld-prime requires schemas to be written as code, which is a two steps removed from the DSL, and one of those steps isn't quite implemented yet. When those two world are finally connected, we'll find out whether there are mismatches between @warpfork's brain on this and mine! That sounds a bit scary, but I'm strongly confident that it'll only come down to minor details that @warpfork has modified in his implementation of schemas that haven't made it to the test fixtures (there have been a few proposals in this repo).

patrsc commented 3 years ago

Thank you, it is clear to me now (from the specs and test fixtures) what objects the envelope union itself should match. What is still not clear to me (the Authoring Guide confused me) what objects a struct containing an envelope union should match. Let me make my question more clear with the following example:

type Message struct {
  msg String
  info Payload
}

type Payload union {
  | Error "error"
  | Progress "progress"
} representation envelope {
  discriminantKey "tag"
  contentKey "payload"
}

type Error string

type Progress struct {
  percent Float
  last String
}

For what I understand now, the following objects (a) and (b) match the Payload (union) type:

(a)

{"tag": "error", "payload": "ERROR!"}

(b)

{"tag": "progress", "payload": {"percent": 0.6, "last": "a"}}

However, what would match Message (struct containing an enveope union)?

either (c1) and (c2):

(c1)

{"msg": "hi", "info": {"tag": "error", "payload": "ERROR!"}}

(c2)

{"msg": "hi", "info": {"tag": "progress", "payload": {"percent": 0.6, "last": "a"}}}

or: (d1) and (d2):

(d1)

{"msg": "hi", "tag": "error", "payload": "ERROR!"}

(d2)

{"msg": "hi", "tag": "progress", "payload": {"percent": 0.6, "last": "a"}}

Please note that (d1) and (d2) do not use the info field at all. The Authoring Guide treats (d1) and (d2) as correct. If this is true, I do not understand why the whole union envelope map is "pulled up" one level to the Message struct, rendering the field info unnecessary.

rvagg commented 3 years ago

ah, good question. The answer is that c1 & c2 are what it should match.

Message is has a "map" representation (by default), and all envelope unions have "map" representations too. This is the same with keyed unions. A source of truth on this kind of question is https://github.com/ipld/specs/blob/d8ae7e9d78e4efe7e21ec2bae427d79b5af95bcd/schemas/schema-schema.ipldsch#L393 where Eric has put in some of the gory details about kinds for unions.

So when you hit Payload, any schema matching/validating/transformational scheme should anticipate a "map" and anything else would be an error. But the real problem with d1 and d2 is that the Message is broken because it's strictly a "map" with those two fields, and one of those fields is missing. You should be able to turn Message itself into an inline union to achieve a match on those two things, but not an envelope.

So the Authoring Guide is wrong here, I think this is the piece that @vmx pointed out to me was wrong and I didn't get around to fixing it. I'm not even sure the inline union section below it is quite right so I'd better address that. Sorry for the confusion but I hope this is cleared up now?

patrsc commented 3 years ago

Ok, thanks. This makes it clear. So, the authoring guide should be fixed concerning the envelope and inline union types...

warpfork commented 3 years ago

I think these are fixed now by #367 , so I will close this. Thanks for reporting :) and if you see any more weirdness don't hesitate to open another issue.