Closed warpfork closed 2 years ago
Option 3 sounds good to me! It would remove a footgun and make my ADL code a bit nicer.
I don't mind rolling with special casing for Any
but it might be worth dwelling on the source of the problem - which I don't think is in Any
itself, but in union semantics in general. They're always going to have this weirdness about them and Any
is just the most general case. It's the primary part of the model where the beauty of the abstractions break down a little and the light from underneath shows through the cracks. For example, you could say this about any kinded union, really:
So, if Any is implemented as a kinded union, it does exactly that: the representation-level behaviors are "put anything here", as desired; and the type-level semantics... well, it acts like a map: sometimes the map will have an entry with the key "String", sometimes it'll have an entry instead with the key "Map", etcetera.
tbh I think this has a fair bit to do with Go, and also the particular model you're pursuing with ipld-prime, not that I think it's flawed or can easily describe a model that might be more pleasant other than hand-waving about traditional OO interfaces and inheritance models rather than this weird Go thing you're boxed inside of. So that's not a criticism, it's just a constraint.
Users of Any
might have certain expectations they bring to the party, if it's transparently a kinded union and they get this kinded API, it might feel weird to interact with. The only reason we think users aren't going to have quite the same weird feeling about other kinded unions is that they'll be explicitly called out in the schema they're working with, and not an implicit.
How do kinded unions behave in other languages like JS? I assume they don't have this two-layer model like in Go, where we have the not-very-intuitive map at first, and the actual representation underneath?
I'm sure @warpfork has thought about this before, but I'm wondering if there would be nicer ways to offer both the logical level (map) and representation level (any) in Go without the layering.
How do kinded unions behave in other languages like JS
We don't have the nice architecture being built out in ipld-prime, so a kinded union is just a property, of any kind, since, currently, we only operate on plain JavaScript objects. Which is fine as far as it goes but it doesn't get us the nicities of the data model, transparent traversal, drop-in ADLs, etc. I'm still trying to figure out how we get there in JS, but until then we have plain objects.
So ipld-prime is the forefront of how to do this. I'm just imagining a language like Rust coming along afterward, learning all of the lessons of ipld-prime, but using its additional language tools to come up with interfaces that leak much less than kinded unions are currently doing in ipld-prime. But that's really more of a hunch than something I can see. We're all still just relying on @warpfork's brain on the concrete bits here still!
Yeah fwiw I don't think Golang has anything to do with this :) I'd write this exactly the same way in Java for example (it'd just be a lot slower because it would have more pointers in it...)
The logic that I'm running on here is:
Node
interface is used for "raw" data and Data Model stuff (when schemas aren't involved at all).Node
interface is used for the representation-level view of data that's been processed by schemas -- which lets all the codecs, etc, work totally transparently over schema-ified data, which is neat;Node
interface is used for the type-level view of data that's been processed by schemas -- for the same reusability purposes. Which might be worth enumerating because they're a little wilder:
Node
instead of the representation-level Node
, you get a bunch of maps with the struct field names; nice!)
absent
throws a mild wrench in things. Codecs aren't supposed to emit that, generally; but type-level views of structs can yield it. The elegance is marred a bit here. Ah well.)... it just sort of keeps stacking like that :) If the type-level view fits the exact same interface as the representation level view, things just flow.
So when we encounter a story where the type-level view of the data doesn't "do what [somebody] means", it's interesting and maybe we want to try to make it do more of "what [sombody] means". (E.g., if we can make that "debug format for free" thing work "better" in some sense, that's a nice bonus.) But of course there's a limit to this: at most, when talking about "do what [somebody] means" goals, the best we can do is satisficing, because there are so many different "somebody"s with their various concept of "what I mean".
Primarily, my goal is thus that those morphisms be defined at all, so we can expect them to work on any and all data. (E.g. If the "debug format for free" thing doesn't do perfectly what somebody means, well, at least it worked: that provides some value.)
Anyway tl;dr the way union type-level semantics has nothing to do with golang, and everything to do with I need to make them act like a map so that {the union members can be distinguishable from each other} and yet {all unions be uniform in approach when reduced to the lens of the datamodel}.
Just as a short keepalive message: it looks like Option3 -- we define an any
type-kind -- is the right way to move forward with this. It's got some thumbs-up, we haven't come up with any more blockers or contradictions it would produce, and it generally seems like it'll solve the problem.
Next step: I should figure out where this belongs in the documents and write it up.
Happy to review PRs :)
I'm still not comfortable with that option.
When writing a validator, the pattern I used to implement Any
was a nested kinded union: https://github.com/rvagg/js-ipld-schema-validator/blob/3f8e1636b28c393f3637bbef0b3279d396536cd7/ipld-schema-validator.js#L36-L59
Which would be represented as:
type AnyMap {String:Any}
type AnyList [Any]
type Any union {
| Bool bool
| String string
| Bytes bytes
| Int int
| Float float
| Null null
| Link link
| AnyMap map
| AnyList list
} representation kinded
This works super nicely for this purpose. It gets loaded as a set of implicits (also bring in the type names AnyList
and AnyMap
into the global space). And validation proceeds by drilling down into the final terminals without any additional special code to handle it.
The reason I find this illustrative for this discussion is because the concept of Any
necessarily implies a recursive definition. It doesn't mean "anything you want", it means "any valid data model kind or recursive set of data model kinds". Which obviously matters for the validation case, and especially matters in the JavaScript case where you can potentially throw strange things in that shouldn't be there (undefined
, function
perhaps? maybe this could also be the case in Go with a naughty codec that throws out some interesting types).
Making any
a magic thing means this all has to become embedded logic, which is not impossible, just a step backward.
I still really don't understand why this has to be a forcing function here:
The general rule of how unions work is: the logical-level always behaves as a map, where the keys are the names of the types. The representation-level, of course, varies wildly (e.g. kinded vs keyed unions behave nothing alike, representationally).
Maybe your concrete conception of a kinded union is wrong? Why do all unions need to behave like a map? Maybe a kinded union is a different kind of beast and you're forcing it into the wrong model. It doesn't even have logical keys, like a keyed union. Is there a way you can rethink the instantiation of a kinded union at the type level that isn't forcing your hand at this level?
undefined
,function
perhaps?
I think around here is where we have divergence in what we're worried about what this will suggest to users. To me, sitting here with any compiler and host type system at all -- and with a library design that's oriented around an interface for Data Model Nodes -- this is a non-worry. Everything is still going to implement Node
; everything that can be in that value is still going to have to answer Kind()
when I ask it; and the answer to Kind()
is going to have to be a value from the familiar (and strict) enum of Data Model kinds.
I think maybe you make a good point in regards to how we document this, at least -- it makes sense to have a sentence or two clarifying that Any
is not meant to be an escape valve that goes all the way to infinity.
I don't know if I ever linked these explicitly enough, btw, but an example of where this is important was brought up in an issue in go-ipld-prime by @willscott when he was working on the statediff project: https://github.com/ipld/go-ipld-prime/issues/92
Right now you can do things with the basicnode.Any
implementation that you can't do with the Any
definition we've made in schemas using kinded unions, and one way or another, that's definitely a design flaw that needs to get hammered out.
One thing to consider about Any --
If a kinded union can involve user defined types:
type Foo struct {
Bar Int
}
type Baz struct {
Buzz String
}
type KU union {
| Foo
| Baz
} representation Kinded
Then Any as represented by:
type AnyMap {String:Any}
type AnyList [Any]
type Any union {
| Bool bool
| String string
| Bytes bytes
| Int int
| Float float
| Null null
| Link link
| AnyMap map
| AnyList list
} representation kinded
is not a true "Any" as it doesn't contain the user defined types above.
This is more than an academic consideration. The current https://github.com/ipld/go-ipld-adl-hamt has an ANY at the bottom of the tree in the underlying representation, but most likely the types at the bottom are well defined on a per HAMT basis. Or, perhaps they are well defined but unknown until the moment of read -- in cbor-gen world there is a concept of Deferred, where we leave part of a IPLD block in a serialized representation and deserialize later when we know the type.
Anyway, much of this probably does not rise to level of things that should actually make it into the schema layer, but I think having an any that is truly "any" makes more sense. That leads me to believe Option 3 is better.
https://github.com/ipld/ipld/pull/136 , assuming it lands, should finally address this.
We need to define and document behavior of
Any
, a named type that can be used in IPLD Schemas to say, well, "anything" goes here. This is both a documentation and a design issue.As https://github.com/ipld/specs/issues/301 comments,
Any
is something we can implement in a very non-magical way, simply by specifying it in the prelude as a kinded union:Doing this as a bog-standard kinded union has very pleasing parsimony.
However, it has some interesting implications. Namely, since it's just a regular kinded union in the IPLD Schema system, it's got the dichotomy of type-level and representation-level semantics. And generally, what users want and expect of this is the representation-level semantics. But they get... more.
So the question is essentially: is this definition of
Any
actually good?The general rule of how unions work is: the logical-level always behaves as a map, where the keys are the names of the types. The representation-level, of course, varies wildly (e.g. kinded vs keyed unions behave nothing alike, representationally).
So, if
Any
is implemented as a kinded union, it does exactly that: the representation-level behaviors are "put anything here", as desired; and the type-level semantics... well, it acts like a map: sometimes the map will have an entry with the key "String", sometimes it'll have an entry instead with the key "Map", etcetera.There are some POVs from which I don't think it violates the principle of least surprise. But, there are definitely also POVs from which it does. Namely, a typed-
Any
doesn't really work the same as an untyped-basicnode.Any
.So maybe something not great is going on here with this design. For
somegenoutput.Any
to behave differently thanbasicnode.Any
is a significant smell.One option -- Option 0, if you will -- is to roll with this (and that's what the current tip of go-ipld-prime codegen is actually doing. Or, er, well, it's what it recommends doing -- since the prelude is still just a suggestion there, not baked in. Still! Moving on!).
One option for dealing with this is changing the behavior of the type level node for kinded unions. We could declare that for kinded unions, the type-level behavior is actually the same as the representation-level behavior.
I'm a pretty dubious of this: it would break one of the high level design rules of the schema system: that type-level behaviors are defined entirely in terms of the type kind, and have total independence from their representation. (On the other hand, this is admittedly a pretty compelling case.)
Another option is that we make a blessed magic type for
Any
. We give this magic type the behaviors that we want (essentially, it acts likebasicnode.Any
: there's no difference between it's type-level and representational-level behaviors, and the representational behavior is identical to the kinded union specified above.) In this case,type Any [...]
wouldn't end up in the prelude at all: it would be too magical to describe.This is quite a bit less parisimonious than just defining
Any
as a plain, non-magical kinded union. But I really don't have any arguments against it other than that.The third option is to carry that previous idea a bit further: we could introduce a new kind named
any
(e.g. it's a conceptual sibling oflist
|string
|map
|bytes
|etc). We would imbue this kind with the behaviors we want: "put anything here" and "the type level and representation level semantics are the same". The prelude would then also simply containtype Any any
(just like it containstype String string
).This seems pretty good. We're not breaking the high-level rule about type/representation independence with this approach, since we just introduce a whole new kind and it just "happens" to have specified the type behaviors and representation (of which there is only one) behaviors as the same.
Probably the weirdest thing about this idea is that ... hmm. No, nothing's weird about this at all.
any
would be a type-kind, not a representation-kind (e.g., it's in the enum that also containsstruct
|union
|etc), and adding another member to that enum isn't problematic at all. (There was a moment where I thought we'd be stuct introducing a third enum which included this extra weird member, but... no, I think that was sloppy thinking; it's actually simpler than that.)Okay. I think I talked myself into favoring Option 3 here in the course of writing this issue :) Anyone else have any thoughts on this?
Thanks to @mvdan and @willscott for both independently poking me about something feeling a bit weird here, and thus encouraging this further thought.