ipld / specs

Content-addressed, authenticated, immutable data structures

Add an IPLD Schema union representation strategy that can look at multiple keys to decide the content type? #278

Open warpfork opened 4 years ago

warpfork commented 4 years ago

background / current status

In IPLD Schemas, we already have the keyed, kinded, envelope, inline, and even byteprefix representations available for unions.

Each of these representation strategies does something slightly different, but taken together, they represent almost every possible way someone would want to write a new protocol, and also readily describe the vast majority of protocols in the wild we've seen and tried to retrofit IPLD Schema descriptions onto.

Almost.

Let's consider a scenario we can't currently describe:

Suppose we want to declare a union, and its members are both struct types with map representations:

type Foo struct {
    a String
    b String
    c String
}
type Bar struct {
    d String
    e String
}

If I was building a new protocol, I'd say I'd be making the union that has these two types as members be a keyed union. That's always a solid choice.

But... what if I'm stuck describing some existing serialized data... and the union is actually just... one of those two structs, with no surrounding explicit union discriminant info? (Our structs have no fields in common, so we certainly won't be using an inline representation here. There's no surrounding map for keyed mode nor envelope mode -- the data just abruptly starts... we expect to see it begin with something like either {"a": or {"d":!) Right now: we can't describe this.

Could we describe this?

It seems like what we'd want here is something that feels roughly like a hybrid of keyed and inline modes (sorta -- if you squint). Perhaps we could introduce a new union representation mode which might look something like this:

type FooOrBar union {
    | Foo "a"
    | Bar "d"
} representation keysniff

(Surely we would want to give this a better name than "sniff" mode. It's late. I'm uncreative. Forgive me.)

What this would do is: look for either a field called "a" or a field called "d" in the data, and use whichever of those it finds as the discriminant hint.
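To make that concrete, here is a minimal Python sketch of the lookup, not a proposed implementation. The `DISCRIMINANTS` table and the `sniff_member` function are hypothetical names that just mirror the `FooOrBar` declaration above. Rejecting data that matches zero or several members is an assumption made for the sketch; the proposal itself leaves those cases open.

```python
from typing import Any, Mapping

# Hypothetical discriminant table mirroring the FooOrBar sketch:
# each map key hints at one union member.
DISCRIMINANTS = {"a": "Foo", "d": "Bar"}


def sniff_member(node: Mapping[str, Any],
                 table: Mapping[str, str] = DISCRIMINANTS) -> str:
    """Return the union member name hinted at by the keys present in node."""
    # Collect every member whose discriminant key appears in the map.
    matches = sorted({table[key] for key in node if key in table})
    if not matches:
        raise ValueError("no discriminant key found")
    if len(matches) > 1:
        # Assumption: treat multiple hints as an error rather than pick one.
        raise ValueError(f"ambiguous data: keys match {matches}")
    return matches[0]
```

With this, `sniff_member({"a": "x", "b": "y", "c": "z"})` picks `Foo`, and a map carrying only `d` and `e` picks `Bar`, regardless of key order, because the whole map is already in memory.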

Would this be unclear? Oh, boy howdy, yes. What happens if a map contains both keys? Do we support it if we encounter the keys in e, d order (the discriminant isn't first), despite the performance costs (buffering, reprocessing, etc) that would imply? Are there other sources of unclarity that would need to be resolved?
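The ordering cost can be sketched for a streaming decoder that sees map entries one at a time. This is a rough Python illustration under assumed names (`DISCRIMINANTS`, `sniff_stream`), not a description of any existing decoder: every entry that arrives before the discriminant has to be held back and reprocessed once the member type is known. Note that it stops at the first discriminant it sees, so it would not notice a map containing both keys.

```python
from typing import Any, Iterable, List, Tuple

# Hypothetical discriminant table mirroring the FooOrBar sketch.
DISCRIMINANTS = {"a": "Foo", "d": "Bar"}


def sniff_stream(
    entries: Iterable[Tuple[str, Any]],
) -> Tuple[str, List[Tuple[str, Any]]]:
    """Consume map entries in wire order until a discriminant key appears.

    Returns the member name plus the entries that had to be buffered
    before the decision could be made (they would need reprocessing).
    """
    buffered: List[Tuple[str, Any]] = []
    for key, value in entries:
        if key in DISCRIMINANTS:
            return DISCRIMINANTS[key], buffered
        # Not a discriminant: hold the entry until the type is known.
        buffered.append((key, value))
    raise ValueError("no discriminant key found")
```

When the data arrives in `d`, `e` order nothing is buffered; in `e`, `d` order the `e` entry is held until `d` shows up. For structs with many fields and a late discriminant, that buffer could grow to nearly the whole map.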

Would this be fast to implement? Eh, not really. It'd be subject to about the same issues as inline mode already is. That's not to say we can't support such things (since obviously, we already have inline mode), but... it's still worthy of note, because it means I'd still consistently be recommending people use keyed mode in any new developments.

Will we ever encounter situations when someone wants to regard more than one key at a time? (I hope not. I'd say if that comes up frequently, that would begin to be an argument for not touching this whole idea at all.)

Do we want to describe this?

Well, this I am not sure. Let's discuss.

I think it's clear enough that we can do this. The downside is that it becomes Yet Another thing that schema system implementers will have to support. There are a few clarifications that would need to be made. And it's unclear what the Pareto-prevalence of protocols wanting this is; if it's low, will the value of this feature outweigh its costs?

rvagg commented 4 years ago

This would certainly expand the scope of current data layouts that we can describe. My concern with this isn't so much that it has limitations; we have limitations all over the place, and it's just a matter of describing them well and ensuring that implementations that choose to support them have the proper boundaries in place. My concern is that this opens up a potential rabbit hole of minor variations on the same theme. Are we going to get demands for additional conditionality to support variants ("sometimes I have these fields, but other times I have these ones; they're optional in these cases but not in others")? Maybe the line is obvious and I need to mull on it a bit more to see that.

For now I'm mostly positive, but have that minor concern.