Open warpfork opened 4 years ago
This would certainly expand the scope of data layouts that we can describe. My concern isn't so much that it has limitations; we have limitations all over the place, and it's just a matter of describing them well and ensuring that implementations that choose to support them have the proper boundaries in place. My concern is that this opens up a potential rabbit hole of minor variations on the same theme. Are we going to get demands for additional conditionality to support variants ("sometimes I have these fields, but other times I have these ones; they're optional in these cases but not in others")? Maybe the line is obvious and I need to mull on it a bit more to see that.
For now I'm mostly positive, but have that minor concern.
background / current status
In IPLD Schemas, we already have the `keyed`, `kinded`, `envelope`, `inline`, and even `byteprefix` representations available for unions.
Each of these representation strategies does something slightly different, but taken together, they cover almost every possible way someone would want to write a new protocol, and also readily describe the vast majority of protocols in the wild that we've seen and tried to retrofit IPLD Schema descriptions onto.
Almost.
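For readers who don't have the existing strategies in their head, here is a rough sketch of how one member value might look in serialized form under three of the map-based representations, with a toy decoder for each. The member name `Foo`, the field values, and the key names `tag` and `content` are placeholders I made up for illustration; a real schema declares its own names. This is plain Python over dicts, not the API of any IPLD library.

```python
# Hypothetical member value: a struct with a map representation.
foo_value = {"a": 1, "b": 2}

# keyed: a single-entry map whose key names the member.
keyed = {"Foo": foo_value}

# envelope: one entry carries the discriminant, another carries the content.
envelope = {"tag": "Foo", "content": foo_value}

# inline: the discriminant entry sits alongside the member's own fields.
inline = {"tag": "Foo", **foo_value}


def decode_keyed(node):
    """Return (member_name, value) from a keyed union map."""
    if len(node) != 1:
        raise ValueError("keyed union must be a single-entry map")
    (name, value), = node.items()
    return name, value


def decode_envelope(node, discriminant_key="tag", content_key="content"):
    """Return (member_name, value) from an envelope union map."""
    return node[discriminant_key], node[content_key]


def decode_inline(node, discriminant_key="tag"):
    """Return (member_name, value); the value is the map minus the discriminant."""
    value = {k: v for k, v in node.items() if k != discriminant_key}
    return node[discriminant_key], value
```

In all three cases the data carries an explicit entry whose only job is to say which member follows. That explicit entry is exactly what the scenario below lacks.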
Let's consider a scenario we can't currently describe:
Suppose we want to declare a union, and its members are both struct types with map representations:
If I were building a new protocol, I'd make the union that has these two types as members a `keyed` union. That's always a solid choice.
But... what if I'm stuck describing some existing serialized data... and the union is actually just... one of those two structs, with no surrounding explicit union discriminant info? (Our structs have no fields in common, so we certainly won't be using an `inline` representation here. There's no surrounding map for `keyed` mode nor `envelope` mode -- the data just abruptly starts... we expect to see it begin with something like either `{"a":` or `{"d":`!)
Right now: we can't describe this.
Could we describe this?
It seems like what we'd want here is something that feels roughly like a hybrid of keyed and inline modes (sorta -- if you squint). Perhaps we could introduce a new union representation mode which might look something like this:
(Surely we would want to give this a better name than "sniff" mode. It's late. I'm uncreative. Forgive me.)
What this would do is: look for either a field called "a" or a field called "d" in the data, and use whichever of those it finds as the discriminant hint.
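As a minimal sketch of that lookup, assuming the data has already been fully decoded into a Python dict: the member names `Foo` and `Bar`, the table shape, and the function name are all hypothetical, and raising an error when both keys are present is just one possible policy, not a decision.

```python
# Hypothetical discriminant table: field name -> union member name.
# "Foo" and "Bar" are placeholder type names, not from any real schema.
SNIFF_TABLE = {"a": "Foo", "d": "Bar"}


def sniff_member(node, table=SNIFF_TABLE):
    """Pick the union member based on which discriminant field is present."""
    hits = [key for key in table if key in node]
    if not hits:
        raise ValueError("no discriminant field present")
    if len(hits) > 1:
        # Assumed policy: treat more than one discriminant field as an error.
        raise ValueError(f"ambiguous: found discriminant fields {hits}")
    return table[hits[0]]
```

For example, `sniff_member({"a": 1, "b": 2})` would select `Foo`, and `sniff_member({"e": 5, "d": 4})` would select `Bar`. Note that this version only works because the whole map is in memory before we look at it.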
Would this be unclear? Oh, boy howdy, yes. What happens if a map contains both keys? Do we support it if we encounter the keys in `e, d` order (the discriminant isn't first), despite the performance costs (buffering, reprocessing, etc) that would imply? Are there other sources of unclarity that would need to be resolved?
Would this be fast to implement? Eh, not really. It'd be subject to about the same issues as `inline` mode already is. That's not to say we can't support such things (since obviously, we already have `inline` mode), but... it's still worthy of note, because it means I'd still consistently be recommending people use `keyed` mode in any new developments.
Will we ever encounter situations when someone wants to regard more than one key at a time? (I hope not. I'd say if that comes up frequently, that would begin to be an argument for not touching this whole idea at all.)
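To put a shape on the ordering cost, here is a sketch of a decoder that consumes map entries in serialized order rather than from a finished dict. Until a discriminant field shows up, it can't know which struct it is building, so it has to hold entries aside. The function name, the table, and the member names `Foo` and `Bar` are assumptions for illustration, and rejecting conflicting discriminants is again just one possible policy.

```python
def sniff_streaming(entries, table):
    """Consume (key, value) pairs in wire order.

    Returns (member_name, fields, buffered_count), where buffered_count is
    the number of entries that had to be held before the member was known.
    """
    pending = []   # entries seen before any discriminant field
    fields = {}    # entries seen once the member is known
    member = None
    for key, value in entries:
        if key in table:
            if member is not None and table[key] != member:
                # Assumed policy: two different discriminants is an error.
                raise ValueError("conflicting discriminant fields")
            member = table[key]
        if member is None:
            pending.append((key, value))
        else:
            fields[key] = value
    if member is None:
        raise ValueError("no discriminant field found")
    # Replay the held entries first so the original order is preserved.
    return member, {**dict(pending), **fields}, len(pending)


table = {"a": "Foo", "d": "Bar"}  # hypothetical member names
in_order = sniff_streaming([("d", 4), ("e", 5)], table)
out_of_order = sniff_streaming([("e", 5), ("d", 4)], table)
```

With the discriminant first, nothing is held back; in `e, d` order one entry has to be buffered and replayed, and in the worst case that is every entry in the map. A stricter rule ("the discriminant must be the first key") would avoid the buffering, at the cost of rejecting some data.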
Do we want to describe this?
Well, this I am not sure. Let's discuss.
I think it's clear enough that we can do this. The downsides are that it becomes Yet Another thing that schema system implementers will have to support, and that there are a few clarifications that would need to be made. It's also unclear what the Pareto-prevalence of protocols wanting this is; if it's low, will the value of this feature outweigh its costs?