vanadium / issues

Vanadium issue tracker
1 stars 1 forks source link

Add VDLUnknown semantics to vdl union types #1207

Open tatatodd opened 8 years ago

tatatodd commented 8 years ago

@razvanm @asadovsky @bprosnitz

Razvan and I had a discussion about backwards and forwards compatibility of the vdl union type. The current semantics are that you cannot add new fields to a union type. E.g. here's an example union in vdl:

type Foo union {
  A int32
  B string
  C float64 // Added later
}

If we take a Foo.C value and try to decode that into an old Foo, that doesn't know about the C field, we get a decoding error. This behavior makes sense, since there isn't anywhere to put the C field. But it makes it onerous to use the union type.

After some discussion, we came up with a pretty good idea. Let's say the user writes this in vdl:

type Foo union {
  VDLUnknown any
  A int32
  B string
}

The "VDLUnknown" field is special; it must be the first field in the union, and it must have the any type, otherwise you get a compile-time error. It also has special semantics; if a union type has the VDLUnknown field, then decoding any unknown field will fill that field. E.g. in the example above, the Foo.C value would be filled into the VDLUnknown field.

This gives users a choice. The current "strict" behavior is the default, where you can't add or remove fields on a union. If the user wants the "looser" behavior, they just add the VDLUnknown field.

I don't think it'll be too hard to implement this feature.

asadovsky commented 8 years ago

Thanks for the nice write-up.

This sounds pretty nice, but I do worry a bit about the "default strict" behavior: I suspect some users will naively expect unions to be evolvable over time, as we did.

Incidentally, this makes me wonder how enums are handled. My guess is that they behave like unions (prior to this proposal). Assuming so, I wonder if people will be surprised that unions would now have a special "VDLUnknown" defaulting mechanism, whereas enums would not.

bprosnitz commented 8 years ago

I think there are a few ways to do this:

or

type Foo union { A int32 B string default any }

or

type Foo union { A int32 B string ... }

I don't personally like the syntax that looks like a normal field because it has special behavior. Between the others it would be a choice of whether all unions are extendable and whether it is the default. If it is the default to have an unknown case, all code that handles those unions would need to support that case. It will likely end up just being a switch case, but it is worth thinking about.

Another consideration is whether the declaration that the union can handle other types should be in the vdl file or elsewhere. The ability to handle more than the declared set of types seems to me to be a implementation-level issue rather than a protocol-level one so it seems a bit weird to me to have it in the VDL file. If this is the case, it seems that either the configuration should be elsewhere or it should not be configurable. That said, if it is configurable in VDL, the 4th case looks best to me.

tatatodd commented 8 years ago

Adam - yeah, as I was driving home I also realized it might be weird that enum doesn't support VDLUnknown. The problem with enum is that there isn't a good place to hold the "unknown" value. So I don't have any good ideas there.

Benj - nice, Razvan and I had discussed exactly those points, and also considered similar other options. :)

IMO the most important thing to decide is whether the "unknown field" concept belongs as either:

  1. An explicit concept in the vdl type system, and thus in the vdl file
  2. A code-generation option, and thus in the vdl.config file

Either way, the generated Go code will need to provide a way to create Foo.VDLUnknown values, just like it provides a way to create Foo.A or Foo.B values. Since that's the case, it seems better to me to choose (1) and make it look much like a regular field. After all the user can use this union field just like they'd use a regular union field. VDLUnknown just happens to have additional semantics during VOM decoding.

Another way to put this: the union type is meant to represent the notion of a disjunction in a type-safe way; Foo is guaranteed to hold either A, B or C. Whenever we add a union field that has type "any", we are relaxing the type-safe disjunction; Foo may hold either A, B or C, or it may hold some unknown other thing. It seems useful to expose this as a first-class property of the type, rather than relegating it to be a code-generation implementation-level option.

bprosnitz commented 8 years ago

The original benefits we discussed for union were:

  1. it is clearer what can be sent and what must be understood by clients/servers talking through a protocol
  2. it (ideally) would make code simpler because the user could know that they would have one of a few types after decode

It seems that this issue that was brought up really effects no. 2 and how different versions are handled. For no 1., it still seems to make sense to have the set of choices for types to send to be explicit, even if different versions of vdl files have different sets of types because with respect to that particular client/server the protocol is clear. In this case, there is a different union version that the particular client/server does not know how to deal with. It seems the desire to introspect/output/store that unexpected value is local to the particular client/server and does not effect the protocol. Because of this, it seems that the concept of handling unknown values doesn't really make sense to be part of the definition of the protocol and in the vdl file.

asadovsky commented 8 years ago

Given that RPC is a primary target of VDL, it seems to me that making the enum and union types "unevolvable" by default would be a mistake - unknown values should not cause decoding to fail.

Btw, it's interesting to see how protobuf handles this: https://developers.google.com/protocol-buffers/docs/proto3#enum https://developers.google.com/protocol-buffers/docs/proto3#oneof

bprosnitz commented 8 years ago

I don't mean that they are unevolvable, I just mean that you only write code to explicitly handle know cases based on the vdl file that you have. If you have other unexpected values, it seems like an issue in figuring out how the client/server should deal with it and does not need to be explicitly part of the protocol specification.

On Thu, Mar 3, 2016 at 12:13 PM Adam Sadovsky notifications@github.com wrote:

Given that RPC is a primary target of VDL, it seems to me that making the enum and union types "unevolvable" by default would be a mistake. For both of these types, I think what I'd want is for there to be an implicit "unknown" zero value that gets automatically generated.

Btw, it's interesting to see how protobuf handles this: https://developers.google.com/protocol-buffers/docs/proto3#enum https://developers.google.com/protocol-buffers/docs/proto3#oneof

— Reply to this email directly or view it on GitHub https://github.com/vanadium/issues/issues/1207#issuecomment-191942749.