microsoft / typespec

https://typespec.io/
MIT License

Add direct support for tagged unions #1283

Open AttilaMihaly opened 1 year ago

AttilaMihaly commented 1 year ago

We are in the process of defining a mapping between Cadl and Morphir. Morphir supports tagged unions, and based on the docs we first mapped them to named unions in Cadl. When looking at the "Discriminated unions" example on the Playground, though, we found that we need a lot of extra declarations to get the OAS mapping to behave as expected. Would it be possible to provide a default mapping for a named union that has the expected "one of" semantics, using the variant names as tags in the resulting JSON, without requiring any extra declarations on top of the named union?

Details

Morphir's main modeling language is Elm so I'm going to use that syntax in my explanation. Let's take a slightly simplified version of the playground example:

union Widget {
  heavy: string,
  light: string
}

Our assumption based on the docs was that this would correspond to the following Elm type declaration:

type Widget
    = Heavy String
    | Light String

What this means in Elm is that a Widget is either Heavy or Light and in both cases it has some string data. In Elm instances of this type explicitly specify which variant they are: Heavy "foo" or Light "foo". When the above Cadl declaration is mapped to OAS though we get:

"Widget": {
      "anyOf": [
          {
            "type": "string"
          },
          {
            "type": "string"
          }
      ]
}

This means that "foo" is a valid JSON value for Widget, and we cannot tell whether it is heavy or light.

It would be great if the default encoding for named unions included the variant name somehow. Given that there is no standard way to do this in JSON, I think it's OK if that mapping is something specific to Cadl. Morphir maps this to ["Heavy", "foo"] and ["Light", "foo"] in JSON, but I think the more widely used approach is to do something like { kind: "Heavy", value: "foo" }.
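To make the two encodings mentioned above concrete, here is a minimal TypeScript sketch. The type and function names (Widget, WidgetTuple, toTuple, fromTuple, describe) are made up for illustration and are not part of Cadl or Morphir; the sketch only assumes the two JSON shapes quoted in this comment.

```typescript
// Envelope encoding: the variant name travels in a "kind" property.
// All names here are hypothetical and exist only for this sketch.
type Widget =
  | { kind: "Heavy"; value: string }
  | { kind: "Light"; value: string };

// Morphir-style encoding: a two-element array [tag, payload].
type WidgetTuple = ["Heavy" | "Light", string];

// Convert the envelope form to the tuple form.
function toTuple(w: Widget): WidgetTuple {
  return [w.kind, w.value];
}

// Convert the tuple form back to the envelope form.
function fromTuple(t: WidgetTuple): Widget {
  return t[0] === "Heavy"
    ? { kind: "Heavy", value: t[1] }
    : { kind: "Light", value: t[1] };
}

// Because the tag is part of the value, a consumer can always tell
// the variants apart, even though both carry a plain string.
function describe(w: Widget): string {
  switch (w.kind) {
    case "Heavy":
      return "heavy widget: " + w.value;
    case "Light":
      return "light widget: " + w.value;
  }
}
```

With either encoding the string "foo" on its own is no longer a valid Widget, which is the property the anyOf output above lacks.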

What do you think?

bterlson commented 1 year ago

So far we've been trying to be unopinionated about how such unions are emitted into JSON because, as you observe, there are a number of prevailing practices. I think the most common one (which is supported by default in e.g. Rust's serde library) is what we emit in this case, though this issue proposes we make this "automatic" when you use the discriminator decorator. This does not work of course when your variants aren't models, such as in your example.

The general solution we're tacking toward is to let you use projections to define your own serialization to JSON for such types, though unfortunately this doesn't work at the moment. With that issue addressed, you should be able to do something like:

projection union#target {
  to(targetName) {
    if (targetName == "json") {
      self::variants.forEach((v) => {
        if (v::name) {
          // only run projection for union declarations, not
          // anonymous unions like "A | B"
          v::setType({ kind: v::name, value: v::type });
        }
      });
    };
  }
}

This basically says: when converting a union to JSON, first alter its variants from x: T to x: { kind: "x", value: T }, which will then do the right thing in the OpenAPI 3 emitter.
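As a rough illustration of the effect such a projection is meant to have on the emitted schema, the TypeScript sketch below performs the same rewrite on plain data. It is not the Cadl projection API or the OpenAPI emitter: Schema, Variant, and envelopeUnion are hypothetical names, and the choice of oneOf with a single-value enum for kind is an assumption about what a reasonable output could look like.

```typescript
// Minimal JSON-Schema-like shape; only the fields this sketch needs.
type Schema = {
  type?: string;
  enum?: string[];
  properties?: { [name: string]: Schema };
  required?: string[];
  oneOf?: Schema[];
};

// A named union variant: its name and the schema of its payload type.
type Variant = { name: string; schema: Schema };

// Hypothetical helper: rewrite each variant `x: T` as the envelope
// `{ kind: "x", value: T }` and combine the envelopes with oneOf.
function envelopeUnion(variants: Variant[]): Schema {
  return {
    oneOf: variants.map((v) => ({
      type: "object",
      properties: {
        kind: { type: "string", enum: [v.name] }, // the tag
        value: v.schema, // the original variant type
      },
      required: ["kind", "value"],
    })),
  };
}
```

Calling envelopeUnion with the heavy and light string variants of Widget yields two object schemas that differ only in the allowed value of kind, so a given payload can match at most one of them.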

I do see that our current emit generates nonsense in the case you point out, so I wonder if that justifies taking a more "conservative" approach to emitting union declarations without the @discriminator decorator to OpenAPI using a pattern like what you suggest. @mikekistler do you have any thoughts?

markcowl commented 1 year ago

@bterlson Please discuss requirements in your next meeting and update issue

mikekistler commented 1 year ago

I chatted on the side with @bterlson and it seems reasonable to generate named unions with an envelope (with some mechanism to configure property names, etc) provided that we also support unnamed unions for users that don't need/want names on the union variants.

When users have a choice, then the presence of names can signify that these are meaningful and the emitter should endeavor to represent them in the on-the-wire representation.

darrelmiller commented 1 year ago

I agree that using the { kind: "x", value: T } syntax in the JSON payload is the most natural approach. Regarding unnamed unions, are those only possible where JSON is capable of distinguishing between the literals, e.g. string | int or string | bool?

mikekistler commented 1 year ago

I think the types in unnamed variants should be "distinct" (not equal), but not necessarily "where JSON is capable of distinguishing". The generated schema is an anyOf, not oneOf, so it is acceptable for a value to be an instance of more than one variant type. Presumably the application would have logic for deciding which variant is applicable -- it does not have to be discernable just from JSON.
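The difference between anyOf and oneOf described here can be sketched in a few lines of TypeScript. This is not a JSON Schema validator: each variant is modeled as a hypothetical predicate function, and the number/integer pair is just an assumed example of two distinct types that overlap.

```typescript
// A variant is modeled as a predicate over a decoded JSON value.
type Check = (value: unknown) => boolean;

// Two distinct but overlapping variant types: every integer is a number.
const isNumber: Check = (v) => typeof v === "number";
const isInteger: Check = (v) => typeof v === "number" && Math.floor(v) === v;

// Count how many variants accept the value.
function matchCount(value: unknown, variants: Check[]): number {
  return variants.filter((check) => check(value)).length;
}

// anyOf: valid when at least one variant matches.
function validAnyOf(value: unknown, variants: Check[]): boolean {
  return matchCount(value, variants) >= 1;
}

// oneOf: valid only when exactly one variant matches.
function validOneOf(value: unknown, variants: Check[]): boolean {
  return matchCount(value, variants) === 1;
}
```

For the value 3, both predicates match, so it is accepted under anyOf and rejected under oneOf; deciding which variant applies is then left to the application, as the comment above suggests.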

BosqueLanguage commented 1 year ago

Adding a +1.

Having explicit support for ADTs/tagged unions will be useful for integrating Bosque and Cadl as well. Since languages with algebraic data types (e.g. Rust) seem to be growing in popularity, this should also help in building natural-feeling bindings to them.