microsoft / kiota

OpenAPI based HTTP Client code generator
https://aka.ms/kiota/docs
MIT License
2.92k stars 203 forks source link

C# Model - oneOf of list and object is not deserialized correctly #3976

Open Irame opened 9 months ago

Irame commented 9 months ago

I want to consume an API that has the following construct for properties (OneOrManyItems in this example) at quite a lot of places:

{
  "openapi": "3.0.3",
  "info": {
    "title": "Test",
    "version": ""
  },
  "paths": {
    "/test": {
      "get": {
        "responses": {
          "200": {
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/Result"
                }
              }
            }
          },
          "401": {
            "description": "Unauthorized Request"
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "Result": {
        "properties": {
          "OneOrManyItems": {
            "oneOf": [
              {
                "type": "array",
                "items": {
                  "$ref": "#/components/schemas/Item"
                }
              },
              {
                "$ref": "#/components/schemas/Item"
              }
            ]
          }
        }
      },
      "Item": {
        "properties": {
          "Id": {
            "type": "string"
          }
        }
      }
    }
  }
}

Its basically a property (OneOrManyItems) that can be a list of objects or just one object (both with the same type Item).

When I now generate a client using this command: kiota generate -d desciption.json -l CSharp. I get a model for the result that does not parse the json response correctly. It only parses it correctly for the case that it is a list of items not a single item.

The Result_OneOrManyItems class inside the Result looks something like this:

public class Result_OneOrManyItems : IComposedTypeWrapper, IParsable {
    public List<ApiSdk.Models.Item>? Item { get; set; }
    public ApiSdk.Models.Item? ResultOneOrManyItemsItem { get; set; }

    public static Result_OneOrManyItems CreateFromDiscriminatorValue(IParseNode parseNode) {
        _ = parseNode ?? throw new ArgumentNullException(nameof(parseNode));
        var mappingValue = parseNode.GetChildNode("")?.GetStringValue();
        var result = new Result_OneOrManyItems();
        if("Item".Equals(mappingValue, StringComparison.OrdinalIgnoreCase)) {
            result.ResultOneOrManyItemsItem = new ApiSdk.Models.Item();
        }
        else if(parseNode.GetCollectionOfObjectValues<ApiSdk.Models.Item>(ApiSdk.Models.Item.CreateFromDiscriminatorValue)?.ToList() is List<ApiSdk.Models.Item> itemValue) {
            result.Item = itemValue;
        }
        return result;
    }

    public virtual IDictionary<string, Action<IParseNode>> GetFieldDeserializers() {
        if(ResultOneOrManyItemsItem != null) {
            return ResultOneOrManyItemsItem.GetFieldDeserializers();
        }
        return new Dictionary<string, Action<IParseNode>>();
    }

    public virtual void Serialize(ISerializationWriter writer) {
        _ = writer ?? throw new ArgumentNullException(nameof(writer));
        if(ResultOneOrManyItemsItem != null) {
            writer.WriteObjectValue<ApiSdk.Models.Item>(null, ResultOneOrManyItemsItem);
        }
        else if(Item != null) {
            writer.WriteCollectionOfObjectValues<ApiSdk.Models.Item>(null, Item);
        }
    }
}

When deserialized the ResultOneOrManyItemsItem property is always null and the Item is always non null but only contains something, if the parsed json is an array. The problem seems to me that the CreateFromDiscriminatorValue function only looks at the discriminator value even if there is none and not at the type of the json that is being parsed.

The solution I found for my usecase looks like this:

public static Result_OneOrManyItems CreateFromDiscriminatorValue(IParseNode parseNode) {
    _ = parseNode ?? throw new ArgumentNullException(nameof(parseNode));
    var result = new Result_OneOrManyItems();
    var list = parseNode.GetCollectionOfObjectValues<ApiSdk.Models.Item>(ApiSdk.Models.Item.CreateFromDiscriminatorValue)?.ToList() as List<ApiSdk.Models.Item>;
    if((list.Count ?? 0) == 0) {
        result.ResultOneOrManyItemsItem = new ApiSdk.Models.Item();
    }
    else {
        result.Item = list;
    }
    return result;
}

This is far from ideal because it does not allow empty lists and does not use the real underling type. I also have to touch a lot of generated code for this solution which is not convenient and error prone.

This Issue is probably similar to #2338 but it wasn't 100% my use case, hence I created a new issue. I also saw a similar API description in this issue #3963 but the problem was a different one.

If this is not fixable in the short term is there a better workaround than adjusting all implementations of CreateFromDiscriminatorValue by hand?

baywet commented 9 months ago

Hi @irame Thanks for using kiota and for reaching out. The support for union (inclusive/exclusive) of object types during deserialization requires a discriminator mapping This is because doing some kind of advanced pattern matching of the payload at runtime would be computationally expensive.

Does your Item object return a property that maps to sub-types somehow (e.g. if item is vehicle, that property would return car, truck, train...) ?

Irame commented 9 months ago

Hi thanks for the quick response.

Unfortunately, there isn't a specific property with a fixed value that indicates whether it's a single object or something else. The specific API that has this problem is the UPS API, and you can find their OpenAPI files here: https://github.com/UPS-API/api-documentation. One example would be within the Shipping.json file, the schema for ShipmentResponse_ShipmentResults includes a property named PackageResults. This property follows the pattern I've mentioned previously.

baywet commented 9 months ago

We might be able to make your case work better by returning null when the JSON we're reading is not an array here

But this would still require the manual edit you've made (and now the empty lists would work as long as you use a null propagation operator before the count).

The reason why I'm reluctant to re-order the generated code differently is because while this would improve your use case, it'd make most cases slower or might even break them (in case you have a truly discriminated union, you might end up with one property initialized when it shouldn't). Does that make sense?

Irame commented 9 months ago

Returning null here seems like a good idea, but I'm not sure how rearranging the checks could cause issues. From what I understand, if it's a list, it wouldn't have a discriminator value, so it's safe to treat it as a list. Otherwise, we can check the discriminator value. Am I missing anything here? Also, I don't think there would be a noticeable performance impact. The IParseNode already knows whether it's a list, and if GetCollectionOfEnumValues just returns null as you suggested, it's only a matter of a null check or an is List<> pattern match against null. That said, I'm not deeply familiar with this library or every aspect of C#/.NET performance, but I'm keen to learn more.

baywet commented 9 months ago

Let's say we have a Personable exclusive union type that's implemented by Employee/Customer/Shareholder. The generated code, assuming we have proper discriminator information, would look something like that.

public static Personable CreateFromDiscriminatorValue(IParseNode parseNode) {
    _ = parseNode ?? throw new ArgumentNullException(nameof(parseNode));
    var mappingValue = parseNode.GetChildNode("personType")?.GetStringValue();
    var result = new Personable();
    if("Customer".Equals(mappingValue, StringComparison.OrdinalIgnoreCase)) {
        result.Customer = new Customer();
    }
        else if("Employee".Equals(mappingValue, StringComparison.OrdinalIgnoreCase)) {
        result.Employee = new Employee();
    }
        else if("Shareholder".Equals(mappingValue, StringComparison.OrdinalIgnoreCase)) {
        result.Employee = new Shareholder();
    }
    return result;
}

This implies that if the API starts randomly returning a new "DomesticPartner" type that's not in the description, we won't have any side effect here, none of the properties will be initialized, and the client application will "know" something is wrong. (description needs to be updated, client refreshed, and that case handled....)

Now, in the edit you've made, that I'm copying here for context

public static Result_OneOrManyItems CreateFromDiscriminatorValue(IParseNode parseNode) {
    _ = parseNode ?? throw new ArgumentNullException(nameof(parseNode));
    var result = new Result_OneOrManyItems();
    var list = parseNode.GetCollectionOfObjectValues<ApiSdk.Models.Item>(ApiSdk.Models.Item.CreateFromDiscriminatorValue)?.ToList() as List<ApiSdk.Models.Item>;
    if((list.Count ?? 0) == 0) {
        result.ResultOneOrManyItemsItem = new ApiSdk.Models.Item();
    }
    else {
        result.Item = list;
    }
    return result;
}

Notice how the code went from checking the discriminator value to simply defaulting back to the single value? If we transposed that to my example this raises multiple questions:

Assuming we change the deserialization code to return null when the result is not an array, and keep the originally generated code in your case. Here is what will happen:

With the change in deserialization behaviour, and this edit, everything should work as expected, without impacting negatively other scenarios.

public static Result_OneOrManyItems CreateFromDiscriminatorValue(IParseNode parseNode) {
        _ = parseNode ?? throw new ArgumentNullException(nameof(parseNode));
        var result = new Result_OneOrManyItems();
                if(parseNode.GetCollectionOfObjectValues<ApiSdk.Models.Item>(ApiSdk.Models.Item.CreateFromDiscriminatorValue)?.ToList() is List<ApiSdk.Models.Item> itemValue) {
            result.Item = itemValue;
        } else if (parseNode.GetObjectValue(ApiSdk.Models.Item.CreateFromDiscriminatorValue) is {} itemValue) {
            result.ResultOneOrManyItemsItem = itemValue;
        }
        return result;
    }

The only side effect of doing that is it might lead to double deserialization in the case of a single object (not in a collection) which will slightly impact performance.

With that context in mind, would you be willing to contribute to the serialization library?

Irame commented 9 months ago

Hi I took a look at the serialization library and noticed that in order to make this change the return type of IParseNode.GetCollectionOfObjectValues had to be nullable so I had to make a change to the abstraction library as well.

I created two draft pull requests. I hope this is ok and I didn't skip any wanted discussion with this.

baywet commented 9 months ago

yes, I've seen the draft pull requests, thanks for starting that. I'm a bit worried that changing the nullability is a breaking change. But I believe it was a bug to start with to have it non nullable when compared with other languages:

https://github.com/microsoft/kiota-java/blob/0bbe4bd6d7809d0084d60f3cffa892f731ba84e5/components/serialization/json/src/main/java/com/microsoft/kiota/serialization/JsonParseNode.java#L175

https://github.com/microsoft/kiota-serialization-json-go/blob/8c3bb9da8069d97cce8f5cd2277285ed9dc6e7f1/json_parse_node.go#L237

https://github.com/microsoft/kiota-serialization-json-php/blob/4ee70f03340f2167d354798d9c28007a4c33b305/src/JsonParseNode.php#L82

(python might need fixing as well) https://github.com/microsoft/kiota-serialization-json-python/blob/ef9c4a8c5277cf9be7774bbd195e61dbbb249120/kiota_serialization_json/json_parse_node.py#L157

https://github.com/microsoft/kiota-typescript/blob/2f41919ca76642ea5d93647299d77270120b3043/packages/serialization/json/src/jsonParseNode.ts#L63

What comforts me in that change is the fact the generated code won't need to change https://github.com/microsoftgraph/msgraph-sdk-dotnet/blob/29c7e1e1146fc2c2ce84ebe96e5cf825562ddada/src/Microsoft.Graph/Generated/Models/Message.cs#L374 (see the null propagation operator)

And the fact the current implementation can lead to nasty bugs without the backing store:

  1. get an object that has a property which is a collection of objects, but don't select that property, it'll be initialized to []
  2. change another property on that object
  3. send it back to the service for an update, you've now sent [] to the service and potentially reset the collection invertedly.

Before we proceed I'd like the opinion of @andrueastman on the topic, especially on the abstractions PR. I think for the static methods it's fine since we only introduced them a couple of weeks ago. What I'm interested in is the change for the parse node interface. (see links in the history above)

baywet commented 9 months ago

Apparently the dotnet team ran into a similar issue and they decided to roll out the changes within the same major version

baywet commented 8 months ago

gentle reminder @andrueastman on this one. It seems we forgot about it. Sorry about that @Irame

baywet commented 5 months ago

gentle reminder for @andrueastman to provide his input