microsoft / kiota

OpenAPI based HTTP Client code generator
https://aka.ms/kiota/docs
MIT License
2.92k stars 202 forks source link

How can we handle untyped JSON? #2319

Closed darrelmiller closed 7 months ago

darrelmiller commented 1 year ago

Currently in the OpenAPI for Graph we have this which I'm fairly sure is wrong. I don't think the @odata.Type comes back with graph.JSON payloads

    microsoft.graph.Json:
      title: Json
      required:
        - '@odata.type'
      type: object
      properties:
        '@odata.type':
          type: string
          default: '#microsoft.graph.Json'

and uses of it look like this.

            contentInfo:
              anyOf:
                - $ref: '#/components/schemas/microsoft.graph.Json'
                - type: object
                  nullable: true

I tried something similar to this in a different API and was expecting to see other properties appearing in the AdditionalData collection but there was nothing.

I would expect

            someThing:
              type: object
              properties:
                someObject:
                  type: object
                someArray:
                  type: Array

to project a type like

class SomeThing {
  someObject: JsonObject;
  someArray: JsonArray;
}

However, to do that, the serializer would have to expose via some kind of interface that would communicate what types to use for generic JSON content.

baywet commented 1 year ago

Tying the generation result to any specific format would break one of our tenets.

However we could use the a default entity class added to the abstractions with no properties so all the undocumented properties would go to the additional data.

For the array it's a bit more complicated since it could be an array of objects or scalar values.

And we need to think about writing back too.

Overall I thought the last time we talked about the we decided not to facilitate those scenarios since that would make kiota wrap around the serialization librairies with little control over what's going on. Falling early encourages API producers to fix their description.

darrelmiller commented 1 year ago

I wonder if we could fallback to a memorystream/byte array for properties that we don't know how to deserialize? That would at least allow the consumer to manually deserialize the properties.

This isn't just about "fixing" the API description. There are parts of APIs that are declared as "untyped" content. Microsoft Graph does this in a number of places where content is simply described as JSON.

The problem with relying on AdditionalProperties is that it only supports primitives in a reasonable way. Where additional properties are complex types like objects and arrays, we expose it as JSON strings. At least that's how we used to do it. I'm assuming Kiota does it the same.

andrueastman commented 1 year ago

Where additional properties are complex types like objects and arrays, we expose it as JSON strings. At least that's how we used to do it. I'm assuming Kiota does it the same.

At the moment the serialization library exposes these types as object representation of the json Object depending on the serialization library. As the json library uses System.Text.Json these are exposed as instance of JsonElement. If someone build a json lib with NewtonSoft it would probable a JObject instance.

https://github.com/microsoft/kiota-serialization-json-dotnet/blob/3a5f1f71fa082cb882ba73cb6060ef3719e744d4/src/JsonParseNode.cs#L354

I think in this scenario,

            someThing:
              type: object
              properties:
                someObject:
                  type: object
                someArray:
                  type: Array

could project to generic object types like

class SomeThing {
  someObject: object;
  someArray: object;
}

And then the serialization libraries could still assign the appropriate type to the objects such as JsonElement for System.Text.Json. Consumers would however, need to be aware that they need to cast the object to the type used by the serialization library and this way the generator would be agnostic of the serialization construct while the serialilizer still does its job?

baywet commented 1 year ago

The value of supporting arbitrary JSON decreases a great deal when more advanced concepts like union/intersection types are supported. (our previous generation of SDKs didn't have support for that) With that in mind I believe the only scenario for arbitrary JSON is when the API producer doesn't own/know about the payload. Like the 3rd party webpart configuration objects in SharePoint pages APIs. But again, this is poor API design since they ask for a schema when building the webpart to begin with. It's just that we don't have infrastructure in place to vary the API schema by tenant like dynamics does and didn't have the infrastructure to generate clients by tenant (which we now have thanks to kiota).

I'm just worried that we're trying to enable scenarios that will produce terrible client code instead of adding incentive for client consumers to talk to the API producers into improving their design.

Right now I believe most languages deserialize scalar properties into the additional data and anything else is stored as IParseNode instance. But we're probably not symmetric in the serialization.

To Andrew's point: using a generic object that people have to cast will make for an error prone client code, I'd much rather just generate with an IParseNode type for those unknown properties. And we'd probably need to augment that interface with a kind (object/array/scalar) to enable symmetric serialization.

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

andrueastman commented 1 year ago

Re-opening this as this is needed for API's using the microsft.graph.Json class which are currently generated incorrectly.

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

baywet commented 1 year ago

Modeling a random "Json" OData complex type on the API side when the value can be arrays, scalar values etc... is not giving much consideration to the client side. Complex types model objects/classes and can only describe a subset of cases (objects).

To "fix it" we'd need the conversion library to understand "no this is not a complex type, but random json" and map all properties making use of that type to a union type. It'd need to express something like string[]|boolean[]|number[]|string|boolean|number|.... which would miss the case of objects/array of objects (which should arguably be modelled properly instead).

My suggestion is to keep pushing back on API producers so they accurately model their data instead of throwing back random JSON to the clients. Using a type generation mechanism loses a lot of value when the type information is missing.

Now, if we wanted to support this anyway, we'd need some kind of generic JsonElement (like system.text.json) and I guess the serialization/deserialization mechanism would be "try as much as you can to deserialize to objects/collections/properties as long as we have the type information, then try to deserialize to key/value in the additional data if the value is scalar or array of scalars, for anything else return the generic json element. This becomes a bit more challenging when we have an array, or scalar values at the root of the node we're parsing as additional data was designed with an object structure in mind. We could make use of magic keys in the dictionary as @rkodev was suggesting.

The user could then:

  1. Cast the "object" to the generic json element
  2. Access edge cases using the magic keys

I personally don't see having to rely on the implementation json type (from system.text.json) as an issue as we're in uncharted territories anyway and I think designing a layer of json abstractions in our kiota abstractions would be a lot of additional work. If people complain about being tied to a certain implementation they can either implement this whole logic again with a different library or pressure their API producer to do a better job at describing what lives on the API.

Now, I'd like everyone to provide feedback on the approach and @darrelmiller to come up with a design for it, especially accounting for cases like Go or TypeScript where we don't have a JsonElement equivalent.

darrelmiller commented 1 year ago

I think our IParseNode design that we use to abstract away our serializers already gets us most of the way there. e.g. https://github.com/microsoft/kiota-abstractions-dotnet/blob/main/src/serialization/IParseNode.cs

I think we might just have to enhance that design a bit to allow someone to interact with ParseNodes directly. If we created a ListNode, MapNode, BoolNode sort of like we do in OpenAPI.NET https://github.com/microsoft/OpenAPI.NET/tree/vnext/src/Microsoft.OpenApi.Readers/ParseNodes then we can put a generic interface on top of any existing serializer implementation.

baywet commented 1 year ago

The current structure we have would work until we run into an object we don't have a schema for. We don't have a GetAnonymousObject or GetCollectionOfAnonymousObjects in the interface and adding those methods would result in a breaking change.

Additionally, the parse node doesn't have the ability to serialize itself today, which would as well result in a breaking change (maybe we should unify the SerializationWriter and the ParseNode interfaces?)

andrueastman commented 1 year ago

Additionally, the parse node doesn't have the ability to serialize itself today, which would as well result in a breaking change (maybe we should unify the SerializationWriter and the ParseNode interfaces?)

Should calling parseNode.GetStringValue() be returning the string/serialized representation of the parseNode? 😅

The current structure we have would work until we run into an object we don't have a schema for. We don't have a GetAnonymousObject or GetCollectionOfAnonymousObjects in the interface and adding those methods would result in a breaking change.

In the current state, the serializer could probably just return an IParseNode instance(i.e JsonParseNode). The user of the client library can then simply call the relevant functions like GetStringValue if it's a string, or GetCollectionOfPrimitiveValues<string> if its list of strings to eliminate the coupling with the specific serialization library (similar to what would be done on JsonElement). This will be supported by all languages but has the downside that the API vagueness is passed onto the client user.

This becomes a bit more challenging when we have an array, or scalar values at the root of the node we're parsing as additional data was designed with an object structure in mind. We could make use of magic keys in the dictionary as @rkodev was suggesting.

The array can still be modelled as a IParseNode as @darrelmiller suggests(then the user calls GetCollectionOfPrimitiveValues<string> on it).

The tricky part for me is really replacing return and model types of Json (or alternative equivalent) with JsonParseNode in request executors and models without mixing concerns. This would maybe mean that the conversion lib will use a setting to replace the Json type with a something Kiota would use to know that it needs to use the IParseNode types. As we probably don't want kiota to have the setting to strip the type from the model tree. Serailization libraries may need updating to return the rootParseNode directly.

baywet commented 1 year ago

Should calling parseNode.GetStringValue() be returning the string/serialized representation of the parseNode?

I'd call it GetRawValue instead to avoid any confusion.

The array can still be modelled as a IParseNode as @darrelmiller suggests(then the user calls GetCollectionOfPrimitiveValues on it).

I'm referring to this example

{
 "schemaProp1": true,
 "schemaProp2" : "bar",
 "nonSchemaProp": {
    "nestedProp": [ [1, 2, 3], [4,5,6]]
  }
}

We don't have the ability to retrieve arrays of arrays today. So I'm guessing nonSchemaProp would be in the additional data, its value would be a JsonParseNode. And people would have to :

  1. cast
  2. detect it's an object,
  3. get the nested prop (another JsonParseNode)
  4. detect it's an array.
  5. get the first JsonParseNode of the array
  6. get the first JsonParseNode
  7. detect it's a number
  8. get the value.
  9. repeat for each value in the sub array
  10. repeat for each value in the main array

Which makes for a very cumbersome experience, and means we'd need to expose a bunch of TryGet methods like JsonElement does or a bunch of IsWhatever methods like gson does.

darrelmiller commented 1 year ago

We need to create the following types:

These types need to derive from UntypedNode. UntypedObject needs a property called "properties" that is a map of UntypedNode.

Serializers need to implement the reading and writing of these types.

Darrel to try and implement in CSharp. Someday.

KennethHoff commented 10 months ago

Is there any news in this regard? Can we expect this in 1.9, or is it something that's just in the 1.9 milestone just because? It's seemingly been moved from the Post-GA Milestone all the way to 1.9 and no comments here for months.

baywet commented 10 months ago

Hi @kennethHoff Given we're 7 work days away from the 1.9 release, this is likely to be carried forward to 1.10. This has been a resourcing/prioritization issue for months now. @sebastienlevert for visibility

baywet commented 10 months ago

Actually, moving it to 1.10 now, as even if we got the spec today, I'd be the one implementing it for the core languages and given my workload, it's unlikely to happen anytime before the release.

MartinM85 commented 9 months ago

Is there any way how the community can help with this? I guess it requires changes in more repositories, at least some subtasks which don't require a deep knowledge of Kiota before the main implementation can be done.

baywet commented 9 months ago

@martinM85 thanks for offering your help. I'm guessing the first baby steps we could take here are:

  1. introduce the types suggested by Darrel in kiota abstractions dotnet
  2. add a test json payload in kiota serialization json that'd contain: a object with known properties and unknown properties to cover all cases (known as in "part of the parsable model definition")
  3. define unit test cases to implement symmetric parsing/serialization (and the other way around) of that payload
  4. implement the code in JsonParseNode and JsonSerializationWriter so we get to the expected result.

Once we have a working prototype in dotnet, we can then think about replicating the efforts across other languages.

Thoughts? don't hesitate if you have follow up questions

MartinM85 commented 9 months ago

Hi @baywet, it's quite clear. I will have probably have more questions later.

LockTar commented 6 months ago

@baywet I see that this has a milestone for v1.13. I'm using v1.12. If I use the latest version of all the related Nuget packages, we get issues with https://github.com/microsoft/kiota-abstractions-dotnet/issues/175.

When we downgrade the following packages to the specific versions (almost latest) everything is working fine with v1.12.

Microsoft.Kiota.Abstractions Version=1.8.1 -> 1.7.12
Microsoft.Kiota.Serialization.Json Version = 1.2.0 -> 1.1.8
Microsoft.Kiota.Http.HttpClientLibrary Version = 1.3.8 -> 1.3.7

So what will v1.13 change? What could be the reason? In v1.12 (with downgraded packages), AdditionalData is a Dictionary<string, object>. In v1.12 with the latest version of the packages this is at runtime UntypedArray.

sebastienlevert commented 6 months ago

The generated code would change with untyped JSON @LockTar and 1.13 was shipped this morning. Can you validate this solves the issue now?

LockTar commented 6 months ago

Hi, If the generated code is changed in v1.13 then I believe it will going to work. At the moment we don't have time to update but if you say is true then indeed my issue will go away, Of course I need to change the code that is using the generated code. But that's ok.

If I update in the future and it doesn't work I will create a new issue. For now my comment can be discarded.