microsoft / kiota

OpenAPI based HTTP Client code generator
https://aka.ms/kiota/docs
MIT License
2.97k stars 207 forks source link

[Question] Mismatched default value type handling #3398

Open andreaTP opened 1 year ago

andreaTP commented 1 year ago

Description

When the default value of a field doesn't match the type a warning is emitted by kiota, but, in this case, the generated code is not going to compile for allOf the non-dynamic languages.

Original issue

Faced this issue here in the OpenAI spec.

Reproducer

A minimal spec like this:

openapi: 3.0.0
info:
  title: Test
  version: 1.0.0
  description: something
paths:
  /api/something/v1:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                max_tokens:
                  default: inf
                  type: integer
                  nullable: true
      responses:
        "200":
          description: "something"
          content:
            application/json: {}

This would produce the following Warning:

OpenAPI warning: #/paths/~1api~1something~1v1/post/requestBody/content/application~1json/schema/properties/max_tokens/default - Data and type mismatch found.

But it generates code that is not going to compile:

public int? MaxTokens { get; set; }
...
MaxTokens = "inf";

Desired resolution

I think that we can improve over the current situation, as having code generated but that is going to fail to compile is suboptimal. There are a few directions we can consider:

  1. Decide that this mismatch is a fatal error (for all non-dynamic langs at least ...) and abort the generation
  2. Since we are already emitting a warning, skip the assignment so that the generated code will compile
  3. In this specific case the intention is pretty clear, should we handle special numeric values? ( upstream ref maybe @darrelmiller knows more about)
  4. Another option would be to generate code such as: Int32.Parse("inf") so that the error will be moved at the user's runtime. Possibly throwing the relevant exception only in case the default needs to get applied.

Happy to hear more feedback!

baywet commented 1 year ago

Thanks for starting the conversation on this! For reference, this rule is defined in OpenAPI.net

We previously got feedback that being "too harsh during validation and blocking generation early on" is not appreciated but customers (I personally prefer failing as early as possible), so I don't think changing the severity is a viable option.

Int32 doesn't have an infinite representation in most languages. (double often does) I don't think diving into specification (IEEE734) of numeric formats and how well they were implemented in different languages as well as how well they are being used by APIs is going to be a valuable use of our time. It's going to be a whack a mole game.

What I'm going to suggest instead is that for known types (int, double, etc...) we use the corresponding type TryParse method before setting the default value in the DOM in kiota builder. This way we can check the value is supported for a limited number of types, and ignore other default values when they don't pass the parse test. Thoughts?

andreaTP commented 1 year ago

"too harsh during validation and blocking generation early on" is not appreciated but customers

I do agree with you, but I have received this feedback as well, and can relate πŸ™‚ .

Int32 doesn't have an infinite representation in most languages.

I haven't expressed it, but I was thinking at falling back to Int32.MaxValue in this specific case. Overall, probably we would need to go through the exercise anyhow, but I agree on postponing it as much as we can.

What I'm going to suggest instead is that for known types (int, double, etc...) we use the corresponding type TryParse method before setting the default value in the DOM in kiota builder. This way we can check the value is supported for a limited number of types, and ignore other default values when they don't pass the parse test. Thoughts?

I understand the reasons for you to propose this, but I'm not 100% convinced, let me try to elaborate: by performing the check at "generation-time" we are going to generate a dichotomy where the actual default value needs to be parseable by C# to be represented in the generated code. This is an unnecessary caveat that is going to bite back sooner or later ... I would very much prefer to leave this in the hands of the users (and they can shoot in their foot ... yes).

Probably, the most "semantically correct" behavior would be to defer the computation of the TryParse to the target language runtime(e.g. generating the corresponding TryParse as code), this way kiota logic can remain "dumb". I do see numerous benefits in this approach, let me know if I should elaborate any further.

Thinking about the issue twice, the resolution we are discussing only applies to(some of) the primitives. Should we expand the scope to Objects ? In this case either we pass through the Serializer(I'm using this approach pretty successfully in another project) or we need to just omit the default.

baywet commented 1 year ago

I haven't expressed it, but I was thinking at falling back to Int32.MaxValue in this specific case.

inf and maxvalue mean two very different things though. I don't think mapping one to another would help the users here.

I do see numerous benefits in this approach, let me know if I should elaborate any further.

Besides the validation logic not being tied to CSharp when the target language is not CSharp, thus avoiding "CSharp assumptions that don't hold in other languages", I'm curious of what other advantages do you see here? One downside I can see is the fact this approach would have a runtime cost (we'd have to generate additional code), where the one I proposed only has a generation cost.

Should we expand the scope to Objects ?

I'm not sure what a default object representation looks like according to OAS? do you have examples? would that be tied to a specific serialization format?

andreaTP commented 1 year ago

inf and maxvalue mean two very different things though. I don't think mapping one to another would help the users here.

You are completely right, I reported the issue upstream.

I'm curious of what other advantages do you see here?

For example, we wouldn't need to re-implement the logic in two different places(generation and runtime) when dealing with more complex types (more below on this subject). I do believe that tightening this to the CSharp semantics of TryParse is going to generate all kinds of edge cases when the target language is different.

I'm not sure what a default object representation looks like according to OAS? do you have examples? would that be tied to a specific serialization format?

I haven't realized I was about to open a πŸ•³οΈ πŸ‡ here πŸ˜… . Please correct me if I get any reference wrong, but, in this case, we have to resort to the JSON Schema specification where it's stated:

supply a default JSON value associated with a particular schema

I found some more context into this issue.

This means that an OAS like this:

openapi: 3.0.0
info:
  title: Test
  version: 1.0.0
  description: something
paths:
  /api/something/v1:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/Test"
      responses:
        "200":
          description: "something"
          content:
            application/json: {}
components:
  schemas:
    Test:
      description: this is a first test
      properties:
        test:
          description: a test
          type: object
          properties:
            foo:
              type: string
            bar:
              type: integer
      default:
        test:
          foo: "baz"
          bar: 42
      type: object

is valid and is expected to use the default value. But, in this case, Kiota (from the current main) is simply ignoring the content of the default field.

At this point, we can separate this issue into 2:

We are already discussing possible solutions for the former, to tackle the latter, in another generator, I decided to defer the evaluation to the runtime leveraging the available deserializer, here you have an example of what it looks like. Without any doubt, there are tradeoffs in this decision, but I'm convinced it's the "safest" approach to generate code that will behave as expected, and so far, I'm happy with the decision as it automagically handles most of the edge cases (e.g. empty lists).

Please note that the "deserializer approach" can be optimized for primitive values (and I do it ).

baywet commented 1 year ago

I agree, let's leave default values for objects/arrays on the side for now. There's no demand for it, it's highly coupled with the serialization format, and I think it's an edge case (defining a whole object worth of default vs defining defaults for each of the scalar properties).

Refocusing the discussion on the scalar values, I'm guessing you're suggesting to emit something like that?


public class Test {
    public int Foo;
    public Test() {
        Foo = ParseNodeFactoryRegistry.DefaultInstance.GetRootParseNode("application/json", new MemoryStream(UTF8.GetBytes("100"))).GetIntValue();
    }
}

It feels like a lot of ceremony even though we'd enable a consistent behaviour between default values and deserialization. Also how do we know to use application/json?

andreaTP commented 1 year ago

let's leave default values for objects/arrays on the side for now. There's no demand for it, it's highly coupled with the serialization format, and I think it's an edge case (defining a whole object worth of default vs defining defaults for each of the scalar properties).

I differ on this one, let me explain some reasons:

On the project's landing page, there is written (and I always promote Kiota in my talks this way):

the best code generator support possible for OpenAPI and JSON Schema features

I do believe that default is a basic feature and should be fully supported out of the box. We are(probably) not receiving complaints because there is always a viable workaround of setting all of the fields explicitly. The current implementation ignores, without any feedback to the user, the default field in the case of objects/arrays and this reduces the chances of ppl being aware of the behavior (I was pretty surprised too). The bare minimum that should be done here is to emit a "Warning" when this situation is detected (to align with the "mismatched type" behavior), but full support should happen sooner or later.

Refocusing the discussion on the scalar values, I'm guessing you're suggesting to emit something like that?

Yes, the snippet resembles the essence of my proposal. I'm not saying it's perfect but I haven't found a more robust way to implement this feature, open to alternative ideas!

It feels like a lot of ceremony even though we'd enable a consistent behaviour between default values and deserialization.

That's correct, but it's the most "generic" approach that can be used to implement full support.

Also how do we know to use application/json?

This comes from the spec:

This keyword can be used to supply a default JSON value associated with a particular schema.
baywet commented 1 year ago

All very valid points. Then I guess we should wait on #3406 to be implemented so we have less code to generate. And emit something like this instead.

public class Test {
    public int Foo;
    public Test() {
        Foo = ParseNodeFactoryRegistry.GetRootParseNode("application/json", "100").GetIntValue();
    }
}

This solution is a bit tied to JSON representations since the default notion is defined in JSON Schema. (serialization formats that are binary, like cbor, or that use ident as meaning like yaml will break with this approach).

andreaTP commented 1 year ago

I don't thins this should ever go stale, we are just waiting for a plan to shape up. cc. @baywet

baywet commented 1 year ago

update: since #3406 has been implemented, we have a way forward: examples (in CSharp, call in the constructor):

The only gap left is probably enums, but that's already handled today.

Is this something you'd like to take a stab at?

andreaTP commented 1 year ago

Is this something you'd like to take a stab at?

I'm not sure I have bandwidth for this ATM 😞 , if no one is picking it up, I probably will but it's gonna take some time.