Deserialization fails for enum member named `New`

hognevevle commented 1 month ago

Hi there,

I'm running into an issue where Kiota seems unable to handle enum members with a name of new/New. If such an enum is received from the API, Kiota will set the value to undefined.

Reproduction at https://github.com/hognevevle/kiota-enum-bug , which has a .net Web API and a simple typescript client app. The Web API exposes one GET endpoint, which receives an enum (e.g. x=New), and echoes it back to the client together with a timestamp.

Start the web API.
In the Frontend directory, run npx tsx index.ts. In the output you will see the issue:

PS C:\Playground\kiotaplayground\Frontend> npx tsx .\index.ts
Response from API with x=Old: { someEnum: 'Old', date: 2024-07-14T20:19:32.770Z }
Response from API with x=New: { someEnum: undefined, date: 2024-07-14T20:19:32.770Z }

In the last response, someEnum should be New. (To clarify; the actual response from the API is correct and as expected. )

baywet commented 1 month ago

Hi @hognevevle Thanks for using kiota and for reaching out. Have you tried using NewEscaped as the symbol instead?

hognevevle commented 1 month ago

Hi @baywet , sorry, I'm not sure I follow. Where would you like me to use NewEscaped?

The TS app sends the payload to the API with x=New.
The API receives the payload as expected and correctly deserializes it to the C# enum New.
The API echoes the same enum in the response
The TS app (i.e. Kiota) is not able to deserialize New given in the response, and the property ends up as being undefined

The passing of the enum from the request to the response isn't really of importance -- whenever the New enum is used in the API response, the problem occurs.

baywet commented 1 month ago

For context to everyone reading this conversation, here is what we generate today.

export const TestEnumObject = {
    NewEscaped: "New",
    Old: "Old",
} as const;

@hognevevle thank you for the additional information. Can you validate that response comes in cased properly? (New)

hognevevle commented 1 month ago

@baywet Updated the repro to log the raw API response:

Raw API response: { someEnum: 'Old', date: '2024-07-22T12:48:23.253122Z' }
Kiota-deserialized API-response with x=Old: { someEnum: 'Old', date: 2024-07-22T12:48:23.253Z }
Raw API response: { someEnum: 'New', date: '2024-07-22T12:48:23.252865Z' }
Kiota-deserialized API-response with x=New: { someEnum: undefined, date: 2024-07-22T12:48:23.252Z }

The response does indeed come in properly cased.

baywet commented 1 month ago

Thanks for the additional information. I think I found the culprit. here the deserialization code uses the object I provided in my earlier response to deserialize things. While this object structure is great for serialization (symbol => wire value is O(1)) it is not for deserialization (as we can see, the symbol can be unaligned with the wire name). Performing a reverse lookup on that object would be at worst O(n), not great. Generating the opposite object (mapping from wire format to symbol) would negatively impact the resulting bundle size, but would solve the problem while being O(1). And we should absolutely get rid of this "toFirstCharacterUpper" method, the serialization library shouldn't be doing any transformation of that sort. I think I'm leaning towards the additional object solution. Thoughts @andrueastman @koros ?

andrueastman commented 1 month ago

And we should absolutely get rid of this "toFirstCharacterUpper" method, the serialization library shouldn't be doing any transformation of that sort.

+1. This also solved for the problem with values with spaces and special characters as brought out in https://github.com/microsoft/kiota-typescript/issues/1219.

Generating the opposite object (mapping from wire format to symbol) would negatively impact the resulting bundle size, but would solve the problem while being O(1).

Maybe we could possibly initialize/create the reverse map from the object dynamically to take the hit at runtime rather than generation. https://stackoverflow.com/questions/56550463/invert-a-map-object

baywet commented 1 month ago

@andrueastman I like this suggestion. In this scenario would you either:

have a reverser function, still emit the const which would call into that reverser function (this way we only reverse once)
reverse every time in the deserialization code

andrueastman commented 1 month ago

I'm not sure I fully understand the first option @baywet.

I believe reversing on each serialization will have the minimal(no) impact on the generated code size as the entire change can all be done in the abstraction library within the getEnum method.

baywet commented 1 month ago

Allow me to elaborate my point.

This is what we currently generate.

export const TestEnumObject = {
    NewEscaped: "New",
    Old: "Old",
} as const;

We could decide to generate the same exact code and pass a reference to this object during the deserialization. This would require us to reverse his object every time we try to deserialize a value for this enum. It's probably not a huge performance trade off but it's still counts.

Or we could decide to also generate a sister constant that calls into the reversal function once for the application domain like in this example.

export const TestEnumSerializationObject = {
    NewEscaped: "New",
    Old: "Old",
} as const;

export const TestEnumDeserializationObject = reverse(TestEnumSerializationObject);

It does have an additional bundle size cost but it's less than this 3rd alternative to generate everything. And that cost is a trade off for better deserialization performance.

export const TestEnumSerializationObject = {
    NewEscaped: "New",
    Old: "Old",
} as const;

export const TestEnumDeserializationObject = {
   'New': 'NewEscaped',
   'Old': "Old"
} as const;

I don't have a hard definitive opinion on which approach we should choose I would just wanted to save this was another alternative to consider. Does that make things a little bit clearer?

andrueastman commented 1 month ago

Thanks for the explanation here. This makes sense now.

I think since we are currently at a state where things are broken (deserialization can't be done successfully in these scenarios), I would suggest initially going with the first option as it fixes the issue with the least changes, and I would also guess that Enums are generally not large in size so the performance trade off can be acceptable).

I do like the second option as it provides optimization on the bundle size (based on the number of enums existing in the definition) but maybe it would make sense to do some sort of measurement before committing? If we get results that we are unhappy with, it'll be harder to reverse to option 1 vs going from option 1 to 2 .....

baywet commented 1 month ago

Alright, let's start with the reversal in every deserialization run (option 1) instead of adding additional constants. @hognevevle is this something you'd like to submit a pull request for provided some guidance?

hognevevle commented 3 weeks ago

Yes, I could do that once I'm back from holidays in a couple of weeks 👍

microsoft / kiota-typescript

Deserialization fails for enum member named `New` #1277