Closed: planetis-m closed this issue 3 years ago
Very interesting!
The first idea is good; the rest are overcomplicated.
Moving away from black-or-white solutions, where a user must choose either speed or correctness, we can set the ambitious goal of adhering to the JSON standard, supporting Nim's language constructs, and maintaining good performance. This is entirely within our reach.
Duplicate keys
First things first: handling duplicate keys in JSON. Consider the JSON object
{"shared":1,"kind":"Apple","apple":"world","kind":"Banana"}
as an example. With the current stdlib design, a
Table[string, Option[JsonNode]]
would be needed to track which keys have already been encountered (this is not implemented). That is expensive in terms of both memory and performance. A direct deserializer, on the other hand, knows all the valid object fields in advance and can generate optimal code.
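The generated Nim code is only linked in the original post; as a language-agnostic illustration, here is a minimal Python sketch of the idea, assuming a hypothetical target type with the fields from the example above. Instead of a table of optionals, the deserializer emits one boolean "seen" flag per known field:

```python
import json

def parse_fruit(text: str, on_duplicate: str = "error") -> dict:
    """Deserialize, tracking already-seen fields with per-field flags.

    on_duplicate: "error" raises on a repeated key, "skip" keeps the
    first value and ignores repeats -- both are spec-compliant choices.
    """
    result = {}
    # One flag per known field; a compiled deserializer would emit these
    # as plain booleans (or a bitset), not a hash table.
    seen = {"shared": False, "kind": False, "apple": False}

    # object_pairs_hook exposes the raw key/value pairs, including
    # duplicates, before Python's dict would collapse them.
    def hook(pairs):
        for key, value in pairs:
            if key not in seen:
                continue  # unknown field: ignore (or error, per policy)
            if seen[key]:
                if on_duplicate == "error":
                    raise ValueError(f"duplicate key: {key}")
                continue  # "skip": keep the first value
            seen[key] = True
            result[key] = value
        return result

    return json.loads(text, object_pairs_hook=hook)
```

The per-field flags replace the dynamic allocation and hashing of a `Table[string, Option[JsonNode]]` with constant-time, constant-space checks.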
The user can choose either to skip duplicate keys or to error on them; both are spec-compliant behaviors. A working benchmark shows this technique achieves about 10% better performance on JSON objects without duplicate keys. On objects that do contain duplicates (not a common case) the results vary more, but the technique is faster still.
Object variants
Another thorn in the side of high-performance JSON deserializers is object variants. My current implementation, for example, expects the discriminator field first and errors otherwise. Another possible solution is backtracking, as implemented by
jsony
and nim-json-serialization
. It results in unpredictable performance, so it should be avoided, IMO. To support variants efficiently, we can build on top of the previous solution. The algorithm is as follows: every time we encounter a field belonging to another variant branch, we check whether to error or to transition. If any field that does not exist in the next branch has already been encountered, throw an error; otherwise continue, copying the common fields over to the new object. This way we don't have to maintain a temporary for each and every field, which would blow up our space requirements.
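The transition rule can be sketched as follows, again in Python rather than generated Nim, for a hypothetical variant whose `kind` discriminator selects an `Apple` or `Banana` branch (names mirror the earlier JSON example):

```python
# Hypothetical variant layout: "shared" and "kind" are common to all
# branches, "apple" and "banana" are branch-specific.
BRANCHES = {
    "Apple": {"shared", "kind", "apple"},
    "Banana": {"shared", "kind", "banana"},
}

def deserialize_variant(pairs):
    """Single-pass variant deserialization without backtracking.

    Start in a provisional branch; on a field (or discriminator value)
    belonging to a different branch, transition only if every field seen
    so far also exists in the target branch, otherwise error.
    """
    branch = "Apple"  # provisional until the input says otherwise
    seen = set()
    obj = {}
    for key, value in pairs:
        if key == "kind":
            target = value
            if target not in BRANCHES:
                raise ValueError(f"unknown kind: {value}")
        elif key in BRANCHES[branch]:
            target = branch
        else:
            # Field belongs to some other branch: find which one.
            target = next((b for b, fs in BRANCHES.items() if key in fs), None)
            if target is None:
                raise ValueError(f"unknown field: {key}")
        if target != branch:
            # Transition is legal only if everything seen so far is
            # common to (i.e. also exists in) the new branch.
            if not seen <= BRANCHES[target]:
                raise ValueError(
                    f"fields {seen - BRANCHES[target]} invalid for kind {target}")
            branch = target  # common fields carry over; nothing to drop
        seen.add(key)
        obj[key] = value
    return branch, obj
```

Note that the only extra state is the `seen` set (a bitset in generated code) and the current branch, so space stays constant regardless of how many fields the variant has.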
The linked example is as complex as it needs to be to prove its feasibility. It still maintains about the same performance edge over the simplest solution of expecting the discriminator field first.
Runtime types
The problem. Consider this object hierarchy:
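The hierarchy itself is not reproduced here; judging from the JSON below, it presumably looks something like the following sketch (all names are hypothetical; the original is a Nim `ref object of` hierarchy):

```python
# Hypothetical reconstruction of the hierarchy implied by the JSON:
# two subtypes of a common base, one with an integer "speed" field,
# the other with a float "rotation" field.
class Entity:                 # base runtime type
    pass

class Mover(Entity):          # element shaped like {"speed": 20}
    def __init__(self, speed: int):
        self.speed = speed

class Spinner(Entity):        # element shaped like {"rotation": 0.0}
    def __init__(self, rotation: float):
        self.rotation = rotation
```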
The JSON would look like
[{"speed":20},{"rotation":0.0}]
. A deserializer processing this input has no idea about the runtime type of each array element, so it either errors (eminim) or silently loses data (jsony). Following the same logic as before, this case can also be supported. I only have a rough outline of my idea right now, but I can prepare a POC.
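The outline is not spelled out in the post; one plausible reading, reusing the field-to-branch mapping from the variants section, is to infer the runtime type from the type-specific fields encountered in each element. A hedged Python sketch, with hypothetical type names:

```python
# Hypothetical sketch (the post only promises a POC): map each
# type-specific field to the runtime type that declares it, then pick
# the type from the fields actually present in the element.
FIELD_TO_TYPE = {"speed": "Mover", "rotation": "Spinner"}  # hypothetical names

def infer_type(element: dict) -> str:
    inferred = None
    for key in element:
        t = FIELD_TO_TYPE.get(key)
        if t is None:
            continue  # fields of the common base don't constrain the type
        if inferred is not None and t != inferred:
            raise ValueError(f"fields of both {inferred} and {t} present")
        inferred = t
    if inferred is None:
        raise ValueError("cannot infer runtime type from fields")
    return inferred
```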
Writing custom deserializers
This is also problematic: the user is entirely on their own, and it's more complex than, say, https://github.com/nim-lang/RFCs/issues/247. A lot can go wrong, especially with regard to the order in which keys are encountered. This proposal does not cover this use case.
How do others do it?
Rust has invented its own format that can work with sum types: basically, an object is prefixed by its type, like
Banana{}
. I am not aware of others. All the logic is figured out; all that is needed is an implementation. Please consider funding my work.
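For concreteness, here is a minimal sketch of the type-prefixed idea rendered in JSON, where the payload is wrapped in a single-key object keyed by the type name (one common realization of this technique, e.g. in externally tagged union encodings; the `Banana{}` notation above is the same idea in Rust-flavoured syntax):

```python
import json

# Type-prefixed encoding in JSON: wrap the variant payload in a
# single-key object whose key names the type, so the deserializer
# learns the runtime type before reading any fields.
def encode_tagged(kind: str, payload: dict) -> str:
    return json.dumps({kind: payload})

def decode_tagged(text: str):
    obj = json.loads(text)
    if len(obj) != 1:
        raise ValueError("expected exactly one type tag")
    (kind, payload), = obj.items()
    return kind, payload
```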
Lastly, a note about proposal https://github.com/nim-lang/RFCs/issues/368: it would significantly complicate the construction of this macro, and of any other macro that works on object variants.