status-im / nim-serialization

A modern and extensible serialization framework for Nim
57 stars 9 forks source link

Weird compiler behavior when decoding case object, not sure if it's compiler bug or something else #30

Open jangko opened 3 years ago

jangko commented 3 years ago

I found this weird behavior when using toml-serialization, but turn out json-serialization also exhibit the exact same behavior.

example #1:

import json_serialization

type
  Kind = enum
    Apple
    Banana

  CaseObject = object
    case kind: Kind
    of Banana: banana: int
    of Apple: apple: string

let
  original = CaseObject(kind: Banana, banana: 11)
  json = Json.encode(original)
  decoded = Json.decode(json, CaseObject)

debugEcho decoded

example #2:

import json_serialization

type
  Kind = enum
    Apple
    Banana

  CaseObject = object
    case kind: Kind
    of Banana: banana: int
    of Apple: apple: string

let
  json = """{"kind":1,"banana":11}"""
  decoded = Json.decode(json, CaseObject)

debugEcho decoded

Both example expected to produce same output.

example #1 compile and run without problem. but example #2 failed to compile with this error messages:

C:\Users\AndriLim\.nimble\pkgs\serialization-#head\serialization.nim(64) apple
C:\Users\AndriLim\.nimble\pkgs\serialization-#head\serialization.nim(46) readValue
C:\Users\AndriLim\.nimble\pkgs\json_serialization-#head\json_serialization\reader.nim(322) readValue
C:\Users\AndriLim\.nimble\pkgs\serialization-#head\serialization\object_serialization.nim(184) readField
F:\projects\status-nim\lib\system\assign.nim(243) FieldDiscriminantCheck
F:\projects\status-nim\lib\system\fatal.nim(49) sysFatal
Error: unhandled exception: assignment to discriminant changes object branch; compile with -d:nimOldCaseObjects for a transition period [FieldError]

The presence of json = Json.encode(original) seems alter the compiler behavior magically. Of course we want example #2 behave like #1 despite err msg looks legit.

ping @zah

zah commented 3 years ago

The key here is the -d:nimOldCaseObjects feature of Nim. At some point (I think version 1.0) Nim decided to disallow changing the branch of a case object at run-time. The -d:nimOldCaseObjects option was introduced for backwards compatibility and all Status projects are still using it because the upgrade was considered somewhat difficult.

For example, In the deserialization code it's much more convenient to start with a default constructed object that will be gradually populated with the correct data as you parse the file. The problem is that the default value has the discriminator field kind already assigned to 0 (which is a valid value for Banana in the sample above). Thus when parser reaches the "kind":1 step, it tries to change the discriminator and this results in a run-time error under the default compilation setting.

To add additional twist to the story, Nim 1.4 will relax the rules once again. It implemented proper destruction of all affected fields when the discriminator value changes, so it will be allowed (I think). Nevertheless, my plan for nim-json-serialization is slightly different from just relying on this:

When you compile the code with {.push fieldchecks: off.}, Nim removes all code related to these discriminator checks, so I want to implement the same checks in the library itself by creating a more sophisticated FieldReadersTable. I'm interested in gaining additional features such as adding the ability to check that all fields of the objects has been provided in the JSON and also to support the common practice of defining JSON formats where the type of the serialized data depends on the value of the descriminator even through the field names stay the same.

Consider the keystore format of Ethereum 2. An Scrypt keystore looks like this:

{
   "crypto": {
        "kdf": {
            "function": "scrypt",
            "params": {
                "dklen": 32,
                "n": 262144,
                "p": 1,
                "r": 8,
                "salt": "d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3"
            },
            "message": ""
        }
}

A PBKDF2 keystore will be the following:

{
    "crypto": {
        "kdf": {
            "function": "pbkdf2",
            "params": {
                "dklen": 32,
                "c": 262144,
                "prf": "hmac-sha256",
                "salt": "d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3"
            },
            "message": ""
        }
}

Notice how the params field is different, depending on the value of the function field. We can map this in Nim in the following way:

type
  Kdf* = object
    case function*: KdfKind
    of kdfPbkdf2:
      pbkdf2Params* {.serializedFieldName: "params".}: Pbkdf2Params
    of kdfScrypt:
      scryptParams* {.serializedFieldName: "params".}: ScryptParams
    message*: string

... but this would require a "context-dependent" replacement of the FieldReadersTable defined more like a finite automaton than a simple table. It will also carry enough information to detect the invalid records such as { "kind": "apple", "bananaLength": 10} and it can populate a bitmask that will be able to demonstrate that the object was fully constructed.

The Eth2 keystore format is not exceptional in this regard. The same pattern is used in some other wide-spread standardized formats such as GeoJSON.