hyperledger / iroha

Iroha - A simple, enterprise-grade decentralized ledger
https://wiki.hyperledger.org/display/iroha
Apache License 2.0
433 stars 277 forks source link

`JsonString` in schema vs in json #4900

Open arndey opened 1 month ago

arndey commented 1 month ago

In schema:

"Permission": {
  "Struct": [
    {
      "name": "name",
      "type": "String"
    },
    {
      "name": "payload",
      "type": "JsonString"
    }
  ]
},

but in genesis.json:

{
  "Grant": {
    "Permission": {
      "object": {
        "name": "CanSetParameters",
        "payload": null
      },
      "destination": "ed0120CE7FA46C9DCE7EA4B125E2E36BDB63EA33073E7590AC92816AE1E861B7048B03@wonderland"
    }
  }
},

Expected result

Schema:

"Permission": {
  "Struct": [
    {
      "name": "name",
      "type": "String"
    },
    {
      "name": "payload",
      "type": "Option<JsonString>"
    }
  ]
},

Logs

Log contents ```json Replace this text with a JSON log, so it doesn't grow too large and has highlighting. ```

Who can help to reproduce?

@mversic

Notes

No response

mversic commented 1 month ago

I'm not sure if this is a bug. The problem here is that schema is a description of how to serialize data model as SCALE. When serializing JsonString to SCALE we serialize it as a string. However, when serializing to json the format is completely different. I'm not sure how we can reconcile this. JsonString is serialzed as a json node so marking the field as Option<JsonString> is not the solution in this case. Consider this:

{
    "name": "CanRemoveKeyValueInAccount",
    "payload": {
        "account": "ed0120CE7FA46C9DCE7EA4B125E2E36BDB63EA33073E7590AC92816AE1E861B7048B03@wonderland"
    }
}

what is type of payload here? it's not string or null, it's a map. And what about this case:

{
    "name": "CanRemoveKeyValueInAccounts",
    "payload": ["ed0120CE7FA46C9DCE7EA4B125E2E36BDB63EA33073E7590AC92816AE1E861B7048B03@wonderland"]
}

the type is now array

0x009922 commented 1 month ago

In JSON, I think it is perfectly fine to have any JSON value for fields that are JsonString, including null.

The problem again is that the schema does really tell only how to implement SCALE encoding. And in SCALE... well, it is encoded as a string. And I think it's fine to keep the schema as is.

@arndey, what problem does it create for you specifically?

arndey commented 1 month ago

@0x009922 that's the first case when value of the field can be null but the type in the schema not optional, so the problem is because we generate our models from your schema and we can't operate nullable payload because it's not nullable in our model

  "Permission": {
    "Struct": [
      {
        "name": "name",
        "type": "String"
      },
      {
        "name": "payload",
        "type": "JsonString"
      }
    ]
  },
/**
 * Permission
 *
 * Generated from 'Permission' regular structure
 */
public data class Permission(
    public val name: String,
    public val payload: String,
)
arndey commented 1 month ago

Why it can be nullable here, but can't in Permission?

/**
 * ExecuteTriggerEvent
 *
 * Generated from 'ExecuteTriggerEvent' regular structure
 */
public data class ExecuteTriggerEvent(
    public val triggerId: TriggerId,
    public val authority: AccountId,
    public val args: String? = null,
)
"ExecuteTriggerEvent": {
    "Struct": [
      {
        "name": "trigger_id",
        "type": "TriggerId"
      },
      {
        "name": "authority",
        "type": "AccountId"
      },
      {
        "name": "args",
        "type": "Option<JsonString>"
      }
    ]
  }
Mingela commented 1 month ago

IMO Alternatively, for consistency Iroha should replace Option<JsonString> with just JsonString so that it's a nullable everywhere. Current contradiction is that you treat JsonString generally nullable but wrap it with Option somewhere introducing a false guess it should be treated nullable only if it's wrapped.

arndey commented 1 month ago

IMO Alternatively, for consistency Iroha should replace Option<JsonString> with just JsonString so that it's a nullable everywhere. Current contradiction is that you treat JsonString generally nullable but wrap it with Option somewhere introducing a false guess it should be treated nullable only if it's wrapped.

this will solve the current problem, but in terms of using the SDK, if all the fields are nullable, it will not be very convenient, it seems to me

Mingela commented 1 month ago

this will solve the current problem, but in terms of using the SDK, if all the fields are nullable, it will not be very convenient, it seems to me

you mean any field nullability should be highlighted via wrapping with Option?

arndey commented 1 month ago

this will solve the current problem, but in terms of using the SDK, if all the fields are nullable, it will not be very convenient, it seems to me

you mean any field nullability should be highlighted via wrapping with Option?

yes, and so far I don’t understand what the problem is, given the examples that I wrote earlier here

0x009922 commented 1 month ago

The source of truth is the data model. In it, we decided to encode JSON values as plain strings. Some structures require some payload to be (like permissions) and it is expressed as JsonString in the schema; others have it optionally (like trigger events) - it is expressed as Option<JsonString>.

Strictly speaking, JsonString with value "null" is not the same as Option<String> with value None or Some("null").

Why not make Option<JsonString> everywhere? Iroha can have a special-case handling, treating Option<JsonString> of value None the same as Some("null"), right?

Yes, it can. However, from the standpoint of the data model - why?.. This need doesn't come from the data model, which is the source of decisions. Moreover, this adds an implicit convention to the schema instead of relying on explicit behaviour "baked" into it by design.

The problem we face is that we actually added an implicit convention to how JsonString is represented in JSON. If we were strict, it would be like this:

{
  "payload1": "null", // `JsonString` => `"null"`
  "payload2": null,   // `Option<JsonString>` => `None`
  "payload3": "null", // `Option<JsonString>` => `Some("null")`,
  "payload4": "{\"foo\": \"bar\"}" // not really nice :/
}

But we (de)serialize JsonString as just JSON value in JSON format, and here is the ambiguity:

{
  "payload1": null,   // `JsonString` => `"null"`
  "payload2": null,   // `Option<JsonString>` => `None`
                      //  or should it be `Some("null")`?
  "payload3": "null", // `Option<JsonString>` => `Some("\"null\"")`,
  "payload4": { "foo": "bar" } // but this one is more convenient!
}

While it is convenient for most cases, it is impossible to distinguish None vs Some("null") - both are nulls in JSON repr.

Solutions?

  1. Make Option<JsonString> everywhere - meh :/
  2. Treat JsonString as a string in JSON
    • No ambiguity
    • Less convenient
  3. Change convention: add special representation for Option<JsonString> in JSON:
    • None - null
    • Some(value) - { "json": value }
      {
      // `None`
      "payload1": null,
      // `Some("null")`
      "payload2": { "json": null },
      // possible to set via plain json values
      "payload3": { "json": { "foo": "bar" } },
      "payload4": "null", //  DISALLOWED,
      }
Mingela commented 1 month ago

Thanks for the exhaustive overview @0x009922. Solution 3 seems to be the most convenient from the perspective of how serialized objects look. What about replacing Option<JsonString> with JsonNode (JsonObject)? so it's either null (None) or {..} (Some(obj)), not sure about Arrays though and whether we need them for the top level.

0x009922 commented 1 month ago

What about replacing Option<JsonString> with JsonNode (JsonObject)? so it's either null (None) or {..} (Some(obj)), not sure about Arrays though and whether we need them for the top level.

@Mingela you mean to replace Option<JsonString> with JsonObject in the schema, treating null and objects differently? I don't see how it's going to solve the same ambiguity when someones needs to set a null payload (not an absent payload). Moreover, this solution kind of doesn't belong to the data model design itself, which I explained above, if that makes sense.

In order to strictly distinguish payload absense vs null payload (it is not the same!) while preserving inlining payloads as JSON object and not as a string, the solution is to use null vs { someKey: payload }, or null vs [payload], or any other value vs a container with actual payload which are strictly distinguishable by shape.

Mingela commented 1 month ago

I don't see how it's going to solve the same ambiguity when someones needs to set a null payload

Here we should just decide about a nullable type to be used. I guess the challenge is introduced because the schema is oriented to Rust type system explicitly which has been discussed (and postponed) many times already.

payload absense vs null payload (it is not the same!)

I believe that statement might introduce a confusion. As for the schema definition of course if such a field is defined it should be expected by 'the schema implementing type system'. Though, any not mandatory attribute can be initialized to null by default even if not present in the deserializable object at all.

Going back to the initial problem of ambiguity of Option<JsonString> vs JsonString (null vs "null") the latter value "null" should be indeed prohibited. What is the benefit of treating None vs Some("null") differently? I can't get that really. It seems a Rust implementation implication that should be refactored/optimized.

So any JsonString value would contain an object ("{..}" or "[{..},..,{..}]", not "null") and for the attributes where nulls are possible it should be Option<JsonString>.

Mingela commented 1 month ago

@mversic I couldn't see how your examples are relevant though, by the current schema the values expected are:

and @arndey refers to null value shown in the genesis, not "null", so that Option<JsonString> seems reasonable there

When serializing JsonString to SCALE we serialize it as a string. JsonString is serialzed as a json node..

That's a hell of ambiguity. Please consider splitting JsonString into JsonString + JsonNode or smth then.

mversic commented 1 month ago

@mversic I couldn't see how your examples are relevant though, by the current schema the values expected are:

I need to correct you, you have put extra quotes. JsonString is not serialized as a string when serialized to JSON but rather as a json value node (null being one of possible variants together with array or map). It is serialized as string when encoded to SCALE. This is the jist of the problem. Schema is a description of how to represent data in SCALE, not json

* `"payload": {\"account\":\"ed0120CE7FA46C9DCE7EA4B125E2E36BDB63EA33073E7590AC92816AE1E861B7048B03@wonderland\"}`
* `"payload": [\"ed0120CE7FA46C9DCE7EA4B125E2E36BDB63EA33073E7590AC92816AE1E861B7048B03@wonderland\"]`
Mingela commented 1 month ago

I can understand the concept but there is no indication of how it should be treated by the client given we encounter both Option<JsonString> and JsonString possibly having the same value (null) implying JsonString gets (de)serialized differently (sometimes to JSON, sometimes to SCALE). Shouldn't we rely on distinct types respectively to the usage? Is that already achieved via Option wrapper meaning that's the SCALE and just JsonString is always JSON (probably not)?

@mversic what would be your suggestion on how to overcome this confusion? There have been multiple options mentioned in the discussion. We'd like this issue to get the target solution, preferably with a timeline assigned.

mversic commented 1 month ago

That's a hell of ambiguity. Please consider splitting JsonString into JsonString + JsonNode or smth then.

yeah, we need to rename it. I suggest JsonString -> JsonObject

@mversic what would be your suggestion on how to overcome this confusion? There have been multiple options mentioned in the discussion. We'd like this issue to get the target solution, preferably with a timeline assigned.

I suggest that JsonString is special handled by the sdk. That is, it's schema is implied by it's name and not read from the file

0x009922 commented 2 weeks ago

I suggest that JsonString is special handled by the sdk. That is, it's schema is implied by it's name and not read from the file

Agree on that.

There is no issue with schema in terms of SCALE encoding. JSON encoding according to schema is already a special case handled by Java SDK, so why not handle JsonString specifically too?

0x009922 commented 2 weeks ago

I advocate for the following solution:

Minimally: do not inline JsonString in JSON as an actual JSON node. Just leave at as a string with valid JSON inside. This removes the ambiguity and does not require any handling from SDK side.

// JsonString
"null"
"{\"foo\":false}"
"[1, 2, 3]"

// Option<JsonString>
null
"null" // note that it is different from the above
"[1, 2, 3]"

Preserving inline mode: on top of the minimal solution, allow a special notation in JSON in a form of { __inline__: ... }. This will require a special handling on SDK side - parse JsonString either as a string or extract a JSON node from the recognisable shape.

// JsonString
"null"
{ __inline__: null } // equivalent
"[1, 2, 3]"
{ __inline__: [1, 2, 3] } // equivalent

// Option<JsonString>
null                 // None
"null"               // Some("null")
{ __inline__: null } // Some("null")

Recommendation for SDK: handle JsonString in a special way - not as a plain string, but as a custom class implementing platform-native JSON handling. Example from iroha-javascript:

https://github.com/0x009922/iroha-javascript/blob/a895a17a0758a8802d68998a6f3c54689ba15b7f/packages/data-model/src/datamodel/core.ts#L164-L217

Note this specifically - it's just a string in SCALE, but then wrapped into a convenient class on top of it:

export const Json$codec: Codec<Json> = String$codec.wrap(
  (json) => json.asJsonString(),
  (str) => Json.fromJsonString(str),
)

Optional: rename JsonString to JsonValue in the schema. I am not sure about this though - effectively, it is still just a string in terms of SCALE encoding. The schema's only purpose is to describe SCALE encoding.