pwall567 / json-kotlin-schema-codegen

Code generation for JSON Schema (Draft 07)
MIT License

Support for oneOf #2

Open Malinskiy opened 3 years ago

Malinskiy commented 3 years ago

Hi! Thanks for an awesome project!

I'm trying to use it currently and my specific schema uses oneOf quite a lot.

Are there any plans to support this?

Thanks!

pwall567 commented 3 years ago

Hi Anton, thanks for this. Yes, I am looking into oneOf at the moment, in particular for cases like the following:

{
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://pwall.net/test-oneof",
    "title": "Customer",
    "type": "object",
    "properties": {
        "id": {
            "type": "string",
            "format": "uuid"
        },
        "name": {
            "type": "string",
            "minLength": 1
        },
        "customerType": {
            "type": "string"
        }
    },
    "required": [ "id", "name", "customerType" ],
    "oneOf": [
        {
            "properties": {
                "customerType": {
                    "const": "domestic"
                },
                "domesticAddress": {
                    "$ref": "test-domestic-address.schema.json"
                }
            },
            "required": [ "domesticAddress" ]
        },
        {
            "properties": {
                "customerType": {
                    "const": "international"
                },
                "internationalAddress": {
                    "$ref": "test-international-address.schema.json"
                }
            },
            "required": [ "internationalAddress" ]
        }
    ]
}

Is this the type of usage you have in mind? If not, can you give me an example of the kind of usage you would like to see covered?

In any case, this is not going to be straightforward and I can't promise anything immediately - perhaps a couple of weeks.

Malinskiy commented 3 years ago

Sure thing, here is an example: https://bitbucket.org/atlassianlabs/atlascode/raw/main/resources/schemas/pipelines-schema.json

"script": {
    "description": "Contains a list of commands that are executed in sequence. \n\nScripts are executed in the order in which they appear in a step. \n\nWe recommend that you move large scripts to a separate script file and call it from the bitbucket-pipelines.yml.",
    "type": "array",
    "items": {
        "oneOf": [
            {
                "type": "string",
                "description": "Command to execute",
                "minLength": 1
            },
            {
                "$ref": "#/definitions/pipe"
            }
        ]
    },
    "minItems": 1
},

I think the closest thing in Kotlin for this use case is sealed classes, but I may be wrong.
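A minimal sketch of the sealed-class idea for the "script" items. The names ScriptItem, Command and Pipe are hypothetical (nothing here is generator output), and the pipe is reduced to a single assumed `pipe` string property for brevity:

```kotlin
// Hypothetical model of the "script" array items: each item is either a
// command string or a pipe object. Names are illustrative only.
sealed class ScriptItem {
    data class Command(val value: String) : ScriptItem() {
        init {
            // mirrors "minLength": 1 in the schema
            require(value.isNotEmpty()) { "command must not be empty" }
        }
    }

    // Assumption: a pipe is represented here by one string property.
    data class Pipe(val pipe: String) : ScriptItem()
}

// With a sealed hierarchy the `when` is exhaustive, so no `else` branch is needed.
fun describe(item: ScriptItem): String = when (item) {
    is ScriptItem.Command -> "run: ${item.value}"
    is ScriptItem.Pipe -> "pipe: ${item.pipe}"
}
```

A deserializer would still need custom logic to decide which subclass to build for each array item.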

pwall567 commented 3 years ago

Thanks for giving such a comprehensive example.

The sort of cases I have been looking at involve the use of a "discriminator" field ("customerType" in my example), which would allow the use of strongly-typed dependent properties. I like to think that if there are sufficient validations in the constructors for the generated classes, they will be usable by deserialization functions with no additional custom code or parameterization.
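As a rough sketch of what "validations in the constructors" could look like for the Customer example: the class names below are hypothetical, the address types are reduced to one assumed field, and `id` is kept as a plain String. This is not the generator's actual output.

```kotlin
// Placeholder address types; the real ones would come from the referenced schemas.
data class DomesticAddress(val line1: String)
data class InternationalAddress(val line1: String)

// Properties shared by both alternatives of the oneOf.
interface Customer {
    val id: String
    val name: String
    val customerType: String
}

data class DomesticCustomer(
    override val id: String,
    override val name: String,
    override val customerType: String,
    val domesticAddress: DomesticAddress
) : Customer {
    init {
        require(name.isNotEmpty()) { "name must not be empty" }
        // the discriminator: "const": "domestic"
        require(customerType == "domestic") { "customerType must be domestic" }
    }
}

data class InternationalCustomer(
    override val id: String,
    override val name: String,
    override val customerType: String,
    val internationalAddress: InternationalAddress
) : Customer {
    init {
        require(name.isNotEmpty()) { "name must not be empty" }
        // the discriminator: "const": "international"
        require(customerType == "international") { "customerType must be international" }
    }
}
```

Because each constructor rejects the wrong discriminator value, a deserializer could in principle try each class in turn and keep the one that constructs successfully.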

To generate classes for your examples that would allow automatic deserialization would be very difficult, but if you were prepared to accept the use of Any (for example, as the type of the array items in the snippet above) then it would be reasonably easy to modify the code generator to generate classes for your schema. The generator already outputs Any for cases where an array of types is specified (e.g. "type":["string","number"]) so this would just be an extension of that behaviour.

One of the main advantages of code generation is that it gives you strongly-typed classes to work with and this would run counter to that principle, but with a limited amount of casting and custom deserialization it would probably still be valuable.
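To illustrate the amount of casting involved, here is a small sketch. The Step class is a hypothetical stand-in for generated code, and it assumes that a deserializer delivers pipe objects as Maps:

```kotlin
// Hypothetical shape of a generated class when the item type collapses to Any.
data class Step(val script: List<Any>)

// The casting left to the caller: commands are the String items.
fun commands(step: Step): List<String> = step.script.filterIsInstance<String>()

// Assumption: anything that is a Map is a pipe object.
fun pipes(step: Step): List<Map<*, *>> = step.script.filterIsInstance<Map<*, *>>()
```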

Let me know what you think - would outputting Any and leaving the casting to you still be useful?

pwall567 commented 3 years ago

I have made two changes to the Code Generator.

The first is to allow oneOf and anyOf, but ignore their contents. The resulting code will in some cases have references to Any objects, and where the Code Generator has attempted to output a named class for an object with no property information (because all the information was inside a oneOf) it is output as an empty open class - this leaves the possibility of creating a derived class that could be assigned to the property. None of this is ideal, but it's difficult to see what kind of code generation would be possible for a schema making extensive use of oneOf.
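A sketch of how the empty open class might be used. All names here (Trigger, CustomTrigger, Pipeline) are hypothetical and are not taken from the generated code:

```kotlin
// Hypothetical generator output: no property information was available
// outside the oneOf, so the class is empty but open.
open class Trigger

// Hypothetical generated class with a property of that type.
data class Pipeline(val trigger: Trigger)

// Hand-written derived class that can be assigned to the property.
class CustomTrigger(val name: String) : Trigger()
```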

The second change is to allow non-Kotlin identifiers - your example includes property names like max-time - the Code Generator now allows such names and encloses them in backticks.
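For example, a property named max-time could appear as follows. StepOptions is a hypothetical class name used only for this sketch:

```kotlin
// "max-time" is not a valid Kotlin identifier, so it is enclosed in backticks.
data class StepOptions(val `max-time`: Int)

fun main() {
    val options = StepOptions(`max-time` = 30)
    println(options.`max-time`) // prints 30
}
```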

You might like to try the new version - 0.34. Using this version I have generated code from the schema file you linked to. The code compiles, but I don't know how useful it is (in particular, the default name is taken from the $id, but this clashes with another nested class).

myronmarston commented 3 years ago

We also have schemas that use anyOf. Our schemas are generated from GraphQL schemas (where type unions are a thing) and anyOf is the way that our JSON schema generator models a GraphQL type union in JSON schema. Here's an example:

      sourceDetails:
        anyOf:
        - properties:
            bankAccountId:
              type:
              - string
              - 'null'
            __typename:
              type: string
              enum:
              - BankAccountPaymentDetails
          required:
          - bankAccountId
          - __typename
        - properties:
            linkedServerPaymentId:
              type:
              - string
              - 'null'
            __typename:
              type: string
              enum:
              - ThirdPartyFeeReversalPaymentDetails
          required:
          - linkedServerPaymentId
          - __typename
        type:
        - object
        - 'null'

In our case, we have a __typename property that names the specific subtype of the GraphQL type union. I recognize that this library wouldn't be able to know which property names the subtype, but it's not clear to me that it's necessary for this library to generate something more useful than it does now. When there's an anyOf or oneOf with a list of different object schemas, could this library generate a data class for each subschema just like it would if those subschemas existed on their own, and have them subclass a common supertype that is used as the type for the property with the anyOf? As far as naming the data classes for the subschemas goes...it could just name them [Field]Subtype1, [Field]Subtype2 or whatever--in this case, SourceDetailsSubtype1 and SourceDetailsSubtype2.
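A sketch of what that proposal could look like, together with the kind of hand-written dispatch on __typename that the caller would still need. The function sourceDetailsOf and the use of a Map as the parsed JSON form are assumptions made for illustration, and the __typename property itself is left out of the data classes for brevity:

```kotlin
// Common supertype used as the type of the sourceDetails property.
interface SourceDetails

// One data class per subschema, named as suggested above.
data class SourceDetailsSubtype1(val bankAccountId: String?) : SourceDetails
data class SourceDetailsSubtype2(val linkedServerPaymentId: String?) : SourceDetails

// Hypothetical hand-written dispatch: the library cannot know that
// __typename selects the subtype, so the caller supplies that knowledge.
fun sourceDetailsOf(json: Map<String, Any?>): SourceDetails =
    when (val typeName = json["__typename"]) {
        "BankAccountPaymentDetails" ->
            SourceDetailsSubtype1(json["bankAccountId"] as String?)
        "ThirdPartyFeeReversalPaymentDetails" ->
            SourceDetailsSubtype2(json["linkedServerPaymentId"] as String?)
        else -> throw IllegalArgumentException("unknown __typename: $typeName")
    }
```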

pwall567 commented 3 years ago

Hi Myron, I'm sure I don't need to tell you that this is a tricky issue. Producing a solution that meets even a moderate subset of the possible requirements is very difficult.

I am narrowing down my focus onto something like the following (a bit more limited than my earlier example):

{
  "$schema": "http://json-schema.org/draft/2019-09/schema",
  "$id": "http://pwall.net/test/schema/test-oneof",
  "$defs": {
    "TypeA": {
      "oneOf": [
        {
          "$ref": "#/$defs/TypeB"
        },
        {
          "$ref": "#/$defs/TypeC"
        }
      ]
    },
    "TypeB": {
      "type": "object",
      "properties": {
        "xxx": {
          "type": "string"
        }
      },
      "required": [ "xxx" ]
    },
    "TypeC": {
      "type": "object",
      "properties": {
        "yyy": {
          "type": "string"
        }
      },
      "required": [ "yyy" ]
    }
  },
  "$ref": "#/$defs/TypeA"
}

This would produce code something like:

open class TypeA

data class TypeB(val xxx: String) : TypeA()

data class TypeC(val yyy: String) : TypeA()

This doesn't exactly meet your needs, but it might be close.

One important point - this uses oneOf, but your example uses anyOf. It's possible to determine that the two cases in your example are mutually exclusive, but that level of analysis is beyond what I was contemplating at this stage. You said that your schema comes from a generator, but is it possible to parameterise the generator to use oneOf instead of anyOf? (An alternative might be to post-process the output.)
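The post-processing alternative could be as simple as the sketch below, which assumes the schema has already been parsed into nested Maps and Lists (the function name anyOfToOneOf is made up for this example). It is only appropriate when the alternatives are known to be mutually exclusive, and it would also rename a property that happened to be called "anyOf".

```kotlin
// Recursively rename every "anyOf" key to "oneOf" in a parsed schema tree.
fun anyOfToOneOf(node: Any?): Any? = when (node) {
    is Map<*, *> -> node.entries.associate { (key, value) ->
        (if (key == "anyOf") "oneOf" else key) to anyOfToOneOf(value)
    }
    is List<*> -> node.map { anyOfToOneOf(it) }
    else -> node // strings, numbers, booleans and null are left unchanged
}
```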

I have a use case similar to this related to my day job, so it's possible there may be a bit more urgency applied to this in the next few days, but I can't promise anything.

myronmarston commented 3 years ago

"This doesn't exactly meet your needs, but it might be close."

That's exactly what I'm looking for :).

...although, as you pointed out, my schema has anyOf instead of oneOf. You're right that oneOf would be more correct here. IIRC, we ran into an issue with one of the JSON schema validation libraries we were using, where its oneOf support didn't work right for us, so we used anyOf instead. I'm going to dig into that and see if we can change it to use oneOf instead.

myronmarston commented 3 years ago

It was pretty easy to switch our generator to use oneOf instead of anyOf, so the feature you proposed should work great for us, @pwall567, if you wind up implementing it!

pwall567 commented 3 years ago

OK, version 0.47 includes an implementation of oneOf similar to my TypeA/TypeB/TypeC example above. It generates interface rather than open class, because base classes can be problematic with data classes. Also, using interface means that multiple schema definitions can include the same subschema in this way, and the subschema can be generated as implementing all of them.
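A sketch of the shape this implies, extending the TypeA/TypeB/TypeC example. TypeD is a hypothetical second schema whose oneOf also references TypeB; the actual generated code may differ in detail:

```kotlin
interface TypeA

// Hypothetical second schema definition that also lists TypeB in its oneOf.
interface TypeD

// A data class cannot extend another data class, but it can implement
// any number of interfaces, so TypeB can belong to both unions.
data class TypeB(val xxx: String) : TypeA, TypeD

data class TypeC(val yyy: String) : TypeA
```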

My example assumes that the generator is generating a set of files using generateAll:

        codeGenerator.generateAll(JSON.parse(File("path/to/file")), JSONPointer("/\$defs"))

When there are other properties in the outer schema as well as the oneOf, the generator will output a base class, and if the subschema is not a $ref to a schema being generated as a separate class, the generator will generate nested classes within the interface or base class definition. In this case, naming can be a problem - there is little the generator can use to derive a meaningful name, so what I have decided to use is spreadsheet column naming - A, B, ... Z, AA, AB etc. These will always be nested classes within an interface (or in some cases a base class) and they will generally appear in code with the outer class qualifying the name, so the short names shouldn't be a problem.
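The spreadsheet-style naming can be sketched as follows. This is an illustration of the naming scheme only, not the generator's own code, and the function name columnName is made up for the example:

```kotlin
// Convert a zero-based index to a spreadsheet column name:
// 0 -> A, 25 -> Z, 26 -> AA, 27 -> AB, and so on.
fun columnName(index: Int): String {
    require(index >= 0) { "index must not be negative" }
    val sb = StringBuilder()
    var n = index
    while (true) {
        sb.append('A' + n % 26) // least significant letter first
        n = n / 26 - 1          // the "- 1" accounts for there being no zero digit
        if (n < 0) break
    }
    return sb.reverse().toString()
}
```

The nested classes for the first three anonymous subschemas of a oneOf would then be named A, B and C, and would be referred to as, for example, Outer.A.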

I am in any case thinking of providing a way of specifying generated class names, but that's a longer-term issue.

I hope I'm not setting the expectation that I will always be able to respond to requests so quickly, but as I said earlier, I've had this issue in mind for some time. And a requirement from my day job (which has been satisfied by these changes) gave the topic a bit of impetus.

Let me know if you have problems with the new version - with any luck it will be useful, if not ideal.

myronmarston commented 3 years ago

Thanks @pwall567!

"I hope I'm not setting the expectation that I will always be able to respond to requests so quickly"

Nope. I appreciate the promptness but definitely don't have any right to expect it in the future.

I'll let you know how this is working for us once we're able to try it.

trajano commented 6 months ago

I just hit this issue, actually. I wanted to do something simple like having different versions of the schema,

e.g.

{
  "$schema": "http://json-schema.org/draft/2020-12/schema#",
  "title": "iBlum Program Questionnaire Schema",
  "oneOf": [
    {
      "$ref": "./questionnaire.schema-0.1.1.json#"
    },
    {
      "$ref": "./questionnaire.schema-0.1.2.json#"
    }
  ]
}

where questionnaire.schema-0.1.1.json starts with

{
  "$schema": "http://json-schema.org/draft/2020-12/schema#",
  "title": "xyz",
  "type": "object",
  "properties": {
    "schemaVersion": {
      "type": "string",
      "enum": ["0.1.1"]
    },

But I get errors like Unresolved reference: cg_array0

pwall567 commented 6 months ago

Hi @trajano, as you will have found if you read the entire thread of this issue, code generation for oneOf and anyOf is tricky.

I don't know what output you were hoping for, but any form of conditionality – deserialization into the 0.1.1 form or the 0.1.2 form – would depend on the deserialization software you were using.

I am concerned that you are getting compile errors on the code generator output, but I can't guarantee the viability of code generated for oneOf or anyOf except for the limited circumstances described earlier in this issue.

trajano commented 6 months ago

@pwall567 yeah, I agree. It is probably best to just have n+1 schema files, where the +1 would just ensure that there is a schema version check and parser, and the other n are version-specific schemas.
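A sketch of that version check, assuming the JSON has been parsed into a Map before any typed deserialization. The class and function names are hypothetical, and the version-specific classes are reduced to a single field:

```kotlin
// Hypothetical stand-ins for the classes generated from each
// version-specific schema file.
data class QuestionnaireV011(val schemaVersion: String)
data class QuestionnaireV012(val schemaVersion: String)

// The "+1" step: inspect schemaVersion first, then pick the matching class.
fun parseQuestionnaire(json: Map<String, Any?>): Any =
    when (val version = json["schemaVersion"]) {
        "0.1.1" -> QuestionnaireV011("0.1.1")
        "0.1.2" -> QuestionnaireV012("0.1.2")
        else -> throw IllegalArgumentException("unsupported schemaVersion: $version")
    }
```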