Multiple files not working with json pointer

pwall567 / json-kotlin-gradle

Gradle JSON Schema code generation plugin


Multiple files not working with json pointer #2

Closed magnusrobertsson closed 2 years ago

magnusrobertsson commented 2 years ago

I have an issue with generating data classes from multiple json schema files that reference each other. The idea is to have a shared schema from which other schemas can reuse.

I've tried a couple of options but have run out of ideas. I'm using the following files:

shared.json

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$ref": "#/shared/SharedA",
  "shared": {
    "SharedA": {
      "type": "object",
      "properties": {
        "foo": {
          "type": "string"
        }
      }
    }
  }
}

a.json

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$ref": "#/a/A",
  "a": {
    "A": {
      "type": "object",
      "properties": {
        "baz": {
          "type": "string"
        },
        "shared": {
          "$ref": "shared.json#/shared/SharedA"
        }
      }
    }
  }
}

Option 1: Specify directory only

This works, but it generates "ugly" data classes. The classes are even named after the filename rather than after the definitions in the JSON schema.

Gradle definition:

configure<JSONSchemaCodegen> {
    packageName.set("com.example.generated")
    inputFile.set(file("src/main/resources/schema"))
    outputDir.set(file("build/generated"))
}

Generates this:

data class A(
    val baz: String? = null,
    val shared: Shared? = null
) {

    data class Shared(
        val foo: String? = null
    )

}

Notice that the shared field uses a nested class named Shared instead of SharedA. The generator also produces a separate file with the Shared class definition, so the nested class is redundant! Not very nice imho.

Option 2: Specify directory and json pointer

The following Gradle definition doesn't work at all.

configure<JSONSchemaCodegen> {
    packageName.set("com.example.generated")
    inputFile.set(file("src/main/resources/schema"))
    outputDir.set(file("build/generated"))
    pointer.set("/a")
}

Gives this error:

> Error reading schema file - /Users/magnus/code/example/src/main/resources/schema

It seems to be trying to read the directory as a file.

Option 3: Specify schema file with json pointer

Doesn't work at all.

configure<JSONSchemaCodegen> {
    packageName.set("com.example.generated")
    inputFile.set(file("src/main/resources/schema/a.json"))
    outputDir.set(file("build/generated"))
    pointer.set("/a")
}

Gives this error:

> Error reading schema file - https:/pwall.net/shared.json

Here the code generator doesn't understand that the reference to shared.json is a local file, and adds an https prefix to it instead. I've tried adding file:// but couldn't make it work. Even if it had worked, I wonder how the plugin would handle the two different JSON pointers.

Option 4: Put everything in one file with the same json pointer

This works and creates three nice files as expected, but the schema becomes a "monolith". It defeats the purpose of having decoupled definitions imho.

When using the JSON pointer I get nice data classes, i.e. one Kotlin source file per definition, but only one input file is allowed! There seems to be no option to specify multiple files, each with its own JSON pointer. Or is there another way to solve this?

pwall567 commented 2 years ago

Hi - thank you for giving such comprehensive detail on your issue.

The code generator was originally written to accept a list of input files (or directories, which would be scanned recursively), and it would establish a list of target classes to be generated. Then, if a property in one schema used a "$ref" to refer to another schema which was in the list of targets to be generated as a separate class, the generated code for the property in the first class would refer to the second. For example, the "Getting Started" JSON Schema page concludes with the development of two schema files: https://json-schema.org/learn/getting-started-step-by-step.html#references - if these two schema files are presented to the code generator, it recognises the

      "$ref": "https://example.com/geographical-location.schema.json"

line in the product.schema.json file as referring to a schema for which it is generating a class (in its target list), and it uses that class in the val in the Product class:

    val warehouseLocation: GeographicalLocation? = null

If a property of type object includes the schema definition inline, or uses a "$ref" to a schema which is not part of the target set, the code generator creates a nested class for the object, and uses that in the outer class. This is what happened in your Option 1 - the generator didn't recognise the reference as being in its target list because only the two outer files, not the schema at #/shared/SharedA, were in the target list.
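A minimal sketch of that decision, assuming a much-simplified model where the target list is just a map from schema URI to generated class name. `resolvePropertyType` and `PropertyType` are hypothetical names for illustration, not part of the code generator's API:

```kotlin
import java.net.URI

// Hypothetical, simplified model of how a property's type could be chosen.
sealed class PropertyType {
    data class TopLevel(val className: String) : PropertyType() // refers to another generated class
    data class Nested(val className: String) : PropertyType()   // generated inside the outer class
}

// targets maps the URI of each schema in the target list to the class generated for it.
fun resolvePropertyType(targets: Map<URI, String>, ref: URI?, propertyName: String): PropertyType {
    val target = ref?.let { targets[it] }
    return if (target != null) {
        PropertyType.TopLevel(target)
    } else {
        // Inline schema, or a "$ref" to something outside the target list: use a nested class
        // named after the property.
        PropertyType.Nested(propertyName.replaceFirstChar { it.uppercase() })
    }
}

fun main() {
    val targets = mapOf(
        URI("https://example.com/geographical-location.schema.json") to "GeographicalLocation"
    )
    println(resolvePropertyType(targets, URI("https://example.com/geographical-location.schema.json"), "warehouseLocation"))
    println(resolvePropertyType(targets, URI("file:/schema/shared.json#/shared/SharedA"), "shared"))
}
```

In Option 1 the targets are the two whole files, so a reference to `shared.json#/shared/SharedA` is not a key in the map and the property falls back to a nested `Shared` class, which matches the output shown there.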

The code generator was subsequently modified to accept a set of schema definitions contained in a composite file such as an OpenAPI file. In this case it creates target entries for each definition in the collection, so individual classes will be created for each schema definition, and references from one to another will be generated as references to the generated classes. It's possible to create a complex target list from multiple files when using the CodeGenerator class programmatically, but this is not covered by the Gradle plugin.
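For the composite case, the sketch below evaluates a JSON Pointer (RFC 6901) against a parsed document, modelled here as nested Kotlin maps and lists rather than any particular JSON library, and turns each entry under the pointer into its own target. `evaluatePointer`, `compositeTargets` and `Target` are hypothetical helpers, not the library's implementation:

```kotlin
// Minimal RFC 6901 JSON Pointer evaluation over nested Maps and Lists.
fun evaluatePointer(document: Any?, pointer: String): Any? {
    if (pointer.isEmpty()) return document // "" refers to the whole document
    require(pointer.startsWith("/")) { "Invalid JSON Pointer: $pointer" }
    var node = document
    for (rawToken in pointer.substring(1).split("/")) {
        // Unescape as RFC 6901 requires: "~1" becomes "/" first, then "~0" becomes "~".
        val token = rawToken.replace("~1", "/").replace("~0", "~")
        val current = node
        node = when (current) {
            is Map<*, *> -> current[token]
            is List<*> -> token.toIntOrNull()?.let { current.getOrNull(it) }
            else -> null
        } ?: throw NoSuchElementException("No value at $pointer") // missing, or JSON null
    }
    return node
}

// Hypothetical target entry: one generated class per definition.
data class Target(val className: String, val pointer: String)

// Every entry under the pointer becomes a separate target, and therefore a separate class.
fun compositeTargets(document: Any?, pointer: String): List<Target> {
    val group = evaluatePointer(document, pointer) as? Map<*, *>
        ?: throw IllegalArgumentException("$pointer does not refer to an object")
    return group.keys.map { key ->
        val name = key.toString()
        // Re-escape the member name so the child pointer stays valid ("~" first, then "/").
        val escaped = name.replace("~", "~0").replace("/", "~1")
        Target(name, "$pointer/$escaped")
    }
}

fun main() {
    // The "shared" group from shared.json in the original question, as nested maps.
    val shared = mapOf(
        "shared" to mapOf(
            "SharedA" to mapOf(
                "type" to "object",
                "properties" to mapOf("foo" to mapOf("type" to "string"))
            )
        )
    )
    println(compositeTargets(shared, "/shared")) // [Target(className=SharedA, pointer=/shared/SharedA)]
}
```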

To generate a set of classes from individual files with inter-schema references, you need to ensure that each schema to be referenced in this way is a member of the target list, that is, it is the main object in the file. I changed your files to:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "foo": {
      "type": "string"
    }
  }
}

and:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "baz": {
      "type": "string"
    },
    "shared": {
      "$ref": "shared.json"
    }
  }
}

and the output was (omitting the comments):

package com.example.generated

data class Shared(
    val foo: String? = null
)

and:

package com.example.generated

data class A(
    val baz: String? = null,
    val shared: Shared? = null
)

The generated class names are derived from the "$id" in the file, or if that is not present, from the filename. There are options for specifying the generated class name in the config file, if the derived names are not satisfactory.
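The exact naming rules aren't spelled out here, so this is only a sketch under assumptions: take the last path segment of the `$id` (or the filename), drop a `.json` and then a `.schema` suffix, split on non-alphanumeric characters and capitalise each part. `deriveClassName` is a hypothetical name; it does reproduce the names seen in this thread (`shared.json` gives `Shared`, `geographical-location.schema.json` gives `GeographicalLocation`):

```kotlin
import java.io.File
import java.net.URI

// Hypothetical naming rule, for illustration only.
fun deriveClassName(id: String?, file: File): String {
    // Prefer the last path segment of "$id"; fall back to the filename.
    val fromId = id?.let { URI(it).path }?.substringAfterLast('/')?.takeIf { it.isNotEmpty() }
    val base = fromId ?: file.name
    val stem = base.removeSuffix(".json").removeSuffix(".schema")
    // "geographical-location" -> ["geographical", "location"] -> "GeographicalLocation"
    return stem.split(Regex("[^A-Za-z0-9]+"))
        .filter { it.isNotEmpty() }
        .joinToString("") { part -> part.replaceFirstChar { it.uppercase() } }
}

fun main() {
    println(deriveClassName(null, File("src/main/resources/schema/shared.json")))
    println(deriveClassName("http://example.com/schema/product.schema.json", File("product.json")))
}
```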

Thank you for persisting with the code generator - I hope this has given you enough information to configure your use of it successfully.

magnusrobertsson commented 2 years ago

Thank you for your swift reply. I've played around with your suggestions and it kind of works, but it feels a bit stiff to be honest. In the example I only had one definition in the shared file but I'd really like to have multiple definitions that I can refer to. I could have these in multiple files but it quickly becomes complex when working with 50+ definitions.

As you wrote, your CodeGenerator class is much more flexible. What I'm looking for is a way to do with multiple files everything that can already be done with a single file. Here is a pseudo example:

configure<JSONSchemaCodegen> {
    inputFiles {
        inputFile {
            file = file("src/main/resources/schema/shared.json")
            pointer = "/shared"
            packageName = "com.example.generated.shared"
        }
        inputFile {
            file = file("src/main/resources/schema/a.json")
            pointer = "/a"
            packageName = "com.example.generated.a"
        }
    }
    outputDir.set(file("build/generated"))
}

The package name may be a stretch, but I hope you get my point. It would be nice to set context-related parameters per file rather than only at a global level. What do you think?

pwall567 commented 2 years ago

I appreciate the feedback on the ways people wish to use the code generator. The usage you propose is reasonable, but it would be complicated with the current version, even using the CodeGenerator class directly.

I have therefore made some modifications to the parent project, adding greater flexibility to the way it builds its target list (described earlier). This will make it possible to extend the input specification options of the Gradle plugin, but that work is not done yet.

I can't promise that the configuration options will be as elegant as you suggest, but they should provide the functionality you describe. I hope to have something for you to try in a few days; a week at the most.

I can see the merit of allowing config to be specified individually for a file or group of files, but that would be moderately complicated to achieve. If it's just packageName you're concerned about, are you aware that when a directory is specified as the input, the code generator will walk the tree, adding the directory names to the packageName for each file in the directory? (This behaviour is controlled by the derivePackageFromStructure config file option.)
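As an illustration of what walking the tree could look like, here is a sketch that appends each directory between the input root and the schema file to the base package. `derivePackage` is hypothetical, and the real `derivePackageFromStructure` behaviour may differ in details such as how directory names that aren't valid Kotlin identifiers are handled:

```kotlin
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical sketch: one package segment per directory below the input root.
fun derivePackage(basePackage: String, inputDir: Path, schemaFile: Path): String {
    // A file directly in the input directory has no parent in the relative path.
    val relativeDir = inputDir.relativize(schemaFile).parent ?: return basePackage
    val segments = relativeDir.map { it.toString() } // Path is Iterable<Path>
    return (listOf(basePackage) + segments).joinToString(".")
}

fun main() {
    val root = Paths.get("src/main/resources/schema")
    println(derivePackage("com.example.generated", root, root.resolve("model/orders/order.json")))
}
```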

pwall567 commented 2 years ago

Can I also recommend that you use the JSON Schema standard name for the group of schema objects (shared in your very first example above). Prior to Draft 2019-09 (and I notice you are specifying Draft-07), the preferred name was definitions; from Draft 2019-09 onwards the preferred name is $defs. It's not essential, but it makes the purpose of the definitions clearer to a human reading your schema files.
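If your schemas span more than one draft, a small helper like the following (hypothetical, keyed only on the `$schema` URI) picks the conventional group name to use as the pointer. Draft-04, Draft-06 and Draft-07 use `definitions`; this sketch assumes anything else is Draft 2019-09 or later and uses `$defs`:

```kotlin
// Hypothetical helper: conventional pointer to the group of reusable schemas for a given draft.
fun definitionsPointer(schemaUri: String): String {
    val legacyDrafts = listOf("draft-04", "draft-06", "draft-07")
    return if (legacyDrafts.any { it in schemaUri }) "/definitions" else "/\$defs"
}

fun main() {
    println(definitionsPointer("http://json-schema.org/draft-07/schema#"))
    println(definitionsPointer("https://json-schema.org/draft/2020-12/schema"))
}
```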

pwall567 commented 2 years ago

A few minutes ago I uploaded version 0.70 of the Gradle plugin. The build.gradle.kts configuration block that this version uses is very close to what you suggested above:

configure<JSONSchemaCodegen> {
    inputs {
        inputComposite {
            file = file("src/main/resources/schema/composite.json")
            pointer = "/definitions"
        }
        inputFile {
            file = file("src/main/resources/schema/model") // this may be a file or a directory
        }
    }
}

Note the different functions for specifying a file (or directory) and a composite. Any number of either may be included in the inputs block. Context-specific config (like packageName) is not included.

The documentation has been updated to cover the new versions; I hope it is sufficient to help you make use of it.

Thank you for your feedback - it all helps to make a better system.

magnusrobertsson commented 2 years ago

Thank you for your responsiveness! I had a quick stab at it and got it to work... but only with absolute paths for my refs, i.e. rather than having "$ref": "shared.json#/shared/SharedA" I had to use "$ref": "file:///Users/.../src/main/schema/shared.json#/shared/SharedA". This is of course not very practical! How can I reference definitions in another composite?

It would be nice if it looked for files relative to the file being parsed. Alternatively, we could use some sort of identifier per composite, e.g.:

configure<JSONSchemaCodegen> {
    packageName.set("com.example.generated")
    inputs {
        // This will set "shared-entities" as identifier so we can ref to definitions in this file by "shared-entities#/shared/SharedA"
        inputComposite(
            file("src/main/resources/schema/shared.json"),
            "/shared",
            "shared-entities"
        )
        inputComposite(
            file("src/main/resources/schema/a.json"),
            "/a"
        )
    }
    outputDir.set(file("build/generated"))
    pointer.set("/a") // If this is specified together with inputs you get an error!
}

The default behaviour could be to use the filename (without path).

magnusrobertsson commented 2 years ago

Also, I stumbled upon a small bug. If you specify the pointer at the plugin level you get an error. For example:

configure<JSONSchemaCodegen> {
    packageName.set("com.example.generated")
    inputs {
        inputComposite(
            file("src/main/resources/schema/shared.json"),
            "/shared"
        )
        inputComposite(
            file("src/main/resources/schema/a.json"),
            "/a"
        )
    }
    outputDir.set(file("build/generated"))
    pointer.set("/a") // If this is specified together with inputs you get an error!
}

Gives the following error:

...
Caused by: java.io.FileNotFoundException: src/main/resources/schema (Is a directory)
        at net.pwall.json.JSON.parse(JSON.java:193)
        at net.pwall.json.schema.parser.JSONReader.readJSON(JSONReader.kt:110)
        ... 121 more

It seems like setting pointer automatically sets the "old" inputFile parameter.

pwall567 commented 2 years ago

I think your problems may be solved by the use of the $id in your schema files.

The code generator will scan all the files to be processed and store the $id of each one, as well as the file: URL created from the File reference. When a $ref is encountered, the system tries to locate the referenced schema by either of those references.
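A sketch of that lookup, assuming a registry keyed by URI where each file is registered under both its `file:` URI and its `$id`, and the fragment (the JSON Pointer part of a `$ref`) is stripped before the lookup. `SchemaRegistry` and `withoutFragment` are hypothetical:

```kotlin
import java.io.File
import java.net.URI

// Drops the "#..." fragment, leaving the URI of the document itself.
fun URI.withoutFragment(): URI = URI(scheme, schemeSpecificPart, null)

// Hypothetical registry: every schema file is reachable by its file: URI and by its "$id".
class SchemaRegistry {
    private val byUri = mutableMapOf<URI, File>()

    fun register(file: File, id: String?) {
        byUri[key(file.absoluteFile.toURI())] = file
        if (id != null) byUri[key(URI(id))] = file
    }

    // Finds the document a "$ref" points into; the fragment is resolved separately.
    fun locate(ref: URI): File? = byUri[key(ref)]

    private fun key(uri: URI): URI = uri.normalize().withoutFragment()
}

fun main() {
    val registry = SchemaRegistry()
    val shared = File("src/main/resources/schema/shared.json")
    registry.register(shared, "http://example.com/schema/shared.json")
    println(registry.locate(URI("http://example.com/schema/shared.json#/shared/SharedA")))
}
```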

For example, you could add to the shared schema:

  "$id": "http://example.com/schema/shared.json",

(the id is a URI, not a URL, and does not need to be an actual address).

Then, in the other file, the reference could be:

    "$ref": "http://example.com/schema/shared.json#/shared/SharedA"

The system will also resolve relative references, so if the files are in the same directory and have $id entries that correspond to their filenames, the reference could be:

    "$ref": "shared.json#/shared/SharedA"
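Relative resolution of this kind is standard URI resolution, which `java.net.URI.resolve` implements (following RFC 2396), so a quick way to check what a relative `$ref` will turn into, given the `$id` of the schema containing it, is a sketch like this (`resolveRef` is just a hypothetical wrapper):

```kotlin
import java.net.URI

// Resolves a "$ref" against the base URI of the schema containing it (typically its "$id").
fun resolveRef(baseId: String, ref: String): URI = URI(baseId).resolve(ref)

fun main() {
    println(resolveRef("http://example.com/schema/a.json", "shared.json#/shared/SharedA"))
    // http://example.com/schema/shared.json#/shared/SharedA
    println(resolveRef("http://example.com/schema/model/a.json", "../shared.json"))
    // http://example.com/schema/shared.json
}
```

This is also why the `$id` values matter: the same relative `$ref` resolves to a different absolute URI if the containing schema's base URI changes.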

I generated what look like correct Kotlin classes just by adding $id entries to both your example files, with the following configuration block:

configure<JSONSchemaCodegen> {
    inputs {
        inputComposite {
            file.set(file("src/main/resources/schema/shared.json"))
            pointer.set("/shared")
        }
        inputFile(file("src/main/resources/schema/a.json"))
    }
}

Strictly speaking, the $id doesn't have to be the same as the filename, but I find that IntelliJ will offer Ctrl-click linking to the referenced schema if the names are the same.

A pattern of usage that I find helpful is to have a directory structure of individual files for the major elements of my model, and a single composite file containing the utility or shared schema definitions. A configuration block similar to that shown above will generate all the classes for this structure, with the inputFile entry pointing to the head of the directory tree of the individual files. But your pattern of usage may be different, and I would like to think that the code generator will be useful in a wide variety of circumstances.

pwall567 commented 2 years ago

And that problem with the pointer is the result of leaving the old mechanism in place in parallel with the new one, to avoid causing problems for anyone using the old form. I can modify it to not look at the pointer if the old form of inputFile is not being used. But if you don't mind I'll leave that as a low-priority fix since it's easily avoided.

magnusrobertsson commented 2 years ago

You're a star! Of course, I had to use $id. It works perfectly as intended now. Now I can get rid of my ugly pre-process stage where I use jq to combine all my json schemas... Thank you again for this great piece of software!

pwall567 commented 2 years ago

I'm assuming from your comments that I can close this issue.

Feel free to reopen it or to open a new issue if you're still having problems.