pwall567 / json-kotlin-schema-codegen

Code generation for JSON Schema (Draft 07)
MIT License
74 stars 12 forks source link

Schema parsing doesn't support whitespace inside keys #14

Open TheKeeperOfPie opened 1 year ago

TheKeeperOfPie commented 1 year ago

I came here after trying to parse the VGMDB.info API schema here, using pwall567/json-kotlin-gradle, but it fails because some of the definitions have spaces in their keys.

Specifically if you check at https://vgmdb.info/schema/artist.json under definitions.artistInfo, which contains a property called Album Votes.

I can't find an explicit reference in the latest JSON Schema version that keys must not contain whitespace. Only that they must be strings and properly referenced using "key": value. So I think this should be supported, although obviously the whitespace will have to be replaced with a _ or simply removed when converted to a Java reference. I'm also not sure if whatever JSON parser you're using would support reading the key with whitespace.

For reference, this is the entire meaningful stacktrace:

Caused by: java.lang.IllegalArgumentException: Illegal character in fragment at index 41: #/definitions/artistInfo/properties/Album Votes
        at net.pwall.json.schema.codegen.CodeGenerator.findCustomClass(CodeGenerator.kt:1098)
        at net.pwall.json.schema.codegen.CodeGenerator.analyseProperty(CodeGenerator.kt:1113)
        at net.pwall.json.schema.codegen.CodeGenerator.analyseProperties(CodeGenerator.kt:1053)
        at net.pwall.json.schema.codegen.CodeGenerator.analyseObject(CodeGenerator.kt:832)
        at net.pwall.json.schema.codegen.CodeGenerator.findTargetClass(CodeGenerator.kt:1089)
        at net.pwall.json.schema.codegen.CodeGenerator.analyseProperty(CodeGenerator.kt:1130)
        at net.pwall.json.schema.codegen.CodeGenerator.analyseProperties(CodeGenerator.kt:1053)
        at net.pwall.json.schema.codegen.CodeGenerator.analyseObject(CodeGenerator.kt:832)
        at net.pwall.json.schema.codegen.CodeGenerator.processTargetCrossReferences(CodeGenerator.kt:569)
        at net.pwall.json.schema.codegen.CodeGenerator.generateAllTargets(CodeGenerator.kt:538)
        at net.pwall.json.kotlin.codegen.gradle.JSONSchemaCodegenTask.generate(JSONSchemaCodegenTask.kt:93)

It looks like it's caused by the fact that java.net.URI doesn't support spaces in the fragment, which makes sense. This code would have to do regular string comparison, although I didn't dive far enough into the specifics to understand how difficult that will be.

On a tangential note, the only reason I'm trying to do this is because there's no VGMDB Kotlin wrapper, and also no RAML->Kotlin generator, so if anyone has any suggestions on that front, then this issue wouldn't matter.

pwall567 commented 1 year ago

Hi @TheKeeperOfPie – sorry it's taken me so long to get back to you – I'm on vacation at the moment and I've only just got a moment to myself to look at this.

You're right, the JSON Schema spec allows whitespace in property names, and I thought I had allowed for this usage, but it seems my tests of this feature were not sufficiently rigourous. When I change the line that creates the URI fragment to use a function that automatically escapes spaces and special characters, the code works OK.

For Kotlin, the generator encloses names that would otherwise not be allowed (e.g. reserved words, or words containing spaces) in backticks. When I tested using the VGMDB file, I got:

    val `Album Votes`: String? = null,
    val Birthplace: Any? = null,
    val `Credited works`: List<String>? = null,
    val `Weighted album rating`: String? = null

For Java, it replaces non-allowable characters with underscore, and adds an underscore suffix to names that clash with reserved words.

There are other problems with the VGMDB file, however. The data type Names contains only patternProperties, and this is very difficult to represent in Kotlin. The approach taken by the code generator is to output an open class (I am considering switching this to an interface), allowing the implementor to use their own derived class. This may not be ideal for your usage, and you may wish to use the Custom Classes configuration to substitute your own class.

As I mentioned earlier, I am currently on vacation, and that means I will not be able to upload a version with the bug fix until next week. I hope you can hang on until then. Thank you for providing clear information to help me to track down the bug.

TheKeeperOfPie commented 1 year ago

I appreciate the response, although sorry to interrupt your vacation time. Encoding the fragment makes sense.

I ended up writing my own parser here.

In the Names patternProperties case (handled here) I just fell back to parsing it as a Map<String, String>, which is how it would be used in my app anyways. But I'm also not concerned with supporting every feature or ensuring validation like this project would be.

Your suggested approach sounds fine to me though. Again, no rush. Once the fix is in and I get the time, I'll replace my hacked together solution. Thanks.