zalando-incubator / spark-json-schema

JSON schema parser for Apache Spark
MIT License

getting error like java.lang.IllegalArgumentException: No <SchemaType> #35

Open vishnu2706 opened 7 years ago

vishnu2706 commented 7 years ago

code part:

    // Backslashes must be escaped in Scala string literals (or use a raw"..." string)
    val path = "C:\\projects\\test\\Training_2\\person.txt".trim
    val fileContents = scala.io.Source.fromFile(path).getLines.mkString
    // Json.parse(fileContents)
    val x = SchemaConverter.convertContent(fileContents)

    Exception in thread "main" java.lang.IllegalArgumentException: No <SchemaType> in schema at </>
        at org.zalando.spark.jsonschema.SchemaConverter$.getJsonType(SchemaConverter.scala:114)
        at org.zalando.spark.jsonschema.SchemaConverter$.convert(SchemaConverter.scala:67)
        at org.zalando.spark.jsonschema.SchemaConverter$.convertContent(SchemaConverter.scala:60)

input json is:

    {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "definitions": {
            "tradeStatic": {
                "description": "Front Cache HDFS Trade Static Schema",
                "type": "object",
                "properties": {
                    "tradeStatic": {
                        "title": "tradeStatic",
                        "type": "object",
                        "properties": {
                            "acquireDate": { "type": "string", "pattern": "dd-MM-yyyy", "format": "date-time" },
                            "acquirerName": { "type": "string" },
                            "acquirerNumber": { "type": "string" }
                        },
                        "required": [ "acquirerName", "acquirerNumber" ],
                        "additionalProperties": false
                    }
                },
                "required": [ "tradeStatic" ],
                "additionalProperties": false
            }
        }
    }

hesserp commented 7 years ago

Hi vishnuvvv,

okay, I think the error message could be formulated a bit more clearly. "No <SchemaType> at </>" means that no type is given at root level ("/"). This type has to be "object", so your schema should start like

"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
...

After that change you will get an error that there is no "properties" field at root level. "definitions" is reserved for templates that can be referenced elsewhere in your schema, but the actual content has to go into the root-level "properties". So replacing "definitions" with "properties" would be fine in your case, giving you a schema like

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        ...
    }
}
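
Applied to the schema from your first post, that simply means renaming the top-level "definitions" key to "properties" (a sketch, keeping everything else unchanged):

    {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "properties": {
            "tradeStatic": {
                "description": "Front Cache HDFS Trade Static Schema",
                "type": "object",
                "properties": {
                    "tradeStatic": {
                        "title": "tradeStatic",
                        "type": "object",
                        "properties": {
                            "acquireDate": { "type": "string", "pattern": "dd-MM-yyyy", "format": "date-time" },
                            "acquirerName": { "type": "string" },
                            "acquirerNumber": { "type": "string" }
                        },
                        "required": [ "acquirerName", "acquirerNumber" ],
                        "additionalProperties": false
                    }
                },
                "required": [ "tradeStatic" ],
                "additionalProperties": false
            }
        }
    }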
vishnu2706 commented 7 years ago

Thanks. One more issue: in the final StructType all fields are marked nullable = false, but based on my schema only acquirerName and acquirerNumber should be nullable = false, i.e. acquireDate should be nullable. Here, however, every field comes out as nullable = false.

Please explain the scope of the "required" field attribute in the parser logic.

hesserp commented 7 years ago

Sorry for the late reply, I somehow didn't notice your last comment when you made it.

"nullable" and "required" are two different things.

nullable describes whether the value is allowed to be null. nullable is false by default and set to true if "type" is given as an array ["<type>", "null"].
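
For example, making acquireDate from your schema nullable would look like this (a sketch of the array-type convention described above):

    "acquireDate": {
        "type": [ "string", "null" ],
        "pattern": "dd-MM-yyyy",
        "format": "date-time"
    }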

required describes whether a field has to be present in a data point at all. A value can be present in a data point but still be null.
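
To illustrate the difference with a hypothetical data point (the values here are made up): acquireDate is present, so it satisfies required, but its value is null, so it only passes if the field is nullable.

    { "acquireDate": null, "acquirerName": "ACME", "acquirerNumber": "42" }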

Anyway, required and nullable are actually ignored at the moment. The required field is not checked in the current version of spark-json-schema (pull requests are welcome ^^). The nullable attribute is set on the resulting StructType fields, but effectively ignored by Spark when actually loading data with that schema. So if this is important for you, we recommend using an additional schema validator.
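
As one possible sketch of such an additional validator (assuming the java-json-tools "json-schema-validator" library as an example; spark-json-schema does not depend on it, and the schema and data strings below are made-up minimal inputs):

    // build.sbt: libraryDependencies += "com.github.java-json-tools" % "json-schema-validator" % "2.2.14"
    import com.github.fge.jackson.JsonLoader
    import com.github.fge.jsonschema.main.JsonSchemaFactory

    object ValidateDatapoint {
      def main(args: Array[String]): Unit = {
        // Hypothetical inputs: a tiny draft-04 schema and one data point
        val schemaJson =
          """{
            |  "$schema": "http://json-schema.org/draft-04/schema#",
            |  "type": "object",
            |  "properties": { "acquirerName": { "type": "string" } },
            |  "required": [ "acquirerName" ]
            |}""".stripMargin
        val dataJson = """{ "acquirerName": null }"""

        val schema = JsonSchemaFactory.byDefault().getJsonSchema(JsonLoader.fromString(schemaJson))
        val report = schema.validate(JsonLoader.fromString(dataJson))

        // "required" is satisfied (the field is present), but the null value
        // violates "type": "string", so the report is not successful
        println(s"valid: ${report.isSuccess}")
        report.forEach(msg => println(msg))
      }
    }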