zalando-incubator / spark-json-schema

JSON schema parser for Apache Spark
MIT License
81 stars, 43 forks

Improve arrays' typing #24

Closed: AcidFlow closed this pull request 7 years ago

AcidFlow commented 7 years ago

Hello :)

In a JSON schema, an array can declare the type of its elements (the "type" keyword inside its "items" definition). When parsing the schema, this element type has to be respected: an array of strings, for instance, should become an ArrayType(StringType), so an array should not always be converted to an array of StructType.
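The rule above can be illustrated with a small, dependency-free sketch. It assumes the schema has already been parsed into a case class; `SchemaNode`, `toDataType` and the mini `DataType` hierarchy are hypothetical stand-ins, not Spark's `org.apache.spark.sql.types` and not the project's `SchemaConverter` API. Mapping "integer" to `LongType` and "number" to `DoubleType` is likewise an assumption of this sketch. The relevant part is the `"array"` case, which converts the `items` node recursively instead of always producing a struct.

```scala
// Hypothetical, dependency-free sketch: mini stand-ins for Spark SQL types.
// These are NOT org.apache.spark.sql.types and NOT the project's SchemaConverter API.
sealed trait DataType
case object StringType extends DataType
case object LongType extends DataType    // assumption: "integer" -> LongType
case object DoubleType extends DataType  // assumption: "number" -> DoubleType
case object BooleanType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType

// Hypothetical pre-parsed JSON-schema node, reduced to the keywords used here.
final case class SchemaNode(
    tpe: String,                                 // the "type" keyword
    properties: Seq[(String, SchemaNode)] = Nil, // "properties" of an object
    items: Option[SchemaNode] = None             // "items" of an array
)

object ArrayTypingSketch {
  def toDataType(node: SchemaNode): DataType = node.tpe match {
    case "string"  => StringType
    case "integer" => LongType
    case "number"  => DoubleType
    case "boolean" => BooleanType
    case "object" =>
      StructType(node.properties.map { case (name, child) =>
        StructField(name, toDataType(child))
      })
    case "array" =>
      // Respect the declared element type: recurse into "items" rather than
      // always emitting an array of StructType.
      node.items match {
        case Some(itemNode) => ArrayType(toDataType(itemNode))
        case None =>
          // Sketch-only choice: reject arrays that do not declare "items".
          throw new IllegalArgumentException("array schema without \"items\"")
      }
    case other =>
      throw new IllegalArgumentException(s"unsupported JSON schema type: $other")
  }

  def main(args: Array[String]): Unit = {
    val tags = SchemaNode("array", items = Some(SchemaNode("string")))
    println(toDataType(tags)) // prints ArrayType(StringType)
  }
}
```

For an array whose items are objects, the same recursion yields `ArrayType(StructType(...))`, so object arrays keep their struct typing. Rejecting an array without `items` is a choice made for the sketch; a real converter might fall back to some default element type instead.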

A bit of context

I recently started using this schema converter to convert some JSON files to Parquet. My JSON files followed their respective JSON schemas, but at one point I had to store an array of strings. When I tried to do so, I got an exception because Parquet cannot store an empty group. Inspecting the generated schema, I found that my string arrays had actually been typed as arrays of objects (an ArrayType of a StructType with no fields, which is the empty group Parquet rejected). When I built the schema manually, setting the node type to ArrayType(StringType), everything worked smoothly. So here is a small contribution to fix this issue.

codecov-io commented 7 years ago

Codecov Report

Merging #24 into master will increase coverage by 3.54%. The diff coverage is 95%.

@@            Coverage Diff             @@
##           master      #24      +/-   ##
==========================================
+ Coverage   89.23%   92.77%   +3.54%     
==========================================
  Files           1        1              
  Lines          65       83      +18     
  Branches        1        1              
==========================================
+ Hits           58       77      +19     
+ Misses          7        6       -1

Impacted Files                                          Coverage Δ
...org/zalando/spark/jsonschema/SchemaConverter.scala   92.77% <95%> (+3.54%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update dd1565d...0f1934d.
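As a sanity check on the report, assuming Codecov computes coverage as hits divided by tracked lines (no partial hits appear in this diff), the percentages follow directly from the line counts:

$$
\frac{58}{65} \approx 89.23\%, \qquad \frac{77}{83} \approx 92.77\%, \qquad 92.77\% - 89.23\% = +3.54 \text{ percentage points}
$$

The misses agree as well: 65 − 58 = 7 before the change and 83 − 77 = 6 after it.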

AcidFlow commented 7 years ago

Thanks for your comments, I will fix them tomorrow :)

hesserp commented 7 years ago

👍

pabair commented 7 years ago

👍