zero-one-group / geni

A Clojure dataframe library that runs on Spark
Apache License 2.0

`->schema` and `create-dataframe` should support fields of struct array #333

Closed: gavinkflam closed this issue 3 years ago

gavinkflam commented 3 years ago

Info

Geni Version: 0.0.38

Problem / Steps to reproduce

user=> (require '[zero-one.geni.core.dataset-creation :as g] :reload)
nil
user=> (g/->schema {:coords [{:x :int :y :int}]})
Execution error (IllegalArgumentException) at org.apache.spark.sql.types.DataTypes/createArrayType (DataTypes.java:114).
elementType should not be null.

Expected results

user=> (g/->schema {:coords [{:x :int :y :int}]})
#object[org.apache.spark.sql.types.StructType 0x5cb6297e "StructType(StructField(coords,ArrayType(StructType(StructField(x,IntegerType,true), StructField(y,IntegerType,true)),true),true))"]

Proposed solution

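At the moment, `array-type` supports only the simple value types listed in `data-type->spark-type`, e.g. `:bool` and `:string`. A struct element such as `{:x :int :y :int}` is not in that lookup, so the element type presumably resolves to nil, which `DataTypes/createArrayType` rejects with the error above.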

We can extend `array-type` to accept any Spark SQL `DataType` as its element type, in the same way `struct-field` already does.
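
To illustrate the idea, here is a minimal sketch of recursive schema resolution written directly against Spark's Java `DataTypes` factory. It is not Geni's actual implementation and not the change in the pull request: the namespace and the names `simple-types`, `->struct-type`, and `->data-type` are hypothetical, the lookup table is a small subset chosen for the example, and it assumes `spark-sql` is on the classpath.

```clojure
(ns example.schema-sketch
  (:import (org.apache.spark.sql.types DataType DataTypes StructField)))

;; Hypothetical subset of a keyword -> Spark type lookup (illustration only).
(def simple-types
  {:int    DataTypes/IntegerType
   :string DataTypes/StringType
   :bool   DataTypes/BooleanType})

(declare ->data-type)

(defn ->struct-type
  "Builds a StructType from a map of field name -> type spec.
  Every field is marked nullable."
  [m]
  (DataTypes/createStructType
   (into-array StructField
               (for [[k v] m]
                 (DataTypes/createStructField (name k) (->data-type v) true)))))

(defn ->data-type
  "Resolves a type spec recursively, so an array element may itself be
  a struct, another array, a keyword, or an existing DataType."
  [spec]
  (cond
    ;; Already a Spark DataType: pass it through unchanged.
    (instance? DataType spec) spec
    ;; Keyword: look it up, failing loudly instead of returning nil.
    (keyword? spec) (or (simple-types spec)
                        (throw (ex-info "Unknown simple type" {:spec spec})))
    ;; Map: a struct whose fields are resolved recursively.
    (map? spec)     (->struct-type spec)
    ;; Vector: an array whose single element spec is resolved recursively.
    (vector? spec)  (DataTypes/createArrayType (->data-type (first spec)) true)
    :else           (throw (ex-info "Unsupported type spec" {:spec spec}))))
```

With this sketch, `(->data-type {:coords [{:x :int :y :int}]})` resolves the vector's element through the same function as any other spec, so `createArrayType` receives a `StructType` rather than nil. Throwing on an unknown keyword is a design choice of the sketch: it surfaces the bad spec directly instead of a null element type error from Spark.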

gavinkflam commented 3 years ago

Please help review pull request #334.

`make ci` passes on my machine.