zero-one-group / geni

A Clojure dataframe library that runs on Spark
Apache License 2.0

Can't create boolean columns of all `false` #326

Status: Closed (erp12 closed this issue 3 years ago)

erp12 commented 3 years ago

Info

| Info             | Value |
| ---------------- | ----- |
| Operating System | MacOS |
| Geni Version     | 0.3.8 |
| JDK              | 1.8   |
| Spark Version    | 3.0.2 |

Problem / Steps to reproduce

It seems to be impossible to create a boolean column whose values are all false using records->dataset, because such columns are inferred as null columns. Here is a failing test.

(fact "should work for bool columns"
    (let [dataset (g/records->dataset
                    @tr/spark
                    [{:i 0 :s "A" :b false}
                     {:i 1 :s "B" :b false}
                     {:i 2 :s "C" :b false}])]
      (instance? Dataset dataset) => true
      (g/schema dataset) => (g/->schema {:i :long
                                         :s :string
                                         :b :bool})
      (g/collect-vals dataset) => [[0 "A" false]
                                   [1 "B" false]
                                   [2 "C" false]]))

and here is the output.

FAIL On records->dataset - should work for bool columns at (dataset_creation_test.clj:143)
Expected:
#<org.apache.spark.sql.types.StructType@2e83f3f5 StructType(StructField(i,LongType,true), StructField(s,StringType,true), StructField(b,BooleanType,true))>
Actual:
#<org.apache.spark.sql.types.StructType@67b8b180 StructType(StructField(i,LongType,true), StructField(s,StringType,true), StructField(b,NullType,true))>

FAIL On records->dataset - should work for bool columns at (dataset_creation_test.clj:146)
Expected:
[[0 "A" false] [1 "B" false] [2 "C" false]]
Actual:
([0 "A" nil] [1 "B" nil] [2 "C" nil])
Diffs: in [0 2] expected false, was nil
              in [1 2] expected false, was nil
              in [2 2] expected false, was nil

The same behavior applies to map->dataset and table->dataset. If any of the booleans are true, then the schema is understood correctly.
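
This symptom (all-false columns treated as null, while a single true value fixes the schema) is consistent with type inference that picks a sample value by truthiness rather than by a nil check. That is only a guess; nothing in this thread confirms how Geni infers the type. A minimal sketch of the pitfall in plain Clojure, where `first-truthy` and `first-non-nil` are hypothetical helpers and not taken from Geni's source:

```clojure
;; Hypothetical helpers for illustration only; not Geni's actual code.

(defn first-truthy
  "Returns the first truthy value. Skips both nil and false."
  [values]
  (first (filter identity values)))

(defn first-non-nil
  "Returns the first value that is not nil. Keeps false."
  [values]
  (first (filter some? values)))

;; With an all-false column, the truthiness-based helper finds nothing,
;; so the column would look the same as a column of nils.
(first-truthy [false false false])  ;; => nil
(first-truthy [false true false])   ;; => true

;; The nil-based helper still finds a boolean sample.
(first-non-nil [false false false]) ;; => false
(first-non-nil [nil nil nil])       ;; => nil
```

If inference works this way, sampling with `some?` instead of `identity` would let an all-false column keep its boolean type.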

erp12 commented 3 years ago

I confirmed that #327 fixes this issue. Thanks!