mrpowers-io / spark-fast-tests

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
https://mrpowers-io.github.io/spark-fast-tests/
MIT License

bug: ignoreNullable doesn't work for nested StructTypes #96

Closed: mlavengood-sayari closed this issue 2 years ago

mlavengood-sayari commented 3 years ago

For test case:

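  // Imports the test below relies on.
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{StringType, StructField, StructType}

  test("test dataFrameComparer") {
    // df1: the nested field is nullable.
    val df1 = spark.createDataFrame(
      spark.sparkContext.emptyRDD[Row],
      StructType(
        List(
          StructField("nested_struct",
            StructType(
              List(
                StructField("field", StringType, true)
              )
            )
          )
        )
      )
    )

    // df2: same schema, except the nested field is non-nullable.
    val df2 = spark.createDataFrame(
      spark.sparkContext.emptyRDD[Row],
      StructType(
        List(
          StructField("nested_struct",
            StructType(
              List(
                StructField("field", StringType, false)
              )
            )
          )
        )
      )
    )

    // The schemas differ only in nested nullability, so this should pass.
    assertSmallDataFrameEquality(df1, df2, ignoreNullable = true)
  }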

With ignoreNullable = true, this test should pass. Instead, it fails. The problem only seems to affect the nullability of fields nested inside StructType columns; differences in top-level nullability are ignored as expected.
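To make the expected behavior concrete, here is a minimal sketch of a schema comparison that ignores nullability at every nesting level. It is not taken from spark-fast-tests and may differ from the fix the library ends up with; `NullableInsensitiveSchema`, `normalize`, and `equalIgnoringNullable` are hypothetical names introduced only for illustration. It depends only on the Spark SQL types package.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of spark-fast-tests.
object NullableInsensitiveSchema {

  // Recursively rewrite a data type so that every nullability flag is true:
  // StructField.nullable, ArrayType.containsNull, MapType.valueContainsNull.
  def normalize(dt: DataType): DataType = dt match {
    case s: StructType =>
      StructType(s.fields.map { f =>
        f.copy(dataType = normalize(f.dataType), nullable = true)
      })
    case ArrayType(elementType, _) =>
      ArrayType(normalize(elementType), containsNull = true)
    case MapType(keyType, valueType, _) =>
      MapType(normalize(keyType), normalize(valueType), valueContainsNull = true)
    case other =>
      other // atomic types carry no nullability of their own
  }

  // Two schemas are treated as equal if they match once nullability is erased.
  def equalIgnoringNullable(a: StructType, b: StructType): Boolean =
    normalize(a) == normalize(b)
}
```

Applied to the two schemas from the test case (`df1.schema` and `df2.schema`), `equalIgnoringNullable` returns true, whereas a plain `==` on the schemas returns false because the nested `field` differs in nullability.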

Dressingoak commented 3 years ago

We have had this problem as well, and it makes it cumbersome to always set the right types on nested fields. I think #97 will fix this. Working around it became especially complex for us because some of the subtypes in calculated DataFrames changed when we went from Spark 2 to Spark 3.

mlavengood-sayari commented 2 years ago

@MrPowers Looks to be working for me, thanks!