Closed MTelling closed 3 years ago
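What changes were proposed in this pull request?
Correct the logic that array_distinct uses to compute distinct elements when the column holds nested arrays (array<array<int>>).
Below is a small repro snippet.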
scala> val df = Seq(Seq(Seq(1, 2), Seq(1, 2), Seq(1, 2), Seq(3, 4), Seq(4, 5))).toDF("array_col")
df: org.apache.spark.sql.DataFrame = [array_col: array<array<int>>]

scala> val distinctDF = df.select(array_distinct(col("array_col")))
distinctDF: org.apache.spark.sql.DataFrame = [array_distinct(array_col): array<array<int>>]

scala> df.show(false)
+----------------------------------------+
|array_col                               |
+----------------------------------------+
|[[1, 2], [1, 2], [1, 2], [3, 4], [4, 5]]|
+----------------------------------------+
Error
scala> distinctDF.show(false)
+-------------------------+
|array_distinct(array_col)|
+-------------------------+
|[[1, 2], [1, 2], [1, 2]] |
+-------------------------+
Expected result
scala> distinctDF.show(false)
+-------------------------+
|array_distinct(array_col)|
+-------------------------+
|[[1, 2], [3, 4], [4, 5]] |
+-------------------------+
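For reference, the expected result above can be modelled without Spark. The following is only a sketch of the intended semantics on plain Scala collections, not the code changed by this patch; the names ArrayDistinctModel and distinctPreservingOrder are hypothetical. It assumes that distinct elements keep their first-occurrence order, as the expected output suggests, and relies on Scala Seq values comparing structurally.

```scala
import scala.collection.mutable

// Hypothetical model of the expected array_distinct semantics (no Spark needed).
object ArrayDistinctModel {
  // Keep the first occurrence of each element, preserving input order.
  // Nested Seqs compare by content, so Seq(1, 2) == Seq(1, 2) is true.
  def distinctPreservingOrder[A](xs: Seq[A]): Seq[A] = {
    val seen = mutable.LinkedHashSet.empty[A]
    xs.foreach(x => seen += x) // adding an already-seen element is a no-op
    seen.toSeq
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(Seq(1, 2), Seq(1, 2), Seq(1, 2), Seq(3, 4), Seq(4, 5))
    val result = distinctPreservingOrder(input)
    // Same rows as the "Expected result" table in the repro.
    assert(result == Seq(Seq(1, 2), Seq(3, 4), Seq(4, 5)))
    println(result.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]"))
  }
}
```

On plain collections, xs.distinct gives the same answer; the explicit set is spelled out here only to make the first-occurrence rule visible.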
How was this patch tested?
Added an additional test.
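The added test itself is not shown here. As a rough illustration only, a standalone check built from the repro might look like the sketch below. It assumes Spark SQL 2.4 or later on the classpath; the object name ArrayDistinctNestedCheck and the local-mode session settings are hypothetical and not taken from the patch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_distinct, col}

// Hypothetical regression check derived from the repro; not the test added by the patch.
object ArrayDistinctNestedCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: run locally, single thread
      .appName("array-distinct-nested-check")
      .getOrCreate()
    import spark.implicits._

    try {
      // Same input as the repro: one row holding an array<array<int>>.
      val df = Seq(Seq(Seq(1, 2), Seq(1, 2), Seq(1, 2), Seq(3, 4), Seq(4, 5)))
        .toDF("array_col")

      val actual = df
        .select(array_distinct(col("array_col")))
        .as[Seq[Seq[Int]]]
        .head()

      // With the fix applied, duplicates of [1, 2] collapse and the other elements survive.
      assert(actual == Seq(Seq(1, 2), Seq(3, 4), Seq(4, 5)), s"unexpected result: $actual")
    } finally {
      spark.stop()
    }
  }
}
```

On a build without the fix, the assertion would be expected to fail with the result shown under "Error".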
Closes #24073 from dilipbiswal/SPARK-27134.
Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit aea9a574c44768d1d93ee7e8069729383859292c)
Signed-off-by: Sean Owen <sean.owen@databricks.com>
Upstream SPARK-27134 ticket and PR link (if not applicable, explain)
https://issues.apache.org/jira/browse/SPARK-27134
What changes were proposed in this pull request?
Fixing bug with array_distinct
How was this patch tested?
Existing tests
Please review http://spark.apache.org/contributing.html before opening a pull request.