uber / RemoteShuffleService

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

Spark 3.1/3.2 failed sql skew and local reader tests #99

Open YutingWang98 opened 1 year ago

YutingWang98 commented 1 year ago

Hi, I ran SparkSqlOptimizeSkewedJoinTest and SparkSqlOptimizeLocalShuffleReaderTest with Spark 3.1 and Spark 3.2. Both RSS tests failed with an assertion error caused by duplicate output rows.

For example, the expected output of SparkSqlOptimizeLocalShuffleReaderTest has 2 records: 1 100, 1 101. However, the RSS output has 8 records: 1 100, 1 100, 1 100, 1 100, 1 101, 1 101, 1 101, 1 101. Each expected row appears 4 times.
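The mismatch above can be summarized by comparing the two outputs as multisets. This is only an illustrative sketch in plain Python, not code from the RSS test suite: the helper name `diff_rows` and the tuple representation of the rows are assumptions made for the example.

```python
from collections import Counter

def diff_rows(expected, actual):
    """Return {row: (expected_count, actual_count)} for rows whose counts differ.

    Hypothetical helper for illustration; it is not part of RemoteShuffleService.
    """
    exp, act = Counter(expected), Counter(actual)
    return {
        row: (exp[row], act[row])
        for row in sorted(exp.keys() | act.keys())
        if exp[row] != act[row]
    }

# Rows as reported in this issue, written as (key, value) tuples.
expected = [(1, 100), (1, 101)]
actual = [(1, 100)] * 4 + [(1, 101)] * 4

mismatches = diff_rows(expected, actual)
for row, (want, got) in mismatches.items():
    print(f"row {row}: expected {want} occurrence(s), got {got}")
```

A uniform count per row, as here, suggests the same shuffle data is being read several times rather than wrong values being produced.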

I also ran the tests with Spark 3.0, and they passed. Do you have any idea why this issue occurs with Spark 3.1 and 3.2?

hiboyang commented 1 year ago

Previously, RSS was not tested much with Spark 3.1/3.2 and Adaptive Query Execution (AQE). It looks like the code has a bug. Would love to see someone debug this further.

YutingWang98 commented 1 year ago

@hiboyang Hi, I found the bug and fixed it in a pull request.