palantir / spark

Palantir Distribution of Apache Spark
Apache License 2.0

Support for reading multiple sorted files per bucket #742

Closed: rahij closed this 3 years ago

rahij commented 3 years ago

Upstream SPARK-XXXXX ticket and PR link (if not applicable, explain)

Redoing #731 and #738 on the new branch. The upstream PR author has said that the upstream change is taking longer because they plan to automatically detect whether a parent operator can take advantage of the sort order before creating the bucketed sorted RDD. We will revert this PR when either of the following happens:

rshkv commented 3 years ago

For reference, this was the original PR: https://github.com/apache/spark/pull/29625
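The PR title refers to reading a bucket made up of several files, each already sorted on the bucket's sort columns, while still producing rows in sorted order. One way to do that, assuming every file in the bucket is sorted by the same key, is a lazy k-way merge. The sketch below uses Python's standard `heapq.merge`; the helper `merge_sorted_bucket_files` and the sample data are hypothetical illustrations, not code from this PR, the upstream PR, or Spark itself, and the real implementation may differ.

```python
import heapq
from typing import Any, Callable, Iterable, Iterator, List


def merge_sorted_bucket_files(
    files: List[Iterable[Any]],
    key: Callable[[Any], Any] = lambda row: row,
) -> Iterator[Any]:
    """Hypothetical helper: lazily merge the rows of all files in one bucket.

    Assumes each input is already sorted by `key`. heapq.merge keeps at most
    one pending row per input in its heap, so memory grows with the number
    of files in the bucket rather than the number of rows.
    """
    return heapq.merge(*files, key=key)


# Illustrative data: one bucket written by three tasks, each file sorted by id.
bucket_0 = [
    [(1, "a"), (4, "d"), (7, "g")],
    [(2, "b"), (5, "e")],
    [(3, "c"), (6, "f"), (8, "h")],
]

# Simply concatenating the files loses the global sort order within the bucket.
concatenated = [row[0] for f in bucket_0 for row in f]
print(concatenated)  # [1, 4, 7, 2, 5, 3, 6, 8]

# A k-way merge restores it without materializing the whole bucket.
merged = [row[0] for row in merge_sorted_bucket_files(bucket_0, key=lambda row: row[0])]
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The merge is what lets a downstream operator rely on the bucket being sorted even when it contains more than one file; without it, only single-file buckets could be treated as sorted.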