palantir / spark

Palantir Distribution of Apache Spark
Apache License 2.0

Spark3 diff on top of tag 3.0.1 #735

Closed jdcasale closed 3 years ago

jdcasale commented 3 years ago

Upstream SPARK-XXXXX ticket and PR link (if not applicable, explain)

This is our entire diff after the Spark 3 upgrade, chunked into logical components so that each commit builds successfully. A diff this big is really hard to review, but reviewing the diff against current pt/master is even less tractable, so I believe the way to go is to review each commit individually.

The first commit (https://github.com/palantir/spark/commit/8f1e51e98736e460dab53981f83a9eb7bc84cda1) sets up our palantir-hadoop and palantir-parquet dependencies and contains many of the eccentricities associated with getting Spark to work with Foundry. This could perhaps be broken up further; if you have any opinions on that, please let me know.

The second commit (https://github.com/palantir/spark/commit/8f1e51e98736e460dab53981f83a9eb7bc84cda1) is basically https://github.com/palantir/spark/pull/381 plus related updates. It's giant, but so was the original PR, and I didn't see a good way to split it up further.

What changes were proposed in this pull request?

How was this patch tested?




rshkv commented 3 years ago

Link to the survey we did of the diff in our fork, covering what we're keeping and what we're dropping: https://pl.ntr/1Ut

rshkv commented 3 years ago

Closing in favour of #737