palantir / spark

Palantir Distribution of Apache Spark
Apache License 2.0
67 stars 51 forks source link

Palantir Spark Diff on v3.0.2 #737

Closed jdcasale closed 3 years ago

jdcasale commented 3 years ago

This PR will rebase palantir/spark on top of upstream 3.0.2. It has the Palantir essentials, a few changes not merged upstream, and a few cherry-picks that weren't released yet. Details on that below. General suggestion is to look at each commit in isolation, as it should represent an atomic change that can be reviewed independent of the other changes. This is a big diff, (sorry) but we don't really have a better way to manage this change.

This code has already been tested internally on f-s and downstream products.

What happens to our master

The diff you're looking at is comparing against a branch tracking upstream's v3.0.2 tag. If you approve, we'll apply this set of Palantir commits on top of v3.0.2, and then do a 3-way merge with master where we throw away master's changes but keep its commit history (like here).

Changes

Obviously we have Spark 3 now instead of being mostly Spark 2. While fixing merge conflicts we tried to change as little as we could. No tests got dropped and we're not expecting any behavioural changes.

However, we dropped some "logical items" listed in our FORK.md. That's either because they were taken upstream, or because we no longer need the change internally. Below is the annotated list with what we did with the respective item. Items not listed there were probably dropped.

Items listed in FORK.md

Differences with upstream

Additions

All these are kept:

Reverts

These were breaking changes we had reverted. Most of these were later undone upstream, so we no longer need to track reverts.

jdcasale commented 3 years ago

Will squash the last few docs commits together before merging (don't want to do it now because it breaks review comment links)

rshkv commented 3 years ago

Checked that everything we're expecting to be log-safe still is (using this list for reference).

rshkv commented 3 years ago

Cherry-picked some scary looking fixes from branch-3.0.

rshkv commented 3 years ago

~I merged our current master into this branch using --strategy=ours, so:~

(jc/pt-diff-o-v3.0.2) $ git merge master --strategy=ours

~I'd later just merge this branch into master to keep our old commit history but discard its changes.~

rshkv commented 3 years ago

Nevermind, the above introduced 50 participants and grew the commits from 20 to 1700+. I undid that. We can discuss about what happens to master after this later.

robert3005 commented 3 years ago

LGTM, the changes I had in mind are all there.

rshkv commented 3 years ago

Thank you, @robert3005

rshkv commented 3 years ago

We created a new branch called palantir-3.0.2 from this one. It's the new repo default branch and we released 3.0.2-palantir.0 from it.

Going to close this.