palantir / spark

Palantir Distribution of Apache Spark
Apache License 2.0

Arrow 2.0 and PyArrow 1.0.1 #746

Closed: rshkv closed this 3 years ago

rshkv commented 3 years ago

Exact cherry-picks of the upstream commits that bump PyArrow to 1.0.1 and Arrow to 2.0.

As a result, the Java-side Arrow version Spark depends on is 2.0.0. On the Python side, the minimum installed PyArrow version must be 1.0.1.
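To make the Python-side requirement concrete, here is a minimal sketch of a minimum-version check. It is not the check Spark itself ships; the names `MINIMUM_PYARROW_VERSION`, `parse_version`, and `meets_minimum` are hypothetical, and the comparison only looks at the leading numeric components, so pre-release suffixes such as `.dev0` are ignored.

```python
import re

# Minimum PyArrow version taken from the PR description above.
MINIMUM_PYARROW_VERSION = "1.0.1"


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple of ints.

    Hypothetical helper: anything after the numeric prefix (e.g. ".dev0") is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(version))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum=MINIMUM_PYARROW_VERSION):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)
```

With PyArrow installed, this could be called as `meets_minimum(pyarrow.__version__)`. A real check would more likely use a dedicated version parser so that pre-releases compare correctly.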

The major version difference is fine. Arrow's versioning scheme separates the format version from the client library versions, so the Python and Java clients have different major versions but both use Arrow format 1.x (see comment here).

The only difference from the upstream commits is that the 1.0.1 bump modified docs that we don't have on our master branch yet (diff). I just dropped that change.

rshkv commented 3 years ago

This won't work, I'm afraid. PyArrow 1.0 doesn't support Python 2, and we still do while we're tracking Spark 3.0.