palantir / spark

Palantir Distribution of Apache Spark
Apache License 2.0

[SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8 #754

LorenzoMartini closed this 3 years ago

LorenzoMartini commented 3 years ago

Upstream SPARK-XXXXX ticket and PR link (if not applicable, explain)

https://github.com/apache/spark/pull/30657 or https://github.com/apache/spark/commit/c87b0085c987edc8fb78bd82d451d142c741eba1

What changes were proposed in this pull request?

Bump the built-in Hive version to 2.3.8. According to the upstream PR, Hive 2.3.8 includes the following changes:

- HIVE-19662: Upgrade Avro to 1.8.2
- HIVE-24324: Remove deprecated API usage from Avro
- HIVE-23980: Shade Guava from hive-exec in Hive 2.3
- HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
- HIVE-24512: Exclude calcite in packaging
- HIVE-22708: Fix for HttpTransport to replace String.equals
- HIVE-24551: Hive should include transitive dependencies from calcite after shading it
- HIVE-24553: Exclude calcite from test-jar dependency of hive-exec

The cherry-pick is not 100% clean because our history differs slightly from upstream. One commit we didn't pick up is https://github.com/apache/spark/commit/10b6466e91d2e954386c74bf6ab7d94f23dd6810, which introduced a test that we don't have. The upstream change to that test is therefore not included here.

Upstream also had a refactoring (https://github.com/apache/spark/commit/a127387a53e1a24e76de83c5a1858fcdbd38c3a2) that removed references to Avro 1.2, and the Hive bump added a small if/else code path on top of it that branches on the Avro version. We don't have that refactoring, so I kept our code as-is.

The last difference is in the spark-deps-hadoop-hive dependency files. Ours listed only one Hive artifact, while upstream listed many. I included all of the upstream entries.

Why are the changes needed?

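This bump is required for the Avro upgrade to work, since Hive 2.3.8 includes the Avro-related fixes listed above (HIVE-19662, HIVE-24324, HIVE-24436).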

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests