palantir / spark

Palantir Distribution of Apache Spark
Apache License 2.0

[SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8 #756

Closed LorenzoMartini closed 3 years ago

LorenzoMartini commented 3 years ago

Original pr message

Hive 2.3.8 changes:
- HIVE-19662: Upgrade Avro to 1.8.2
- HIVE-24324: Remove deprecated API usage from Avro
- HIVE-23980: Shade Guava from hive-exec in Hive 2.3
- HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
- HIVE-24512: Exclude calcite in packaging.
- HIVE-22708: Fix for HttpTransport to replace String.equals
- HIVE-24551: Hive should include transitive dependencies from calcite after shading it
- HIVE-24553: Exclude calcite from test-jar dependency of hive-exec

Upgrade Avro and Parquet to latest version.

No.

Existing tests, plus a trial upgrade of Parquet to 1.11.1 and Avro to 1.10.1: https://github.com/apache/spark/pull/30517

Closes #30657 from wangyum/SPARK-33696.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

Upstream SPARK-XXXXX ticket and PR link (if not applicable, explain)

[SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8 https://github.com/apache/spark/pull/30657 or https://github.com/apache/spark/commit/c87b0085c987edc8fb78bd82d451d142c741eba1

What changes were proposed in this pull request?

Bump Hive version to 2.3.8. From the bump PR:

Hive 2.3.8 changes:
- HIVE-19662: Upgrade Avro to 1.8.2
- HIVE-24324: Remove deprecated API usage from Avro
- HIVE-23980: Shade Guava from hive-exec in Hive 2.3
- HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
- HIVE-24512: Exclude calcite in packaging.
- HIVE-22708: Fix for HttpTransport to replace String.equals
- HIVE-24551: Hive should include transitive dependencies from calcite after shading it
- HIVE-24553: Exclude calcite from test-jar dependency of hive-exec

The cherry-pick is not 100% clean because there are some small differences from the upstream history. One commit we didn't pick up is https://github.com/apache/spark/commit/10b6466e91d2e954386c74bf6ab7d94f23dd6810, which introduced a test that we don't have, so you won't see that change here.

There was also some upstream refactoring (https://github.com/apache/spark/commit/a127387a53e1a24e76de83c5a1858fcdbd38c3a2) with removal of references to Avro 1.2 that added a small if/else code path for the Avro version. We don't have that, so I kept the code as-is.

The last difference is in the spark-deps-hadoop-hive files. We had only one reference to Hive in them, while upstream had many. The upstream change touches all of those references, whereas for us re-running the locks doesn't change any of them.
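To make the manifest difference concrete, here is a minimal Python sketch of how one might count the Hive entries in a dependency manifest and check which of them are pinned to the version being bumped. It assumes each manifest line has the form `artifact/version/classifier/jar-name`. The two sample manifests, the function names `hive_entries` and `entries_pinned_to`, and the version `2.3.7` are all hypothetical and were not taken from either repository.

```python
from typing import List

# Hypothetical manifest excerpts; the entries are illustrative only and are
# not copied from the upstream or Palantir dependency files.
UPSTREAM_LIKE = """\
avro/1.8.2//avro-1.8.2.jar
hive-beeline/2.3.7//hive-beeline-2.3.7.jar
hive-exec/2.3.7/core/hive-exec-2.3.7-core.jar
hive-metastore/2.3.7//hive-metastore-2.3.7.jar
"""

FORK_LIKE = """\
avro/1.8.2//avro-1.8.2.jar
hive-storage-api/2.7.1//hive-storage-api-2.7.1.jar
"""


def hive_entries(manifest: str) -> List[str]:
    """Return the manifest lines whose artifact name starts with 'hive-'."""
    return [
        line.strip()
        for line in manifest.splitlines()
        if line.strip().split("/", 1)[0].startswith("hive-")
    ]


def entries_pinned_to(manifest: str, version: str) -> List[str]:
    """Return the Hive entries whose version field equals `version`.

    These are the lines a bump away from `version` would rewrite.
    """
    pinned = []
    for line in hive_entries(manifest):
        fields = line.split("/")
        if len(fields) >= 2 and fields[1] == version:
            pinned.append(line)
    return pinned


if __name__ == "__main__":
    samples = (("upstream-like", UPSTREAM_LIKE), ("fork-like", FORK_LIKE))
    for name, manifest in samples:
        total = len(hive_entries(manifest))
        affected = len(entries_pinned_to(manifest, "2.3.7"))
        print(f"{name}: {total} Hive entries, {affected} affected by the bump")
```

In the made-up fork-like sample, the single Hive entry carries a different version from the one being bumped, so regenerating the manifest would leave it untouched. That is only one possible reason a lock run could produce no diff; the actual cause in this repository is not stated above.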

Why are the changes needed?

We need this bump to make the Avro bump work

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests