vertica / spark-connector

This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
Apache License 2.0

Support Spark 3.4.0 #549

Open jeremyprime opened 1 year ago

jeremyprime commented 1 year ago

Is your feature request related to a problem? Please describe.

Spark 3.4.0 was released in April 2023. This version is now making its way into Bitnami and other Spark images, so we need to support it soon.

Describe the solution you'd like

The daily tests are already failing with the latest Bitnami image using Spark 3.4.0 (see #547). We need to investigate what has changed and make code changes to be compatible with 3.4.0, while maintaining compatibility with previous Spark versions.

jeremyprime commented 1 year ago

The best way to test what changed in Spark 3.4.0 would be to update docker-compose.yml to use the latest Bitnami Spark image, and recreate the dev Docker environment. Then run the unit tests and functional tests locally (and the examples for completeness).

We know that, at a minimum, PartitionedFile.filePath changed. Compare the Javadoc for the two versions:

3.3.2: https://javadoc.io/static/org.apache.spark/spark-sql_2.12/3.3.2/org/apache/spark/sql/execution/datasources/PartitionedFile.html
3.4.0: https://javadoc.io/static/org.apache.spark/spark-sql_2.12/3.4.0/org/apache/spark/sql/execution/datasources/PartitionedFile.html
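One possible way to stay compatible with both shapes of filePath is to read it reflectively and normalize the result to a String. The sketch below is only an illustration, not the connector's actual fix. It assumes the change is to the accessor's return type (a plain String before 3.4.0, a path wrapper type from 3.4.0 on) and that the wrapper's toString() yields the path; both assumptions should be checked against the Javadoc for each version. The classes StringPathFile, WrappedPath, and WrappedPathFile are hypothetical stand-ins so the example runs without Spark on the classpath.

```java
import java.lang.reflect.Method;

final class FilePathCompat {

    /** Hypothetical stand-in for the older shape: filePath() returns a String. */
    static final class StringPathFile {
        private final String path;
        StringPathFile(String path) { this.path = path; }
        public String filePath() { return path; }
    }

    /** Hypothetical stand-in for a wrapper type; assumes toString() returns the path. */
    static final class WrappedPath {
        private final String underlying;
        WrappedPath(String underlying) { this.underlying = underlying; }
        @Override public String toString() { return underlying; }
    }

    /** Hypothetical stand-in for the newer shape: filePath() returns a wrapper. */
    static final class WrappedPathFile {
        private final WrappedPath path;
        WrappedPathFile(String path) { this.path = new WrappedPath(path); }
        public WrappedPath filePath() { return path; }
    }

    /**
     * Looks up the public no-argument filePath() accessor at runtime, so the
     * caller does not depend on its return type at compile time, and converts
     * whatever it returns to a String.
     */
    static String filePathAsString(Object file) {
        try {
            Method accessor = file.getClass().getMethod("filePath");
            Object value = accessor.invoke(file);
            return String.valueOf(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("filePath() accessor not available", e);
        }
    }

    public static void main(String[] args) {
        // Both shapes go through the same helper.
        System.out.println(filePathAsString(new StringPathFile("/data/part-0.parquet")));
        System.out.println(filePathAsString(new WrappedPathFile("/data/part-1.parquet")));
    }
}
```

Reflection trades compile-time checking for the ability to build one artifact against several Spark versions. Version-specific source directories or a small shim module per Spark version would be alternatives if the reflection overhead or loss of type safety is a concern.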

jeremyprime commented 1 year ago

When Spark 3.4.0 is supported, the test matrices and README under .github/workflows should be updated, and the default version in docker-compose.yml should be changed to the latest supported version. The supported versions advertised by the badge and elsewhere in README.md and other docs should be updated as well.