vertica / spark-connector

This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
Apache License 2.0

Use Spark DataSourceV2 to handle Parquet files #455

Open Aryex opened 2 years ago

Aryex commented 2 years ago

Description

The connector currently supports Parquet files by reusing some of Spark's lower-level internal systems. As a result, the connector has to copy private Spark code and handle data partitioning itself, which leaves more code to maintain.

Spark 3.0.0 added DataSourceV2 support for Parquet, which could be reused to handle Parquet files, similar to how JSON was supported in #370. Note that we would still need to look into how writing would be handled.

This could potentially be looked into as part of #403.

Reason: This change would reduce the effort needed to support future Spark versions.

alexey-temnikov commented 1 year ago

Lowering priority, as this is relevant only during a Spark upgrade (when Spark APIs change).