vertica / spark-connector

This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
Apache License 2.0

Support table truncate when writing in overwrite mode #507

Closed jmyrberg closed 1 year ago

jmyrberg commented 1 year ago

Is your feature request related to a problem? Please describe.

We have a table where all the rows need to be re-written after a Spark batch job. The table has existing permissions that should be preserved. However, the table permissions are lost when the rows are written in overwrite mode:

(sdf.write.format('com.vertica.spark.datasource.VerticaSource')
    .mode('overwrite').options(**options).save())

Describe the solution you'd like

In overwrite mode, there should be an option to truncate the target table instead of dropping it before the re-write.

The standard Spark JDBC connector allows one to set the truncate option to solve this (link).
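For comparison, a minimal sketch of how this looks with the standard Spark JDBC connector, which does support a truncate option in overwrite mode. The URL, table name, and DataFrame `sdf` are placeholders; this assumes a running Spark session and a reachable database, so it is illustrative only:

```python
# Sketch only: assumes `sdf` is an existing DataFrame and the JDBC URL,
# table name, and credentials are filled in for a real database.
(sdf.write.format("jdbc")
    .option("url", "jdbc:vertica://host:5433/db")  # hypothetical URL
    .option("dbtable", "my_table")                 # hypothetical table
    .option("truncate", "true")  # truncate instead of drop on overwrite
    .mode("overwrite")
    .save())
```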

Describe alternatives you've considered

Workarounds exist, but they are not very practical.

Additional context

What the solution could look like after implementation:

options['truncate'] = True

(sdf.write.format('com.vertica.spark.datasource.VerticaSource')
    .mode('overwrite').options(**options).save())
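The reason a truncate option preserves permissions can be sketched with a simplified, hypothetical catalog model (this is not connector code; the `Table` class and helper functions are invented for illustration). Drop-and-recreate produces a fresh table object with no grants, while truncate clears rows in place and leaves grants attached:

```python
# Simplified model of why TRUNCATE preserves grants but DROP + CREATE does not.
# All names here are hypothetical, for illustration only.

class Table:
    def __init__(self, name):
        self.name = name
        self.grants = set()  # permissions granted on this table
        self.rows = []

def overwrite_with_drop(catalog, name, new_rows):
    # Drop-and-recreate: the replacement table starts with no grants.
    catalog[name] = Table(name)
    catalog[name].rows = list(new_rows)

def overwrite_with_truncate(catalog, name, new_rows):
    # Truncate: delete rows in place; grants on the table object survive.
    table = catalog[name]
    table.rows = list(new_rows)

catalog = {"sales": Table("sales")}
catalog["sales"].grants.add(("analyst", "SELECT"))
catalog["sales"].rows = [("old", 1)]

overwrite_with_truncate(catalog, "sales", [("new", 2)])
print(catalog["sales"].grants)  # grants preserved

overwrite_with_drop(catalog, "sales", [("new", 2)])
print(catalog["sales"].grants)  # grants lost
```

This mirrors the SQL-level difference: TRUNCATE empties the existing relation, so grants, constraints, and other metadata stay with it, whereas DROP discards the object those grants were attached to.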

alexey-temnikov commented 1 year ago

Hi @jmyrberg, thank you for raising this. I agree with your suggestion, we will look into this enhancement.

jeremyprime commented 1 year ago

@jmyrberg, the truncate option (see the documentation in README.md) has been added as part of release 3.3.4, which is now available in Maven.

jmyrberg commented 1 year ago

> @jmyrberg, the truncate option (see the documentation in README.md) has been added as part of release 3.3.4, which is now available in Maven.

Awesome, we’ll have a look, thank you! 👍