vertica / spark-connector

This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
Apache License 2.0

Support table truncate when writing in overwrite mode #507

Closed jmyrberg closed 1 year ago

jmyrberg commented 1 year ago

Is your feature request related to a problem? Please describe.

We have a table where all the rows need to be re-written after a Spark batch job. The table has existing permissions that should be preserved. However, the table permissions are lost when the rows are written in overwrite mode:

(sdf.write.format('com.vertica.spark.datasource.VerticaSource')
    .mode('overwrite').options(**options).save())

Describe the solution you'd like

In overwrite mode, there should be an option to truncate the target table instead of dropping it before the re-write.

The standard Spark JDBC connector allows one to set the truncate option to solve this (link).
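For comparison, a minimal sketch of how this looks with the standard Spark JDBC connector, which does support a truncate option in overwrite mode. The URL, table name, and DataFrame `sdf` are placeholders; this assumes a running Spark session and a reachable database, so it is illustrative only:

```python
# Sketch only: assumes `sdf` is an existing DataFrame and the JDBC URL,
# table name, and credentials are filled in for a real database.
(sdf.write.format("jdbc")
    .option("url", "jdbc:vertica://host:5433/db")  # hypothetical URL
    .option("dbtable", "my_table")                 # hypothetical table
    .option("truncate", "true")  # truncate instead of drop on overwrite
    .mode("overwrite")
    .save())
```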

Describe alternatives you've considered

Workarounds exist, but they are not very practical.

Additional context

What the solution could look like after implementation:

options['truncate'] = True

(sdf.write.format('com.vertica.spark.datasource.VerticaSource')
    .mode('overwrite').options(**options).save())
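The reason a truncate option preserves permissions can be sketched with a simplified, hypothetical catalog model (this is not connector code; the `Table` class and helper functions are invented for illustration). Drop-and-recreate produces a fresh table object with no grants, while truncate clears rows in place and leaves grants attached:

```python
# Simplified model of why TRUNCATE preserves grants but DROP + CREATE does not.
# All names here are hypothetical, for illustration only.

class Table:
    def __init__(self, name):
        self.name = name
        self.grants = set()  # permissions granted on this table
        self.rows = []

def overwrite_with_drop(catalog, name, new_rows):
    # Drop-and-recreate: the replacement table starts with no grants.
    catalog[name] = Table(name)
    catalog[name].rows = list(new_rows)

def overwrite_with_truncate(catalog, name, new_rows):
    # Truncate: delete rows in place; grants on the table object survive.
    table = catalog[name]
    table.rows = list(new_rows)

catalog = {"sales": Table("sales")}
catalog["sales"].grants.add(("analyst", "SELECT"))
catalog["sales"].rows = [("old", 1)]

overwrite_with_truncate(catalog, "sales", [("new", 2)])
print(catalog["sales"].grants)  # grants preserved

overwrite_with_drop(catalog, "sales", [("new", 2)])
print(catalog["sales"].grants)  # grants lost
```

This mirrors the SQL-level difference: TRUNCATE empties the existing relation, so grants, constraints, and other metadata stay with it, whereas DROP discards the object those grants were attached to.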

alexey-temnikov commented 1 year ago

Hi @jmyrberg, thank you for raising this. I agree with your suggestion, we will look into this enhancement.

jeremyprime commented 1 year ago

@jmyrberg, the truncate option (see the documentation in README.md) has been added as part of release 3.3.4, which is now available in Maven.

jmyrberg commented 1 year ago

> @jmyrberg, the truncate option (see the documentation in README.md) has been added as part of release 3.3.4, which is now available in Maven.

Awesome, we’ll have a look, thank you! 👍