This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
[ENHANCEMENT] Support for loading multiple tables #557
Problem Description
Describe the issue in as much detail as possible, so it is possible to reproduce it.
The Spark connector instantiates only a single JDBC connection. When one table completes its data load, the connector closes that connection. Because the JDBC connection is defined as a singleton in the code, other in-flight loads are then prevented from using it for housekeeping tasks such as table/column definition checks. To support this configuration, the connector needs to be enhanced to handle multiple concurrent loads (multiple threads).
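For illustration, here is a minimal, hypothetical sketch of the singleton-connection pattern described above. This is not the connector's actual code; the object, method, and connection parameters are made up purely to show why concurrent table loads conflict over one shared connection.

```scala
// Hypothetical sketch only -- not taken from the connector source.
// A single shared (singleton) JDBC connection: whichever table load
// finishes first closes it, breaking any other in-flight load that still
// needs the connection for table/column definition checks.
import java.sql.{Connection, DriverManager}

object SharedJdbcConnection {
  // Assumed connection parameters, for the example only.
  private val url = "jdbc:vertica://vertica-host:5433/testdb"

  lazy val connection: Connection =
    DriverManager.getConnection(url, "dbadmin", "xxxx")

  // Called when a table load completes; closes the connection for everyone.
  def release(): Unit = connection.close()
}

// Thread A: finishes loading table_a and calls SharedJdbcConnection.release().
// Thread B: still loading table_b; its next catalog query on the shared
// connection now fails because the connection has already been closed.
```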
Steps to reproduce:
Here is what we understood about how the customer's job runs:
Kafka is used to write data to files in AWS S3.
The customer's code is then submitted to a Spark shell, which reads these files from S3 and performs a few transformations.
The transformed data is then written to Vertica using the Vertica Spark connector.
The customer's code can run the load and transform for multiple tables (see the sketch below). They claim they did not face this issue with Vertica's legacy Spark connector (when they were on Vertica 9.1.x).
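A hedged sketch of that multi-table flow, under stated assumptions: the paths, table names, and transformation are placeholders, and the connector option keys (host, db, user, password, table, staging_fs_url) follow typical connector examples and may need adjusting for a given connector version.

```scala
// Illustrative sketch of the customer's multi-table load, not their actual code.
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object MultiTableLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-table-load").getOrCreate()

    // Assumed connector options; adjust for the actual environment.
    val verticaOpts = Map(
      "host" -> "vertica-host",
      "db" -> "testdb",
      "user" -> "dbadmin",
      "password" -> "xxxx",
      "staging_fs_url" -> "s3a://my-bucket/staging"
    )

    def loadTable(s3Path: String, table: String): Unit = {
      val df: DataFrame = spark.read.parquet(s3Path)    // files landed in S3 by Kafka
      val transformed = df.filter("value IS NOT NULL")  // placeholder transformation
      transformed.write
        .format("com.vertica.spark.datasource.VerticaSource")
        .options(verticaOpts + ("table" -> table))
        .mode("append")
        .save()
    }

    // Loading several tables concurrently is what surfaces the singleton JDBC
    // connection problem: the first load to finish closes the shared connection
    // while the other loads still need it.
    val tables = Seq(
      "s3a://my-bucket/events_a" -> "events_a",
      "s3a://my-bucket/events_b" -> "events_b"
    )
    val jobs = tables.map { case (path, table) => Future(loadTable(path, table)) }
    jobs.foreach(Await.result(_, Duration.Inf))

    spark.stop()
  }
}
```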
Expected behaviour:
In our tests, we made the following observations:
The customer never faces the issue when running the code for a single table.
The Spark job fails when the code is submitted for multiple tables.
Actual behaviour:
Error message/stack trace:
Code sample or example on how to reproduce the issue:
Environment
Spark Connector Logs