This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
[BUG] Error when trying to create table. Check 'target_table_sql' option for issues. #560
Problem Description

When running the sample Python Spark code to write a DataFrame into Vertica via S3, I got the following error:
ERROR VerticaBatchReader: Error when trying to create table. Check 'target_table_sql' option for issues.
py4j.protocol.Py4JJavaError: An error occurred while calling o59.save.
: com.vertica.spark.util.error.ConnectorException: Error when trying to create table. Check 'target_table_sql' option for issues.
at com.vertica.spark.util.error.ErrorHandling$.logAndThrowError(ErrorHandling.scala:78)
at com.vertica.spark.datasource.v2.VerticaBatchWrite.<init>(VerticaDatasourceV2Write.scala:71)
at com.vertica.spark.datasource.v2.VerticaWriteBuilder.buildForBatch(VerticaDatasourceV2Write.scala:51)
at org.apache.spark.sql.connector.write.WriteBuilder$1.toBatch(WriteBuilder.java:44)
at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:332)
Note: I have verified the S3 connection from this Spark environment, as well as the Vertica connection. The Vertica table was created, but the data failed to load.
Source Code:
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("Vertica Connector Pyspark Example") \
    .getOrCreate()

hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.path.style.access", "true")
hadoop_conf.set("fs.s3a.access.key", "XXX")
hadoop_conf.set("fs.s3a.secret.key", "XXX")
hadoop_conf.set("fs.s3a.endpoint", "https://host:4443")

cols = ["language", "users_count"]
data = [("Java", 20000), ("Python", 100000), ("Scala", 3000)]
df = spark.createDataFrame(data).toDF(*cols)

df.write.mode('overwrite').save(
    format="com.vertica.spark.datasource.VerticaSource",
    host=host_vertica,
    user=userid,
    password=password,
    db=db_name,
    staging_fs_url='s3a://bucket/path',
    table=table_name
)
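For reference, the same write can also be expressed by collecting the connector options in a plain dictionary and passing them via `.options(**...)`, which makes it easy to sanity-check the values before the call (e.g. when debugging a "Check 'target_table_sql' option" error). A minimal sketch, assuming the option names used in the snippet above (`host`, `user`, `password`, `db`, `staging_fs_url`, `table`); all values here are placeholders, not real connection details:

```python
# Sketch: collect the Vertica connector options in one dict so they can be
# validated and logged before the write. All values are placeholders.
vertica_opts = {
    "host": "vertica-host.example.com",  # placeholder, not a real host
    "user": "dbadmin",
    "password": "XXX",
    "db": "vmart",
    "staging_fs_url": "s3a://bucket/path",
    "table": "languages",
}

# Fail fast if any required option is missing or empty, rather than letting
# the connector raise a less specific error at save() time.
missing = [key for key, value in vertica_opts.items() if not value]
if missing:
    raise ValueError(f"Missing Vertica connector options: {missing}")

# The actual write (requires a live Spark session and Vertica cluster):
# df.write.mode("overwrite") \
#     .format("com.vertica.spark.datasource.VerticaSource") \
#     .options(**vertica_opts) \
#     .save()
```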