vertica / spark-connector

This component acts as a bridge between Spark and Vertica, allowing the user to either retrieve data from Vertica for processing in Spark, or store processed data from Spark into Vertica.
Apache License 2.0
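For context on the read direction the description mentions, a minimal sketch of pulling a Vertica table into Spark might look like the following. This is an assumption based on the connector options used in the write example reported below (`host`, `user`, `password`, `db`, `table`, `staging_fs_url`); all values are placeholders, and whether reads require the same staging options should be confirmed against the connector's documentation.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Vertica read sketch").getOrCreate()

# Hypothetical read: same source name and options as the write path below.
df = (
    spark.read.format("com.vertica.spark.datasource.VerticaSource")
    .option("host", "vertica-host")                  # placeholder
    .option("user", "dbadmin")                       # placeholder
    .option("password", "secret")                    # placeholder
    .option("db", "db_name")                         # placeholder
    .option("staging_fs_url", "s3a://bucket/path")   # intermediary staging area, assumed also needed for reads
    .option("table", "my_table")                     # placeholder
    .load()
)
df.show()
```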

[BUG] Error when trying to create table. Check 'target_table_sql' option for issues. #560

Open hyh1618 opened 3 weeks ago

hyh1618 commented 3 weeks ago

Environment

Problem Description

When running the sample PySpark code below to write a DataFrame into Vertica using S3, I got the error shown in the issue title ("Error when trying to create table. Check 'target_table_sql' option for issues.").

Note: I have already verified both the S3 connection and the Vertica connection from this Spark session. The Vertica table was created, but the data failed to load.

Source Code:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[1]")
    .appName("Vertica Connector Pyspark Example")
    .getOrCreate()
)

# Point the S3A filesystem at the staging endpoint and supply credentials.
hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.path.style.access", "true")
hadoop_conf.set("fs.s3a.access.key", "XXX")
hadoop_conf.set("fs.s3a.secret.key", "XXX")
hadoop_conf.set("fs.s3a.endpoint", "https://host:4443")

# Sample DataFrame to write.
cols = ["language", "users_count"]
data = [("Java", 20000), ("Python", 100000), ("Scala", 3000)]
df = spark.createDataFrame(data).toDF(*cols)

# Write the DataFrame to Vertica through the S3 staging area.
df.write.mode("overwrite").save(
    format="com.vertica.spark.datasource.VerticaSource",
    host=host_vertica,
    user=userid,
    password=password,
    db=db_name,
    staging_fs_url="s3a://bucket/path",
    table=table_name,
)
```
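For reference, the error message points at the `target_table_sql` option, which appears to let the caller supply the CREATE TABLE statement the connector runs instead of the one it generates. Below is a hedged sketch of passing it explicitly, reusing the same placeholder variables as the code above; the column DDL is illustrative only and would need to match the DataFrame's schema.

```python
# Sketch only: pass an explicit table-creation statement via target_table_sql.
# The table name in the DDL must match the `table` option; column types here
# (VARCHAR/INTEGER) are an assumption for the sample data, not verified.
df.write.mode("overwrite").save(
    format="com.vertica.spark.datasource.VerticaSource",
    host=host_vertica,
    user=userid,
    password=password,
    db=db_name,
    staging_fs_url="s3a://bucket/path",
    table=table_name,
    target_table_sql=(
        f"CREATE TABLE {table_name} "
        "(language VARCHAR(64), users_count INTEGER);"
    ),
)
```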