tripl-ai / questions

Forum for asking questions relating to Arc which are not defects.
https://arc.tripl.ai

Setting configuration parameters. #9

Open MeriaJ opened 4 years ago

MeriaJ commented 4 years ago

Hi,

How can I set the below Spark options with Arc?

  1. Driver for JDBC read => spark.read.option("driver", "some driver").jdbc(...
  2. Overwrite mode to overwrite at partition level => spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

Thanks.

seddonm1 commented 4 years ago

Hi,

  1. You shouldn't need to set a driver, as it should be dynamically resolved based on the JDBC connection string (see the stage sketch after the example below).

  2. When you submit the job you can set the partitionOverwriteMode option like this:

docker run \
--rm \
-v $(pwd)/examples:/home/jovyan/examples:Z \
-e "ETL_CONF_ENV=production" \
-p 4040:4040 \
triplai/arc:arc_2.7.0_spark_2.4.4_scala_2.12_hadoop_2.9.2_1.0.0 \
bin/spark-submit \
--master local[*] \
--driver-memory 4g \
--driver-java-options "-XX:+UseG1GC -XX:-UseGCOverheadLimit -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap" \
--conf spark.sql.sources.partitionOverwriteMode=dynamic \
--class ai.tripl.arc.ARC \
/opt/spark/jars/arc.jar \
--etl.config.uri=file:///home/jovyan/examples/tutorial/0/nyctaxi.ipynb
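
As a rough sketch of point 1: in the Arc job configuration the driver can simply be omitted, because Spark resolves it from the jdbcURL prefix as long as the driver jar is on the classpath. The stage below assumes the JDBCExtract parameter names from the Arc documentation; the host, database, table and view names are placeholders, not values from this thread.

{
  "type": "JDBCExtract",
  "name": "extract from sqlserver",
  "environments": ["production", "test"],
  # no "driver" option: Spark picks the SQL Server driver from the jdbc:sqlserver:// prefix
  "jdbcURL": "jdbc:sqlserver://myhost:1433;databaseName=mydb",
  "tableName": "(SELECT * FROM mytable) mytable",
  "outputView": "mytable"
}
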
MeriaJ commented 4 years ago

@seddonm1 Thanks. The JDBC driver resolution did work with the triplai Docker image. I was trying to run the Arc jar on EMR via spark-submit and it failed because it couldn't resolve the driver. This is what I tried:

spark-submit --packages "com.microsoft.sqlserver:mssql-jdbc:7.4.1.jre8" --class ai.tripl.arc.ARC s3://bucket/arc-assembly-2.7.0.jar --etl.config.uri=s3a://filepath

seddonm1 commented 4 years ago

@MeriaJ The problem is that the EMR instance must not have the mssql JDBC jar available, whereas we build it into the Arc Dockerfile: https://github.com/tripl-ai/docker/blob/master/arc/Dockerfile_2.11#L89

It looks like you can put that JAR onto the EMR instance manually, as described in this answer: https://stackoverflow.com/questions/44793739/connect-amazon-emr-spark-with-mysql-writing-data
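
For example, a minimal sketch of that approach, assuming the mssql-jdbc jar has been downloaded onto the EMR master node at /home/hadoop/mssql-jdbc-7.4.1.jre8.jar (the path is an assumption, not something from this thread):

# copy the driver jar onto the node first, then pass it to spark-submit with --jars
spark-submit \
--jars /home/hadoop/mssql-jdbc-7.4.1.jre8.jar \
--class ai.tripl.arc.ARC \
s3://bucket/arc-assembly-2.7.0.jar \
--etl.config.uri=s3a://filepath

--jars ships the jar to the driver and executors and puts it on their classpaths, so the jdbc:sqlserver:// URL can be resolved without an explicit driver option.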