darylerwin closed this issue 6 years ago
I am able to run this via spark-shell on the master node:
spark-shell --packages com.spotify:spark-bigquery_2.11:0.2.1
Ivy Default Cache set to: /home/derwin/.ivy2/cache
The jars for the packages stored in: /home/derwin/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.spotify#spark-bigquery_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.spotify#spark-bigquery_2.11;0.2.1 in central
found com.databricks#spark-avro_2.11;3.0.0 in central
found org.slf4j#slf4j-api;1.7.5 in central
found org.apache.avro#avro;1.7.6 in central
found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
found com.thoughtworks.paranamer#paranamer;2.3 in central
found org.xerial.snappy#snappy-java;1.0.5 in central
found org.apache.commons#commons-compress;1.4.1 in central
found org.tukaani#xz;1.0 in central
found com.google.cloud.bigdataoss#bigquery-connector;0.7.5-hadoop2 in central
found com.google.cloud.bigdataoss#util-hadoop;1.4.5-hadoop2 in central
found com.google.api-client#google-api-client-java6;1.20.0 in central
found com.google.api-client#google-api-client;1.20.0 in central
found com.google.oauth-client#google-oauth-client;1.20.0 in central
found com.google.http-client#google-http-client;1.20.0 in central
found com.google.code.findbugs#jsr305;2.0.3 in central
found org.apache.httpcomponents#httpclient;4.0.1 in central
found org.apache.httpcomponents#httpcore;4.0.1 in central
found commons-logging#commons-logging;1.1.1 in central
found commons-codec#commons-codec;1.6 in central
found com.google.http-client#google-http-client-jackson2;1.20.0 in central
found com.fasterxml.jackson.core#jackson-core;2.1.3 in central
found com.google.oauth-client#google-oauth-client-java6;1.20.0 in central
found com.google.api-client#google-api-client-jackson2;1.20.0 in central
found com.google.apis#google-api-services-storage;v1-rev35-1.20.0 in central
found com.google.guava#guava;18.0 in central
found com.google.cloud.bigdataoss#util;1.4.5 in central
found com.google.cloud.bigdataoss#gcs-connector;1.4.5-hadoop2 in central
found com.google.cloud.bigdataoss#gcsio;1.4.5 in central
found com.google.apis#google-api-services-bigquery;v2-rev217-1.20.0 in central
found com.google.code.gson#gson;2.3 in central
found org.apache.avro#avro;1.7.7 in central
found org.slf4j#slf4j-simple;1.7.21 in central
found org.slf4j#slf4j-api;1.7.21 in central
found joda-time#joda-time;2.9.3 in central
Is there any way for me to see which library versions Dataproc is using? Or should I somehow code this in the build.sbt?
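One way to check which jar a class is actually loaded from is to ask the JVM from inside spark-shell on the master node. The snippet below is a minimal sketch, not something taken from this report: the helper name `jarOf` is hypothetical, and the class names are only examples picked from libraries that appear in the Ivy output above (Avro, Guava, Jackson).

```scala
import scala.util.Try

// Hypothetical helper: returns the location (jar or directory) a class was loaded from.
def jarOf(className: String): String =
  Try(Class.forName(className)).toOption match {
    case None => "(not on classpath)"
    case Some(cls) =>
      // getCodeSource is null for classes loaded by the bootstrap class loader.
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(src => Option(src.getLocation))
        .map(_.toString)
        .getOrElse("(bootstrap or unknown)")
  }

// Example class names only; replace them with whichever classes the error mentions.
Seq(
  "org.apache.avro.Schema",
  "com.google.common.base.Preconditions",
  "com.fasterxml.jackson.core.JsonFactory"
).foreach(c => println(s"$c -> ${jarOf(c)}"))
```

If a class resolves to a jar under /usr/lib/spark/jars rather than to one under ~/.ivy2/jars, the cluster's copy is probably the one taking precedence over the version pulled in by `--packages`.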
I am new to this architecture and have read many articles about various dependencies not working. Can someone point out where I might have gone wrong? There has been a lot of trial and error in this build.sbt. Versions: Spark 2.2.0, Scala 2.11.8.
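For context, a common pattern when a job's dependencies clash with the jars already on the cluster is to mark Spark as `provided` and relocate (shade) the conflicting packages in an assembly jar. The fragment below is a generic sketch and is not the build.sbt from this report. It assumes the sbt-assembly plugin is enabled in project/plugins.sbt, the project name is hypothetical, and the choice of Guava (`com.google.common`) as the package to shade is only an example, since the actual conflict depends on the error.

```scala
// Generic sketch, assuming sbt-assembly is enabled in project/plugins.sbt.
name := "bigquery-job" // hypothetical project name
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark is already on the cluster, so keep it out of the assembly jar.
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",
  "com.spotify" %% "spark-bigquery" % "0.2.1"
)

// Relocate one conflicting package inside the assembly (Guava is an example only).
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "repackaged.com.google.common.@1").inAll
)
```

With a setup like this, `sbt assembly` produces a single jar that can be submitted to the cluster instead of relying on `--packages` resolution at launch time.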
build.sbt:
The Error:
I am also using the init code that runs when Dataproc builds the cluster to replace the Avro files.
Sample script I am attempting to run (some cutting and pasting here). I have tried both the direct table call and the bigQuerySelect call. The save DOES work.