Closed nchammas closed 7 years ago
Looks good to me 👍
Hello,
Sounds great - will take a look on the weekend!
Alex
Does this also support loading the application jar from S3?
I just tried launching a job from an application jar on S3 using Flintrock from the easy-s3-access branch, and I got the following exception:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
The only way I got this working was by adding the jar to SPARK_CLASSPATH: https://github.com/pragnesh/flintrock/commit/48c0e16ba0bcf558c707bec51a65f622c7ab4402
I know SPARK_CLASSPATH has been deprecated since Spark 1.0, but this is the only way it works.
@pragnesh - I'm not familiar with launching jobs in the way you describe. Can you share an example of the command that triggers the error you reported?
@nchammas For example, I uploaded the jar spark-examples_2.11-2.1.0.jar to "example_bucket" on S3. When I tried to launch the job org.apache.spark.examples.SparkPi with the following command, I got the "Class org.apache.hadoop.fs.s3a.S3AFileSystem not found" exception:
spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--master spark://`hostname`:6066 \
s3a://example_bucket/spark-examples_2.11-2.1.0.jar
This command only works if I add the S3-related jar to SPARK_CLASSPATH. This scenario is supported on EMR.
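For readers following along: the exception names the class that backs the `s3a://` scheme, `org.apache.hadoop.fs.s3a.S3AFileSystem`, which ships in the `hadoop-aws` artifact rather than in Spark itself. Below is a minimal shell sketch of a pre-flight check. The function name `required_fs_class` is hypothetical and not part of Spark or Flintrock. It only maps an application jar URI to the class that has to be loadable by whichever process fetches the jar.

```shell
# Hypothetical helper (not part of Spark or Flintrock): print the Hadoop
# FileSystem class that must be on the classpath for a given jar URI.
required_fs_class() {
  case "$1" in
    s3a://*) echo "org.apache.hadoop.fs.s3a.S3AFileSystem" ;;
    *)       echo "" ;;  # other schemes are out of scope for this sketch
  esac
}

# Example: the jar location used in the command above.
required_fs_class "s3a://example_bucket/spark-examples_2.11-2.1.0.jar"
```

An empty result means the sketch knows of no extra class for that URI. It does not mean the URI is guaranteed to work.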
Hey guys,
Glad to see easier S3 access being implemented.
I remember getting some pretty unintelligible errors when trying to access S3 files with Spark 2.0.1 on Hadoop 2.7.2, 2.6, and 2.5. I had to revert to Hadoop 2.4 and a custom version of Spark built against 2.4. For reference, see SPARK-7442.
This is detailed in my guide under Using Flintrock.
It seems that the above will work with Spark 2.0 and 2.1 on Hadoop 2.7.3. I will try it when possible.
Danny
Side note: Reading zipped csv files from S3 didn't work for me if anyone has any luck with that
@pragnesh - If you try this command with Spark 2.1 and Hadoop 2.7, does it work for you?
spark-submit \
--packages org.apache.hadoop:hadoop-aws:2.7.3 \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--master spark://hostname:6066 \
s3a://example_bucket/spark-examples_2.11-2.1.0.jar
Hey @PiercingDan - Yup, this PR is supposed to address precisely the problem you documented in your guide, and which I also reported in SPARK-7442. 👍
> Side note: Reading zipped csv files from S3 didn't work for me if anyone has any luck with that
Did this work using the approach you documented in your guide but not with this PR? Or are you reporting a separate issue? If it's a separate issue that you think may be related to Flintrock, I suggest opening a new issue. I'll take a look.
@nchammas I followed the method in my guide. I suspect the issue is unrelated to Flintrock; it is most likely related to spark-csv.
@nchammas I just tried it the way you asked, and it is still failing with the same exception:
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
Here is the spark-submit log:
Ivy Default Cache set to: /home/ec2-user/.ivy2/cache
The jars for the packages stored in: /home/ec2-user/.ivy2/jars
:: loading settings :: url = jar:file:/home/ec2-user/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found org.apache.hadoop#hadoop-aws;2.7.3 in central
found org.apache.hadoop#hadoop-common;2.7.3 in central
found org.apache.hadoop#hadoop-annotations;2.7.3 in central
found com.google.guava#guava;11.0.2 in central
found com.google.code.findbugs#jsr305;3.0.0 in central
found commons-cli#commons-cli;1.2 in central
found org.apache.commons#commons-math3;3.1.1 in central
found xmlenc#xmlenc;0.52 in central
found commons-httpclient#commons-httpclient;3.1 in central
found commons-logging#commons-logging;1.1.3 in central
found commons-codec#commons-codec;1.4 in central
found commons-io#commons-io;2.4 in central
found commons-net#commons-net;3.1 in central
found commons-collections#commons-collections;3.2.2 in central
found javax.servlet#servlet-api;2.5 in central
found org.mortbay.jetty#jetty;6.1.26 in central
found org.mortbay.jetty#jetty-util;6.1.26 in central
found com.sun.jersey#jersey-core;1.9 in central
found com.sun.jersey#jersey-json;1.9 in central
found org.codehaus.jettison#jettison;1.1 in central
found com.sun.xml.bind#jaxb-impl;2.2.3-1 in central
found javax.xml.bind#jaxb-api;2.2.2 in central
found javax.xml.stream#stax-api;1.0-2 in central
found javax.activation#activation;1.1 in central
found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
found org.codehaus.jackson#jackson-jaxrs;1.9.13 in central
found org.codehaus.jackson#jackson-xc;1.9.13 in central
found com.sun.jersey#jersey-server;1.9 in central
found asm#asm;3.2 in central
found log4j#log4j;1.2.17 in central
found net.java.dev.jets3t#jets3t;0.9.0 in central
found org.apache.httpcomponents#httpclient;4.2.5 in central
found org.apache.httpcomponents#httpcore;4.2.5 in central
found com.jamesmurty.utils#java-xmlbuilder;0.4 in central
found commons-lang#commons-lang;2.6 in central
found commons-configuration#commons-configuration;1.6 in central
found commons-digester#commons-digester;1.8 in central
found commons-beanutils#commons-beanutils;1.7.0 in central
found commons-beanutils#commons-beanutils-core;1.8.0 in central
found org.slf4j#slf4j-api;1.7.10 in central
found org.apache.avro#avro;1.7.4 in central
found com.thoughtworks.paranamer#paranamer;2.3 in central
found org.xerial.snappy#snappy-java;1.0.4.1 in central
found org.apache.commons#commons-compress;1.4.1 in central
found org.tukaani#xz;1.0 in central
found com.google.protobuf#protobuf-java;2.5.0 in central
found com.google.code.gson#gson;2.2.4 in central
found org.apache.hadoop#hadoop-auth;2.7.3 in central
found org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 in central
found org.apache.directory.server#apacheds-i18n;2.0.0-M15 in central
found org.apache.directory.api#api-asn1-api;1.0.0-M20 in central
found org.apache.directory.api#api-util;1.0.0-M20 in central
found org.apache.zookeeper#zookeeper;3.4.6 in central
found org.slf4j#slf4j-log4j12;1.7.10 in central
found io.netty#netty;3.6.2.Final in central
found org.apache.curator#curator-framework;2.7.1 in central
found org.apache.curator#curator-client;2.7.1 in central
found com.jcraft#jsch;0.1.42 in central
found org.apache.curator#curator-recipes;2.7.1 in central
found org.apache.htrace#htrace-core;3.1.0-incubating in central
found javax.servlet.jsp#jsp-api;2.1 in central
found jline#jline;0.9.94 in central
found junit#junit;4.11 in central
found org.hamcrest#hamcrest-core;1.3 in central
found com.fasterxml.jackson.core#jackson-databind;2.2.3 in central
found com.fasterxml.jackson.core#jackson-annotations;2.2.3 in central
found com.fasterxml.jackson.core#jackson-core;2.2.3 in central
found com.amazonaws#aws-java-sdk;1.7.4 in central
found joda-time#joda-time;2.9.7 in central
[2.9.7] joda-time#joda-time;[2.2,)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar ...
[SUCCESSFUL ] org.apache.hadoop#hadoop-aws;2.7.3!hadoop-aws.jar (36ms)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.3/hadoop-common-2.7.3.jar ...
[SUCCESSFUL ] org.apache.hadoop#hadoop-common;2.7.3!hadoop-common.jar (280ms)
downloading https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.2.3/jackson-databind-2.2.3.jar ...
[SUCCESSFUL ] com.fasterxml.jackson.core#jackson-databind;2.2.3!jackson-databind.jar (45ms)
downloading https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.2.3/jackson-annotations-2.2.3.jar ...
[SUCCESSFUL ] com.fasterxml.jackson.core#jackson-annotations;2.2.3!jackson-annotations.jar (12ms)
downloading https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar ...
[SUCCESSFUL ] com.amazonaws#aws-java-sdk;1.7.4!aws-java-sdk.jar (328ms)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-annotations/2.7.3/hadoop-annotations-2.7.3.jar ...
[SUCCESSFUL ] org.apache.hadoop#hadoop-annotations;2.7.3!hadoop-annotations.jar (11ms)
downloading https://repo1.maven.org/maven2/com/google/guava/guava/11.0.2/guava-11.0.2.jar ...
[SUCCESSFUL ] com.google.guava#guava;11.0.2!guava.jar (69ms)
downloading https://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2.jar ...
[SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar (12ms)
downloading https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar ...
[SUCCESSFUL ] org.apache.commons#commons-math3;3.1.1!commons-math3.jar (53ms)
downloading https://repo1.maven.org/maven2/xmlenc/xmlenc/0.52/xmlenc-0.52.jar ...
[SUCCESSFUL ] xmlenc#xmlenc;0.52!xmlenc.jar (10ms)
downloading https://repo1.maven.org/maven2/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar ...
[SUCCESSFUL ] commons-httpclient#commons-httpclient;3.1!commons-httpclient.jar (18ms)
downloading https://repo1.maven.org/maven2/commons-codec/commons-codec/1.4/commons-codec-1.4.jar ...
[SUCCESSFUL ] commons-codec#commons-codec;1.4!commons-codec.jar (12ms)
downloading https://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4.jar ...
[SUCCESSFUL ] commons-io#commons-io;2.4!commons-io.jar (14ms)
downloading https://repo1.maven.org/maven2/commons-net/commons-net/3.1/commons-net-3.1.jar ...
[SUCCESSFUL ] commons-net#commons-net;3.1!commons-net.jar (17ms)
downloading https://repo1.maven.org/maven2/commons-collections/commons-collections/3.2.2/commons-collections-3.2.2.jar ...
[SUCCESSFUL ] commons-collections#commons-collections;3.2.2!commons-collections.jar (24ms)
downloading https://repo1.maven.org/maven2/javax/servlet/servlet-api/2.5/servlet-api-2.5.jar ...
[SUCCESSFUL ] javax.servlet#servlet-api;2.5!servlet-api.jar (12ms)
downloading https://repo1.maven.org/maven2/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar ...
[SUCCESSFUL ] org.mortbay.jetty#jetty;6.1.26!jetty.jar (24ms)
downloading https://repo1.maven.org/maven2/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar ...
[SUCCESSFUL ] org.mortbay.jetty#jetty-util;6.1.26!jetty-util.jar (14ms)
downloading https://repo1.maven.org/maven2/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar ...
[SUCCESSFUL ] com.sun.jersey#jersey-core;1.9!jersey-core.jar(bundle) (20ms)
downloading https://repo1.maven.org/maven2/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar ...
[SUCCESSFUL ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (13ms)
downloading https://repo1.maven.org/maven2/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar ...
[SUCCESSFUL ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (32ms)
downloading https://repo1.maven.org/maven2/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar ...
[SUCCESSFUL ] commons-logging#commons-logging;1.1.3!commons-logging.jar (12ms)
downloading https://repo1.maven.org/maven2/log4j/log4j/1.2.17/log4j-1.2.17.jar ...
[SUCCESSFUL ] log4j#log4j;1.2.17!log4j.jar(bundle) (23ms)
downloading https://repo1.maven.org/maven2/net/java/dev/jets3t/jets3t/0.9.0/jets3t-0.9.0.jar ...
[SUCCESSFUL ] net.java.dev.jets3t#jets3t;0.9.0!jets3t.jar (22ms)
downloading https://repo1.maven.org/maven2/commons-lang/commons-lang/2.6/commons-lang-2.6.jar ...
[SUCCESSFUL ] commons-lang#commons-lang;2.6!commons-lang.jar (17ms)
downloading https://repo1.maven.org/maven2/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar ...
[SUCCESSFUL ] commons-configuration#commons-configuration;1.6!commons-configuration.jar (16ms)
downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar ...
[SUCCESSFUL ] org.slf4j#slf4j-api;1.7.10!slf4j-api.jar (10ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar ...
[SUCCESSFUL ] org.codehaus.jackson#jackson-core-asl;1.9.13!jackson-core-asl.jar (15ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar ...
[SUCCESSFUL ] org.codehaus.jackson#jackson-mapper-asl;1.9.13!jackson-mapper-asl.jar (29ms)
downloading https://repo1.maven.org/maven2/org/apache/avro/avro/1.7.4/avro-1.7.4.jar ...
[SUCCESSFUL ] org.apache.avro#avro;1.7.4!avro.jar (17ms)
downloading https://repo1.maven.org/maven2/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar ...
[SUCCESSFUL ] com.google.protobuf#protobuf-java;2.5.0!protobuf-java.jar(bundle) (32ms)
downloading https://repo1.maven.org/maven2/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar ...
[SUCCESSFUL ] com.google.code.gson#gson;2.2.4!gson.jar (18ms)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/2.7.3/hadoop-auth-2.7.3.jar ...
[SUCCESSFUL ] org.apache.hadoop#hadoop-auth;2.7.3!hadoop-auth.jar (15ms)
downloading https://repo1.maven.org/maven2/com/jcraft/jsch/0.1.42/jsch-0.1.42.jar ...
[SUCCESSFUL ] com.jcraft#jsch;0.1.42!jsch.jar (18ms)
downloading https://repo1.maven.org/maven2/org/apache/curator/curator-client/2.7.1/curator-client-2.7.1.jar ...
[SUCCESSFUL ] org.apache.curator#curator-client;2.7.1!curator-client.jar(bundle) (13ms)
downloading https://repo1.maven.org/maven2/org/apache/curator/curator-recipes/2.7.1/curator-recipes-2.7.1.jar ...
[SUCCESSFUL ] org.apache.curator#curator-recipes;2.7.1!curator-recipes.jar(bundle) (21ms)
downloading https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.jar ...
[SUCCESSFUL ] com.google.code.findbugs#jsr305;3.0.0!jsr305.jar (11ms)
downloading https://repo1.maven.org/maven2/org/apache/htrace/htrace-core/3.1.0-incubating/htrace-core-3.1.0-incubating.jar ...
[SUCCESSFUL ] org.apache.htrace#htrace-core;3.1.0-incubating!htrace-core.jar (54ms)
downloading https://repo1.maven.org/maven2/org/apache/zookeeper/zookeeper/3.4.6/zookeeper-3.4.6.jar ...
[SUCCESSFUL ] org.apache.zookeeper#zookeeper;3.4.6!zookeeper.jar (28ms)
downloading https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar ...
[SUCCESSFUL ] org.apache.commons#commons-compress;1.4.1!commons-compress.jar (16ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar ...
[SUCCESSFUL ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (11ms)
downloading https://repo1.maven.org/maven2/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar ...
[SUCCESSFUL ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (31ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar ...
[SUCCESSFUL ] org.codehaus.jackson#jackson-jaxrs;1.9.13!jackson-jaxrs.jar (10ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar ...
[SUCCESSFUL ] org.codehaus.jackson#jackson-xc;1.9.13!jackson-xc.jar (10ms)
downloading https://repo1.maven.org/maven2/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar ...
[SUCCESSFUL ] javax.xml.bind#jaxb-api;2.2.2!jaxb-api.jar (11ms)
downloading https://repo1.maven.org/maven2/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar ...
[SUCCESSFUL ] javax.xml.stream#stax-api;1.0-2!stax-api.jar (10ms)
downloading https://repo1.maven.org/maven2/javax/activation/activation/1.1/activation-1.1.jar ...
[SUCCESSFUL ] javax.activation#activation;1.1!activation.jar (13ms)
downloading https://repo1.maven.org/maven2/asm/asm/3.2/asm-3.2.jar ...
[SUCCESSFUL ] asm#asm;3.2!asm.jar (10ms)
downloading https://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar ...
[SUCCESSFUL ] org.apache.httpcomponents#httpclient;4.2.5!httpclient.jar (20ms)
downloading https://repo1.maven.org/maven2/org/apache/httpcomponents/httpcore/4.2.5/httpcore-4.2.5.jar ...
[SUCCESSFUL ] org.apache.httpcomponents#httpcore;4.2.5!httpcore.jar (16ms)
downloading https://repo1.maven.org/maven2/com/jamesmurty/utils/java-xmlbuilder/0.4/java-xmlbuilder-0.4.jar ...
[SUCCESSFUL ] com.jamesmurty.utils#java-xmlbuilder;0.4!java-xmlbuilder.jar (9ms)
downloading https://repo1.maven.org/maven2/commons-digester/commons-digester/1.8/commons-digester-1.8.jar ...
[SUCCESSFUL ] commons-digester#commons-digester;1.8!commons-digester.jar (13ms)
downloading https://repo1.maven.org/maven2/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar ...
[SUCCESSFUL ] commons-beanutils#commons-beanutils-core;1.8.0!commons-beanutils-core.jar (14ms)
downloading https://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar ...
[SUCCESSFUL ] commons-beanutils#commons-beanutils;1.7.0!commons-beanutils.jar (13ms)
downloading https://repo1.maven.org/maven2/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar ...
[SUCCESSFUL ] com.thoughtworks.paranamer#paranamer;2.3!paranamer.jar (10ms)
downloading https://repo1.maven.org/maven2/org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.jar ...
[SUCCESSFUL ] org.xerial.snappy#snappy-java;1.0.4.1!snappy-java.jar(bundle) (34ms)
downloading https://repo1.maven.org/maven2/org/tukaani/xz/1.0/xz-1.0.jar ...
[SUCCESSFUL ] org.tukaani#xz;1.0!xz.jar (12ms)
downloading https://repo1.maven.org/maven2/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar ...
[SUCCESSFUL ] org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15!apacheds-kerberos-codec.jar(bundle) (26ms)
downloading https://repo1.maven.org/maven2/org/apache/curator/curator-framework/2.7.1/curator-framework-2.7.1.jar ...
[SUCCESSFUL ] org.apache.curator#curator-framework;2.7.1!curator-framework.jar(bundle) (15ms)
downloading https://repo1.maven.org/maven2/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar ...
[SUCCESSFUL ] org.apache.directory.server#apacheds-i18n;2.0.0-M15!apacheds-i18n.jar(bundle) (11ms)
downloading https://repo1.maven.org/maven2/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar ...
[SUCCESSFUL ] org.apache.directory.api#api-asn1-api;1.0.0-M20!api-asn1-api.jar(bundle) (10ms)
downloading https://repo1.maven.org/maven2/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar ...
[SUCCESSFUL ] org.apache.directory.api#api-util;1.0.0-M20!api-util.jar(bundle) (11ms)
downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar ...
[SUCCESSFUL ] org.slf4j#slf4j-log4j12;1.7.10!slf4j-log4j12.jar (10ms)
downloading https://repo1.maven.org/maven2/io/netty/netty/3.6.2.Final/netty-3.6.2.Final.jar ...
[SUCCESSFUL ] io.netty#netty;3.6.2.Final!netty.jar(bundle) (61ms)
downloading https://repo1.maven.org/maven2/javax/servlet/jsp/jsp-api/2.1/jsp-api-2.1.jar ...
[SUCCESSFUL ] javax.servlet.jsp#jsp-api;2.1!jsp-api.jar (14ms)
downloading https://repo1.maven.org/maven2/jline/jline/0.9.94/jline-0.9.94.jar ...
[SUCCESSFUL ] jline#jline;0.9.94!jline.jar (11ms)
downloading https://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11.jar ...
[SUCCESSFUL ] junit#junit;4.11!junit.jar (17ms)
downloading https://repo1.maven.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar ...
[SUCCESSFUL ] org.hamcrest#hamcrest-core;1.3!hamcrest-core.jar (10ms)
downloading https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.2.3/jackson-core-2.2.3.jar ...
[SUCCESSFUL ] com.fasterxml.jackson.core#jackson-core;2.2.3!jackson-core.jar (13ms)
downloading https://repo1.maven.org/maven2/joda-time/joda-time/2.9.7/joda-time-2.9.7.jar ...
[SUCCESSFUL ] joda-time#joda-time;2.9.7!joda-time.jar (24ms)
:: resolution report :: resolve 13919ms :: artifacts dl 1956ms
:: modules in use:
asm#asm;3.2 from central in [default]
com.amazonaws#aws-java-sdk;1.7.4 from central in [default]
com.fasterxml.jackson.core#jackson-annotations;2.2.3 from central in [default]
com.fasterxml.jackson.core#jackson-core;2.2.3 from central in [default]
com.fasterxml.jackson.core#jackson-databind;2.2.3 from central in [default]
com.google.code.findbugs#jsr305;3.0.0 from central in [default]
com.google.code.gson#gson;2.2.4 from central in [default]
com.google.guava#guava;11.0.2 from central in [default]
com.google.protobuf#protobuf-java;2.5.0 from central in [default]
com.jamesmurty.utils#java-xmlbuilder;0.4 from central in [default]
com.jcraft#jsch;0.1.42 from central in [default]
com.sun.jersey#jersey-core;1.9 from central in [default]
com.sun.jersey#jersey-json;1.9 from central in [default]
com.sun.jersey#jersey-server;1.9 from central in [default]
com.sun.xml.bind#jaxb-impl;2.2.3-1 from central in [default]
com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
commons-beanutils#commons-beanutils;1.7.0 from central in [default]
commons-beanutils#commons-beanutils-core;1.8.0 from central in [default]
commons-cli#commons-cli;1.2 from central in [default]
commons-codec#commons-codec;1.4 from central in [default]
commons-collections#commons-collections;3.2.2 from central in [default]
commons-configuration#commons-configuration;1.6 from central in [default]
commons-digester#commons-digester;1.8 from central in [default]
commons-httpclient#commons-httpclient;3.1 from central in [default]
commons-io#commons-io;2.4 from central in [default]
commons-lang#commons-lang;2.6 from central in [default]
commons-logging#commons-logging;1.1.3 from central in [default]
commons-net#commons-net;3.1 from central in [default]
io.netty#netty;3.6.2.Final from central in [default]
javax.activation#activation;1.1 from central in [default]
javax.servlet#servlet-api;2.5 from central in [default]
javax.servlet.jsp#jsp-api;2.1 from central in [default]
javax.xml.bind#jaxb-api;2.2.2 from central in [default]
javax.xml.stream#stax-api;1.0-2 from central in [default]
jline#jline;0.9.94 from central in [default]
joda-time#joda-time;2.9.7 from central in [default]
junit#junit;4.11 from central in [default]
log4j#log4j;1.2.17 from central in [default]
net.java.dev.jets3t#jets3t;0.9.0 from central in [default]
org.apache.avro#avro;1.7.4 from central in [default]
org.apache.commons#commons-compress;1.4.1 from central in [default]
org.apache.commons#commons-math3;3.1.1 from central in [default]
org.apache.curator#curator-client;2.7.1 from central in [default]
org.apache.curator#curator-framework;2.7.1 from central in [default]
org.apache.curator#curator-recipes;2.7.1 from central in [default]
org.apache.directory.api#api-asn1-api;1.0.0-M20 from central in [default]
org.apache.directory.api#api-util;1.0.0-M20 from central in [default]
org.apache.directory.server#apacheds-i18n;2.0.0-M15 from central in [default]
org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 from central in [default]
org.apache.hadoop#hadoop-annotations;2.7.3 from central in [default]
org.apache.hadoop#hadoop-auth;2.7.3 from central in [default]
org.apache.hadoop#hadoop-aws;2.7.3 from central in [default]
org.apache.hadoop#hadoop-common;2.7.3 from central in [default]
org.apache.htrace#htrace-core;3.1.0-incubating from central in [default]
org.apache.httpcomponents#httpclient;4.2.5 from central in [default]
org.apache.httpcomponents#httpcore;4.2.5 from central in [default]
org.apache.zookeeper#zookeeper;3.4.6 from central in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
org.codehaus.jackson#jackson-jaxrs;1.9.13 from central in [default]
org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
org.codehaus.jackson#jackson-xc;1.9.13 from central in [default]
org.codehaus.jettison#jettison;1.1 from central in [default]
org.hamcrest#hamcrest-core;1.3 from central in [default]
org.mortbay.jetty#jetty;6.1.26 from central in [default]
org.mortbay.jetty#jetty-util;6.1.26 from central in [default]
org.slf4j#slf4j-api;1.7.10 from central in [default]
org.slf4j#slf4j-log4j12;1.7.10 from central in [default]
org.tukaani#xz;1.0 from central in [default]
org.xerial.snappy#snappy-java;1.0.4.1 from central in [default]
xmlenc#xmlenc;0.52 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 70 | 70 | 70 | 0 || 70 | 70 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
70 artifacts copied, 0 already retrieved (36491kB/109ms)
Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/30 04:00:19 INFO RestSubmissionClient: Submitting a request to launch an application in spark://ip-172-30-0-180:6066.
17/01/30 04:00:20 INFO RestSubmissionClient: Submission successfully created as driver-20170130040020-0000. Polling submission state...
17/01/30 04:00:20 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20170130040020-0000 in spark://ip-172-30-0-180:6066.
17/01/30 04:00:20 INFO RestSubmissionClient: State of driver driver-20170130040020-0000 is now RUNNING.
17/01/30 04:00:20 INFO RestSubmissionClient: Driver is running on worker worker-20170130035551-172.30.0.48-45346 at 172.30.0.48:45346.
17/01/30 04:00:20 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
"action" : "CreateSubmissionResponse",
"message" : "Driver successfully submitted as driver-20170130040020-0000",
"serverSparkVersion" : "2.1.0",
"submissionId" : "driver-20170130040020-0000",
"success" : true
}
Unfortunately, --packages does not work with --deploy-mode cluster: see SPARK-12559.
> Unfortunately --packages does not work with --deploy-mode cluster: see SPARK-12559
That looks like the actual cause of the "java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found" error.
It looks like SPARK-10789 also describes the same issue.
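The limitation reported here could be turned into a small guard in a launch script. The following is only a sketch. `build_submit_cmd` is a hypothetical name, and the function does nothing more than print a spark-submit command line, or refuse the combination of `--packages` and `--deploy-mode cluster` that SPARK-12559 describes. It does not run Spark, and `app.jar` is a placeholder path.

```shell
# Hypothetical helper: assemble (but do not run) a spark-submit command.
# Arguments: deploy mode, --packages value (may be empty), main class, app jar.
build_submit_cmd() {
  bsc_mode="$1"
  bsc_packages="$2"
  bsc_class="$3"
  bsc_jar="$4"

  # Per this thread, --packages is reported not to work in cluster mode.
  if [ "$bsc_mode" = "cluster" ] && [ -n "$bsc_packages" ]; then
    echo "refusing: --packages with --deploy-mode cluster (see SPARK-12559)" >&2
    return 1
  fi

  bsc_cmd="spark-submit --deploy-mode $bsc_mode"
  if [ -n "$bsc_packages" ]; then
    bsc_cmd="$bsc_cmd --packages $bsc_packages"
  fi
  echo "$bsc_cmd --class $bsc_class $bsc_jar"
}

# Example: client mode with the hadoop-aws package from the log above.
build_submit_cmd client org.apache.hadoop:hadoop-aws:2.7.3 \
  org.apache.spark.examples.SparkPi app.jar
```

Failing early in the script keeps the error next to its cause, instead of surfacing later as a ClassNotFoundException from the driver.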
Thanks for the references @dm-tran and @pragnesh. Looks like the application jar issue is something that should be left for future work.
Btw @pragnesh, I edited your comment to use triple backticks to format long blocks of code. It looks better. 👍
If there are no other questions or concerns about this PR, I will merge it in tonight or tomorrow.
Lots of people have trouble accessing S3 from their Flintrock clusters.
#90, which is about accessing S3 from Flintrock clusters, is the most visited issue on this project. A related issue, #88, which is driven by the same problem, is the second-most visited issue on this project. Two recent guides that go over how to use Flintrock -- this one and this one -- take time to address the same issue.
This PR attempts to address this common problem by 1) setting better defaults that enable Spark on Flintrock clusters to seamlessly access data on S3, and 2) providing instructions in the README on how to make use of these new defaults.
I tested this PR by launching several clusters in a variety of configurations. I was able to seamlessly access S3 in all cases.
It seems to be working well, but I would like to get some feedback from people who have hit this issue in the past to make sure I'm headed in the right direction here:
I know this PR may be too late for some of you, since you may have moved on or come up with your own workaround. So no hard feelings if you are not interested. And of course, if anyone else reading this would like to chime in with their feedback that would also be helpful.
If you would like to install Flintrock directly from this PR (assuming you are running Python 3.4+), you can do that with this:
Fixes #90.