nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Enable easy S3 access #180

Closed nchammas closed 7 years ago

nchammas commented 7 years ago

Lots of people have trouble accessing S3 from their Flintrock clusters.

#90, which is about accessing S3 from Flintrock clusters, is the most visited issue on this project. A related issue, #88, which is driven by the same problem, is the second-most visited. Two recent guides that go over how to use Flintrock -- this one and this one -- take time to address the same issue.

This PR attempts to address this common problem by 1) setting better defaults that enable Spark on Flintrock clusters to seamlessly access data on S3, and 2) providing instructions in the README on how to make use of these new defaults.

I tested this PR by launching several clusters in a variety of configurations. I was able to seamlessly access S3 in all cases.
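
For readers who want to try this, here is a minimal sketch of what a submission against S3 data might look like from the cluster master. It assumes the `hadoop-aws` package version (2.7.3, the version used later in this thread) matches the Hadoop version on the cluster. `my-app.py` is a hypothetical application that reads `s3a://` paths, and `s3_submit_command` is an illustrative helper, not part of Flintrock. The helper only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: prints a spark-submit command line instead of executing it.
# HADOOP_AWS_VERSION is an assumption; match it to the cluster's Hadoop version.
HADOOP_AWS_VERSION="${HADOOP_AWS_VERSION:-2.7.3}"

# s3_submit_command APP: print a client-mode spark-submit command that pulls
# in the hadoop-aws package so that s3a:// paths can be resolved.
s3_submit_command() {
    printf 'spark-submit --packages org.apache.hadoop:hadoop-aws:%s %s\n' \
        "$HADOOP_AWS_VERSION" "$1"
}

# Hypothetical application name, for illustration only.
s3_submit_command my-app.py
```

Printing the command first makes it easy to review the package coordinates before running anything on the cluster.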

It seems to be working well, but I would like to get some feedback from people who have hit this issue in the past to make sure I'm headed in the right direction here:

I know this PR may be too late for some of you, since you may have moved on or come up with your own workaround. So no hard feelings if you are not interested. And of course, if anyone else reading this would like to chime in with their feedback that would also be helpful.

If you would like to install Flintrock directly from this PR (assuming you are running Python 3.4+), you can do that with this:

pip install git+https://github.com/nchammas/flintrock@easy-s3-access

Fixes #90.

dm-tran commented 7 years ago

Looks good to me 👍

AlexIoannides commented 7 years ago

Hello,

Sounds great - will take a look on the weekend!

Alex

pragnesh commented 7 years ago

Does this also support loading the application jar from S3?

pragnesh commented 7 years ago

I have just tried launching a job from an application jar on S3 using Flintrock from the easy-s3-access branch, and I got the following exception:

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

The only way I got this working was by adding the jar to SPARK_CLASSPATH: https://github.com/pragnesh/flintrock/commit/48c0e16ba0bcf558c707bec51a65f622c7ab4402

I know SPARK_CLASSPATH has been deprecated since Spark 1.0, but this is the only way it works.
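
A `ClassNotFoundException` for `S3AFileSystem` means the class is not on the classpath of the JVM that is trying to read the `s3a://` path. As a rough first check, the sketch below looks for a `hadoop-aws` jar in a given directory. The default directory is an assumption based on the `/home/ec2-user/spark/jars` path that appears in the spark-submit log later in this thread, and `has_hadoop_aws_jar` is a hypothetical helper name. It only checks file names, so it cannot confirm that the jar is actually loaded.

```shell
#!/bin/sh
# has_hadoop_aws_jar DIR: succeed if DIR contains a file named hadoop-aws-*.jar.
# This is a file-name check only; it does not inspect any running JVM.
has_hadoop_aws_jar() {
    for jar in "$1"/hadoop-aws-*.jar; do
        # If nothing matches, the glob stays literal and -e is false.
        if [ -e "$jar" ]; then
            return 0
        fi
    done
    return 1
}

# Assumed default location; override SPARK_JARS_DIR for other layouts.
SPARK_JARS_DIR="${SPARK_JARS_DIR:-/home/ec2-user/spark/jars}"

if has_hadoop_aws_jar "$SPARK_JARS_DIR"; then
    echo "hadoop-aws jar found in $SPARK_JARS_DIR"
else
    echo "no hadoop-aws jar in $SPARK_JARS_DIR"
fi
```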

nchammas commented 7 years ago

@pragnesh - I'm not familiar with launching jobs in the way you describe. Can you share an example of the command that triggers the error you reported?

pragnesh commented 7 years ago

@nchammas For example, I uploaded the jar spark-examples_2.11-2.1.0.jar to "example_bucket" on S3. When I try to launch the job org.apache.spark.examples.SparkPi using the following command, I get the "Class org.apache.hadoop.fs.s3a.S3AFileSystem not found" exception:

spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--master spark://`hostname`:6066 \
s3a://example_bucket/spark-examples_2.11-2.1.0.jar

This command only works if I add the S3-related jar to SPARK_CLASSPATH. This scenario is supported on EMR.

PiercingDan commented 7 years ago

Hey guys,


Glad to see easier S3 access being implemented.

I remember getting some pretty unintelligible errors when trying to access S3 files with Spark 2.0.1 on Hadoop 2.7.2, 2.6, and 2.5. I had to revert to Hadoop 2.4 and a custom version of Spark built against 2.4. I referenced this issue: SPARK-7442.

This is detailed in my guide under Using Flintrock.

It seems that the above will work with Spark 2.0 and 2.1 on Hadoop 2.7.3. I will try it when possible.


Danny

Side note: Reading zipped csv files from S3 didn't work for me if anyone has any luck with that

nchammas commented 7 years ago

@pragnesh - If you try this command with Spark 2.1 and Hadoop 2.7, does it work for you?

spark-submit \
  --packages org.apache.hadoop:hadoop-aws:2.7.3 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --master spark://`hostname`:6066 \
  s3a://example_bucket/spark-examples_2.11-2.1.0.jar

nchammas commented 7 years ago

Hey @PiercingDan - Yup, this PR is supposed to address precisely the problem you documented in your guide, and which I also reported in SPARK-7442. 👍

Side note: Reading zipped csv files from S3 didn't work for me if anyone has any luck with that

Did this work using the approach you documented in your guide but not with this PR? Or are you reporting a separate issue? If it's a separate issue that you think may be related to Flintrock, I suggest opening a new issue. I'll take a look.

PiercingDan commented 7 years ago

@nchammas I followed the method in my guide. I suspect the issue is unrelated to Flintrock; it is most likely spark-csv related.

pragnesh commented 7 years ago

@nchammas I have just tried it the way you asked me to test, and it is still failing with the same exception:

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

Here is the spark-submit log:

Ivy Default Cache set to: /home/ec2-user/.ivy2/cache
The jars for the packages stored in: /home/ec2-user/.ivy2/jars
:: loading settings :: url = jar:file:/home/ec2-user/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found org.apache.hadoop#hadoop-aws;2.7.3 in central
    found org.apache.hadoop#hadoop-common;2.7.3 in central
    found org.apache.hadoop#hadoop-annotations;2.7.3 in central
    found com.google.guava#guava;11.0.2 in central
    found com.google.code.findbugs#jsr305;3.0.0 in central
    found commons-cli#commons-cli;1.2 in central
    found org.apache.commons#commons-math3;3.1.1 in central
    found xmlenc#xmlenc;0.52 in central
    found commons-httpclient#commons-httpclient;3.1 in central
    found commons-logging#commons-logging;1.1.3 in central
    found commons-codec#commons-codec;1.4 in central
    found commons-io#commons-io;2.4 in central
    found commons-net#commons-net;3.1 in central
    found commons-collections#commons-collections;3.2.2 in central
    found javax.servlet#servlet-api;2.5 in central
    found org.mortbay.jetty#jetty;6.1.26 in central
    found org.mortbay.jetty#jetty-util;6.1.26 in central
    found com.sun.jersey#jersey-core;1.9 in central
    found com.sun.jersey#jersey-json;1.9 in central
    found org.codehaus.jettison#jettison;1.1 in central
    found com.sun.xml.bind#jaxb-impl;2.2.3-1 in central
    found javax.xml.bind#jaxb-api;2.2.2 in central
    found javax.xml.stream#stax-api;1.0-2 in central
    found javax.activation#activation;1.1 in central
    found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
    found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
    found org.codehaus.jackson#jackson-jaxrs;1.9.13 in central
    found org.codehaus.jackson#jackson-xc;1.9.13 in central
    found com.sun.jersey#jersey-server;1.9 in central
    found asm#asm;3.2 in central
    found log4j#log4j;1.2.17 in central
    found net.java.dev.jets3t#jets3t;0.9.0 in central
    found org.apache.httpcomponents#httpclient;4.2.5 in central
    found org.apache.httpcomponents#httpcore;4.2.5 in central
    found com.jamesmurty.utils#java-xmlbuilder;0.4 in central
    found commons-lang#commons-lang;2.6 in central
    found commons-configuration#commons-configuration;1.6 in central
    found commons-digester#commons-digester;1.8 in central
    found commons-beanutils#commons-beanutils;1.7.0 in central
    found commons-beanutils#commons-beanutils-core;1.8.0 in central
    found org.slf4j#slf4j-api;1.7.10 in central
    found org.apache.avro#avro;1.7.4 in central
    found com.thoughtworks.paranamer#paranamer;2.3 in central
    found org.xerial.snappy#snappy-java;1.0.4.1 in central
    found org.apache.commons#commons-compress;1.4.1 in central
    found org.tukaani#xz;1.0 in central
    found com.google.protobuf#protobuf-java;2.5.0 in central
    found com.google.code.gson#gson;2.2.4 in central
    found org.apache.hadoop#hadoop-auth;2.7.3 in central
    found org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 in central
    found org.apache.directory.server#apacheds-i18n;2.0.0-M15 in central
    found org.apache.directory.api#api-asn1-api;1.0.0-M20 in central
    found org.apache.directory.api#api-util;1.0.0-M20 in central
    found org.apache.zookeeper#zookeeper;3.4.6 in central
    found org.slf4j#slf4j-log4j12;1.7.10 in central
    found io.netty#netty;3.6.2.Final in central
    found org.apache.curator#curator-framework;2.7.1 in central
    found org.apache.curator#curator-client;2.7.1 in central
    found com.jcraft#jsch;0.1.42 in central
    found org.apache.curator#curator-recipes;2.7.1 in central
    found org.apache.htrace#htrace-core;3.1.0-incubating in central
    found javax.servlet.jsp#jsp-api;2.1 in central
    found jline#jline;0.9.94 in central
    found junit#junit;4.11 in central
    found org.hamcrest#hamcrest-core;1.3 in central
    found com.fasterxml.jackson.core#jackson-databind;2.2.3 in central
    found com.fasterxml.jackson.core#jackson-annotations;2.2.3 in central
    found com.fasterxml.jackson.core#jackson-core;2.2.3 in central
    found com.amazonaws#aws-java-sdk;1.7.4 in central
    found joda-time#joda-time;2.9.7 in central
    [2.9.7] joda-time#joda-time;[2.2,)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar ...
    [SUCCESSFUL ] org.apache.hadoop#hadoop-aws;2.7.3!hadoop-aws.jar (36ms)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.3/hadoop-common-2.7.3.jar ...
    [SUCCESSFUL ] org.apache.hadoop#hadoop-common;2.7.3!hadoop-common.jar (280ms)
downloading https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.2.3/jackson-databind-2.2.3.jar ...
    [SUCCESSFUL ] com.fasterxml.jackson.core#jackson-databind;2.2.3!jackson-databind.jar (45ms)
downloading https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.2.3/jackson-annotations-2.2.3.jar ...
    [SUCCESSFUL ] com.fasterxml.jackson.core#jackson-annotations;2.2.3!jackson-annotations.jar (12ms)
downloading https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar ...
    [SUCCESSFUL ] com.amazonaws#aws-java-sdk;1.7.4!aws-java-sdk.jar (328ms)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-annotations/2.7.3/hadoop-annotations-2.7.3.jar ...
    [SUCCESSFUL ] org.apache.hadoop#hadoop-annotations;2.7.3!hadoop-annotations.jar (11ms)
downloading https://repo1.maven.org/maven2/com/google/guava/guava/11.0.2/guava-11.0.2.jar ...
    [SUCCESSFUL ] com.google.guava#guava;11.0.2!guava.jar (69ms)
downloading https://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2.jar ...
    [SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar (12ms)
downloading https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar ...
    [SUCCESSFUL ] org.apache.commons#commons-math3;3.1.1!commons-math3.jar (53ms)
downloading https://repo1.maven.org/maven2/xmlenc/xmlenc/0.52/xmlenc-0.52.jar ...
    [SUCCESSFUL ] xmlenc#xmlenc;0.52!xmlenc.jar (10ms)
downloading https://repo1.maven.org/maven2/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar ...
    [SUCCESSFUL ] commons-httpclient#commons-httpclient;3.1!commons-httpclient.jar (18ms)
downloading https://repo1.maven.org/maven2/commons-codec/commons-codec/1.4/commons-codec-1.4.jar ...
    [SUCCESSFUL ] commons-codec#commons-codec;1.4!commons-codec.jar (12ms)
downloading https://repo1.maven.org/maven2/commons-io/commons-io/2.4/commons-io-2.4.jar ...
    [SUCCESSFUL ] commons-io#commons-io;2.4!commons-io.jar (14ms)
downloading https://repo1.maven.org/maven2/commons-net/commons-net/3.1/commons-net-3.1.jar ...
    [SUCCESSFUL ] commons-net#commons-net;3.1!commons-net.jar (17ms)
downloading https://repo1.maven.org/maven2/commons-collections/commons-collections/3.2.2/commons-collections-3.2.2.jar ...
    [SUCCESSFUL ] commons-collections#commons-collections;3.2.2!commons-collections.jar (24ms)
downloading https://repo1.maven.org/maven2/javax/servlet/servlet-api/2.5/servlet-api-2.5.jar ...
    [SUCCESSFUL ] javax.servlet#servlet-api;2.5!servlet-api.jar (12ms)
downloading https://repo1.maven.org/maven2/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar ...
    [SUCCESSFUL ] org.mortbay.jetty#jetty;6.1.26!jetty.jar (24ms)
downloading https://repo1.maven.org/maven2/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar ...
    [SUCCESSFUL ] org.mortbay.jetty#jetty-util;6.1.26!jetty-util.jar (14ms)
downloading https://repo1.maven.org/maven2/com/sun/jersey/jersey-core/1.9/jersey-core-1.9.jar ...
    [SUCCESSFUL ] com.sun.jersey#jersey-core;1.9!jersey-core.jar(bundle) (20ms)
downloading https://repo1.maven.org/maven2/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar ...
    [SUCCESSFUL ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (13ms)
downloading https://repo1.maven.org/maven2/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar ...
    [SUCCESSFUL ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (32ms)
downloading https://repo1.maven.org/maven2/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar ...
    [SUCCESSFUL ] commons-logging#commons-logging;1.1.3!commons-logging.jar (12ms)
downloading https://repo1.maven.org/maven2/log4j/log4j/1.2.17/log4j-1.2.17.jar ...
    [SUCCESSFUL ] log4j#log4j;1.2.17!log4j.jar(bundle) (23ms)
downloading https://repo1.maven.org/maven2/net/java/dev/jets3t/jets3t/0.9.0/jets3t-0.9.0.jar ...
    [SUCCESSFUL ] net.java.dev.jets3t#jets3t;0.9.0!jets3t.jar (22ms)
downloading https://repo1.maven.org/maven2/commons-lang/commons-lang/2.6/commons-lang-2.6.jar ...
    [SUCCESSFUL ] commons-lang#commons-lang;2.6!commons-lang.jar (17ms)
downloading https://repo1.maven.org/maven2/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar ...
    [SUCCESSFUL ] commons-configuration#commons-configuration;1.6!commons-configuration.jar (16ms)
downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar ...
    [SUCCESSFUL ] org.slf4j#slf4j-api;1.7.10!slf4j-api.jar (10ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar ...
    [SUCCESSFUL ] org.codehaus.jackson#jackson-core-asl;1.9.13!jackson-core-asl.jar (15ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar ...
    [SUCCESSFUL ] org.codehaus.jackson#jackson-mapper-asl;1.9.13!jackson-mapper-asl.jar (29ms)
downloading https://repo1.maven.org/maven2/org/apache/avro/avro/1.7.4/avro-1.7.4.jar ...
    [SUCCESSFUL ] org.apache.avro#avro;1.7.4!avro.jar (17ms)
downloading https://repo1.maven.org/maven2/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar ...
    [SUCCESSFUL ] com.google.protobuf#protobuf-java;2.5.0!protobuf-java.jar(bundle) (32ms)
downloading https://repo1.maven.org/maven2/com/google/code/gson/gson/2.2.4/gson-2.2.4.jar ...
    [SUCCESSFUL ] com.google.code.gson#gson;2.2.4!gson.jar (18ms)
downloading https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/2.7.3/hadoop-auth-2.7.3.jar ...
    [SUCCESSFUL ] org.apache.hadoop#hadoop-auth;2.7.3!hadoop-auth.jar (15ms)
downloading https://repo1.maven.org/maven2/com/jcraft/jsch/0.1.42/jsch-0.1.42.jar ...
    [SUCCESSFUL ] com.jcraft#jsch;0.1.42!jsch.jar (18ms)
downloading https://repo1.maven.org/maven2/org/apache/curator/curator-client/2.7.1/curator-client-2.7.1.jar ...
    [SUCCESSFUL ] org.apache.curator#curator-client;2.7.1!curator-client.jar(bundle) (13ms)
downloading https://repo1.maven.org/maven2/org/apache/curator/curator-recipes/2.7.1/curator-recipes-2.7.1.jar ...
    [SUCCESSFUL ] org.apache.curator#curator-recipes;2.7.1!curator-recipes.jar(bundle) (21ms)
downloading https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.jar ...
    [SUCCESSFUL ] com.google.code.findbugs#jsr305;3.0.0!jsr305.jar (11ms)
downloading https://repo1.maven.org/maven2/org/apache/htrace/htrace-core/3.1.0-incubating/htrace-core-3.1.0-incubating.jar ...
    [SUCCESSFUL ] org.apache.htrace#htrace-core;3.1.0-incubating!htrace-core.jar (54ms)
downloading https://repo1.maven.org/maven2/org/apache/zookeeper/zookeeper/3.4.6/zookeeper-3.4.6.jar ...
    [SUCCESSFUL ] org.apache.zookeeper#zookeeper;3.4.6!zookeeper.jar (28ms)
downloading https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar ...
    [SUCCESSFUL ] org.apache.commons#commons-compress;1.4.1!commons-compress.jar (16ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar ...
    [SUCCESSFUL ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (11ms)
downloading https://repo1.maven.org/maven2/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar ...
    [SUCCESSFUL ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (31ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jackson/jackson-jaxrs/1.9.13/jackson-jaxrs-1.9.13.jar ...
    [SUCCESSFUL ] org.codehaus.jackson#jackson-jaxrs;1.9.13!jackson-jaxrs.jar (10ms)
downloading https://repo1.maven.org/maven2/org/codehaus/jackson/jackson-xc/1.9.13/jackson-xc-1.9.13.jar ...
    [SUCCESSFUL ] org.codehaus.jackson#jackson-xc;1.9.13!jackson-xc.jar (10ms)
downloading https://repo1.maven.org/maven2/javax/xml/bind/jaxb-api/2.2.2/jaxb-api-2.2.2.jar ...
    [SUCCESSFUL ] javax.xml.bind#jaxb-api;2.2.2!jaxb-api.jar (11ms)
downloading https://repo1.maven.org/maven2/javax/xml/stream/stax-api/1.0-2/stax-api-1.0-2.jar ...
    [SUCCESSFUL ] javax.xml.stream#stax-api;1.0-2!stax-api.jar (10ms)
downloading https://repo1.maven.org/maven2/javax/activation/activation/1.1/activation-1.1.jar ...
    [SUCCESSFUL ] javax.activation#activation;1.1!activation.jar (13ms)
downloading https://repo1.maven.org/maven2/asm/asm/3.2/asm-3.2.jar ...
    [SUCCESSFUL ] asm#asm;3.2!asm.jar (10ms)
downloading https://repo1.maven.org/maven2/org/apache/httpcomponents/httpclient/4.2.5/httpclient-4.2.5.jar ...
    [SUCCESSFUL ] org.apache.httpcomponents#httpclient;4.2.5!httpclient.jar (20ms)
downloading https://repo1.maven.org/maven2/org/apache/httpcomponents/httpcore/4.2.5/httpcore-4.2.5.jar ...
    [SUCCESSFUL ] org.apache.httpcomponents#httpcore;4.2.5!httpcore.jar (16ms)
downloading https://repo1.maven.org/maven2/com/jamesmurty/utils/java-xmlbuilder/0.4/java-xmlbuilder-0.4.jar ...
    [SUCCESSFUL ] com.jamesmurty.utils#java-xmlbuilder;0.4!java-xmlbuilder.jar (9ms)
downloading https://repo1.maven.org/maven2/commons-digester/commons-digester/1.8/commons-digester-1.8.jar ...
    [SUCCESSFUL ] commons-digester#commons-digester;1.8!commons-digester.jar (13ms)
downloading https://repo1.maven.org/maven2/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar ...
    [SUCCESSFUL ] commons-beanutils#commons-beanutils-core;1.8.0!commons-beanutils-core.jar (14ms)
downloading https://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar ...
    [SUCCESSFUL ] commons-beanutils#commons-beanutils;1.7.0!commons-beanutils.jar (13ms)
downloading https://repo1.maven.org/maven2/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar ...
    [SUCCESSFUL ] com.thoughtworks.paranamer#paranamer;2.3!paranamer.jar (10ms)
downloading https://repo1.maven.org/maven2/org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.jar ...
    [SUCCESSFUL ] org.xerial.snappy#snappy-java;1.0.4.1!snappy-java.jar(bundle) (34ms)
downloading https://repo1.maven.org/maven2/org/tukaani/xz/1.0/xz-1.0.jar ...
    [SUCCESSFUL ] org.tukaani#xz;1.0!xz.jar (12ms)
downloading https://repo1.maven.org/maven2/org/apache/directory/server/apacheds-kerberos-codec/2.0.0-M15/apacheds-kerberos-codec-2.0.0-M15.jar ...
    [SUCCESSFUL ] org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15!apacheds-kerberos-codec.jar(bundle) (26ms)
downloading https://repo1.maven.org/maven2/org/apache/curator/curator-framework/2.7.1/curator-framework-2.7.1.jar ...
    [SUCCESSFUL ] org.apache.curator#curator-framework;2.7.1!curator-framework.jar(bundle) (15ms)
downloading https://repo1.maven.org/maven2/org/apache/directory/server/apacheds-i18n/2.0.0-M15/apacheds-i18n-2.0.0-M15.jar ...
    [SUCCESSFUL ] org.apache.directory.server#apacheds-i18n;2.0.0-M15!apacheds-i18n.jar(bundle) (11ms)
downloading https://repo1.maven.org/maven2/org/apache/directory/api/api-asn1-api/1.0.0-M20/api-asn1-api-1.0.0-M20.jar ...
    [SUCCESSFUL ] org.apache.directory.api#api-asn1-api;1.0.0-M20!api-asn1-api.jar(bundle) (10ms)
downloading https://repo1.maven.org/maven2/org/apache/directory/api/api-util/1.0.0-M20/api-util-1.0.0-M20.jar ...
    [SUCCESSFUL ] org.apache.directory.api#api-util;1.0.0-M20!api-util.jar(bundle) (11ms)
downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar ...
    [SUCCESSFUL ] org.slf4j#slf4j-log4j12;1.7.10!slf4j-log4j12.jar (10ms)
downloading https://repo1.maven.org/maven2/io/netty/netty/3.6.2.Final/netty-3.6.2.Final.jar ...
    [SUCCESSFUL ] io.netty#netty;3.6.2.Final!netty.jar(bundle) (61ms)
downloading https://repo1.maven.org/maven2/javax/servlet/jsp/jsp-api/2.1/jsp-api-2.1.jar ...
    [SUCCESSFUL ] javax.servlet.jsp#jsp-api;2.1!jsp-api.jar (14ms)
downloading https://repo1.maven.org/maven2/jline/jline/0.9.94/jline-0.9.94.jar ...
    [SUCCESSFUL ] jline#jline;0.9.94!jline.jar (11ms)
downloading https://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11.jar ...
    [SUCCESSFUL ] junit#junit;4.11!junit.jar (17ms)
downloading https://repo1.maven.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar ...
    [SUCCESSFUL ] org.hamcrest#hamcrest-core;1.3!hamcrest-core.jar (10ms)
downloading https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.2.3/jackson-core-2.2.3.jar ...
    [SUCCESSFUL ] com.fasterxml.jackson.core#jackson-core;2.2.3!jackson-core.jar (13ms)
downloading https://repo1.maven.org/maven2/joda-time/joda-time/2.9.7/joda-time-2.9.7.jar ...
    [SUCCESSFUL ] joda-time#joda-time;2.9.7!joda-time.jar (24ms)
:: resolution report :: resolve 13919ms :: artifacts dl 1956ms
    :: modules in use:
    asm#asm;3.2 from central in [default]
    com.amazonaws#aws-java-sdk;1.7.4 from central in [default]
    com.fasterxml.jackson.core#jackson-annotations;2.2.3 from central in [default]
    com.fasterxml.jackson.core#jackson-core;2.2.3 from central in [default]
    com.fasterxml.jackson.core#jackson-databind;2.2.3 from central in [default]
    com.google.code.findbugs#jsr305;3.0.0 from central in [default]
    com.google.code.gson#gson;2.2.4 from central in [default]
    com.google.guava#guava;11.0.2 from central in [default]
    com.google.protobuf#protobuf-java;2.5.0 from central in [default]
    com.jamesmurty.utils#java-xmlbuilder;0.4 from central in [default]
    com.jcraft#jsch;0.1.42 from central in [default]
    com.sun.jersey#jersey-core;1.9 from central in [default]
    com.sun.jersey#jersey-json;1.9 from central in [default]
    com.sun.jersey#jersey-server;1.9 from central in [default]
    com.sun.xml.bind#jaxb-impl;2.2.3-1 from central in [default]
    com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
    commons-beanutils#commons-beanutils;1.7.0 from central in [default]
    commons-beanutils#commons-beanutils-core;1.8.0 from central in [default]
    commons-cli#commons-cli;1.2 from central in [default]
    commons-codec#commons-codec;1.4 from central in [default]
    commons-collections#commons-collections;3.2.2 from central in [default]
    commons-configuration#commons-configuration;1.6 from central in [default]
    commons-digester#commons-digester;1.8 from central in [default]
    commons-httpclient#commons-httpclient;3.1 from central in [default]
    commons-io#commons-io;2.4 from central in [default]
    commons-lang#commons-lang;2.6 from central in [default]
    commons-logging#commons-logging;1.1.3 from central in [default]
    commons-net#commons-net;3.1 from central in [default]
    io.netty#netty;3.6.2.Final from central in [default]
    javax.activation#activation;1.1 from central in [default]
    javax.servlet#servlet-api;2.5 from central in [default]
    javax.servlet.jsp#jsp-api;2.1 from central in [default]
    javax.xml.bind#jaxb-api;2.2.2 from central in [default]
    javax.xml.stream#stax-api;1.0-2 from central in [default]
    jline#jline;0.9.94 from central in [default]
    joda-time#joda-time;2.9.7 from central in [default]
    junit#junit;4.11 from central in [default]
    log4j#log4j;1.2.17 from central in [default]
    net.java.dev.jets3t#jets3t;0.9.0 from central in [default]
    org.apache.avro#avro;1.7.4 from central in [default]
    org.apache.commons#commons-compress;1.4.1 from central in [default]
    org.apache.commons#commons-math3;3.1.1 from central in [default]
    org.apache.curator#curator-client;2.7.1 from central in [default]
    org.apache.curator#curator-framework;2.7.1 from central in [default]
    org.apache.curator#curator-recipes;2.7.1 from central in [default]
    org.apache.directory.api#api-asn1-api;1.0.0-M20 from central in [default]
    org.apache.directory.api#api-util;1.0.0-M20 from central in [default]
    org.apache.directory.server#apacheds-i18n;2.0.0-M15 from central in [default]
    org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 from central in [default]
    org.apache.hadoop#hadoop-annotations;2.7.3 from central in [default]
    org.apache.hadoop#hadoop-auth;2.7.3 from central in [default]
    org.apache.hadoop#hadoop-aws;2.7.3 from central in [default]
    org.apache.hadoop#hadoop-common;2.7.3 from central in [default]
    org.apache.htrace#htrace-core;3.1.0-incubating from central in [default]
    org.apache.httpcomponents#httpclient;4.2.5 from central in [default]
    org.apache.httpcomponents#httpcore;4.2.5 from central in [default]
    org.apache.zookeeper#zookeeper;3.4.6 from central in [default]
    org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-jaxrs;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
    org.codehaus.jackson#jackson-xc;1.9.13 from central in [default]
    org.codehaus.jettison#jettison;1.1 from central in [default]
    org.hamcrest#hamcrest-core;1.3 from central in [default]
    org.mortbay.jetty#jetty;6.1.26 from central in [default]
    org.mortbay.jetty#jetty-util;6.1.26 from central in [default]
    org.slf4j#slf4j-api;1.7.10 from central in [default]
    org.slf4j#slf4j-log4j12;1.7.10 from central in [default]
    org.tukaani#xz;1.0 from central in [default]
    org.xerial.snappy#snappy-java;1.0.4.1 from central in [default]
    xmlenc#xmlenc;0.52 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   70  |   70  |   70  |   0   ||   70  |   70  |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    70 artifacts copied, 0 already retrieved (36491kB/109ms)
Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/30 04:00:19 INFO RestSubmissionClient: Submitting a request to launch an application in spark://ip-172-30-0-180:6066.
17/01/30 04:00:20 INFO RestSubmissionClient: Submission successfully created as driver-20170130040020-0000. Polling submission state...
17/01/30 04:00:20 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20170130040020-0000 in spark://ip-172-30-0-180:6066.
17/01/30 04:00:20 INFO RestSubmissionClient: State of driver driver-20170130040020-0000 is now RUNNING.
17/01/30 04:00:20 INFO RestSubmissionClient: Driver is running on worker worker-20170130035551-172.30.0.48-45346 at 172.30.0.48:45346.
17/01/30 04:00:20 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20170130040020-0000",
  "serverSparkVersion" : "2.1.0",
  "submissionId" : "driver-20170130040020-0000",
  "success" : true
}

dm-tran commented 7 years ago

Unfortunately, --packages does not work with --deploy-mode cluster: see SPARK-12559.
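
One way to catch this combination before submitting is a small pre-flight check on the argument list. The sketch below is illustrative only: `check_submit_args` is a hypothetical helper, it handles only the space-separated `--deploy-mode cluster` form, and it does not call `spark-submit` itself.

```shell
#!/bin/sh
# check_submit_args ARGS...: return 1 (with a warning on stderr) when the
# arguments combine --packages with --deploy-mode cluster, else return 0.
check_submit_args() {
    has_packages=0
    deploy_mode=client   # spark-submit defaults to client deploy mode
    prev=""
    for arg in "$@"; do
        [ "$arg" = "--packages" ] && has_packages=1
        # The value of --deploy-mode is the argument that follows the flag.
        [ "$prev" = "--deploy-mode" ] && deploy_mode="$arg"
        prev="$arg"
    done
    if [ "$has_packages" -eq 1 ] && [ "$deploy_mode" = "cluster" ]; then
        echo "warning: --packages with --deploy-mode cluster may not work (see SPARK-12559)" >&2
        return 1
    fi
    return 0
}

# app.jar is a placeholder application, for illustration only.
check_submit_args --packages org.apache.hadoop:hadoop-aws:2.7.3 \
    --deploy-mode cluster app.jar || echo "adjust the command before submitting"
```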

pragnesh commented 7 years ago

Unfortunately, --packages does not work with --deploy-mode cluster: see SPARK-12559.

That looks like the actual cause of the "java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found" error.

pragnesh commented 7 years ago

It looks like SPARK-10789 also talks about the same issue.

nchammas commented 7 years ago

Thanks for the references @dm-tran and @pragnesh. Looks like the application jar issue is something that should be left for future work.

Btw @pragnesh, I edited your comment to use triple backticks to format long blocks of code. It looks better. 👍

If there are no other questions or concerns about this PR, I will merge it in tonight or tomorrow.