mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book
http://mapreduce4hackers.com

Hi, I am facing an issue with submitting a job from Java. #16

Open ankushreddy opened 7 years ago

ankushreddy commented 7 years ago

Hi, when I invoke the class described in https://github.com/mahmoudparsian/data-algorithms-book/blob/master/misc/how-to-submit-spark-job-to-yarn-from-java-code.md from a spark-submit that runs locally, it is invoked correctly and successfully submits the job to the YARN cluster.

But when I invoke the class from a spark-submit that is itself submitted to YARN, the application launched by the how-to-submit-spark-job-to-yarn-from-java-code.md class is accepted but never moves to the RUNNING state, and it fails with the following error:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

Application application_1493671618562_0072 failed 5 times due to AM Container for appattempt_1493671618562_0072_000005 exited with exitCode: 1
For more detailed output, check the application tracking page: http://headnode.internal.cloudapp.net:8088/cluster/app/application_1493671618562_0072 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e02_1493671618562_0072_05_000001
Exit code: 1
Exception message: /mnt/resource/hadoop/yarn/local/usercache/helixuser/appcache/application_1493671618562_0072/container_e02_1493671618562_0072_05_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: (the same launch_container.sh line 26 "bad substitution" message as above)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
    at org.apache.hadoop.util.Shell.run(Shell.java:844)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
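The "bad substitution" on line 26 of launch_container.sh comes from the unresolved ${hdp.version} placeholder in the container classpath, which the shell cannot expand. On HDP clusters a commonly suggested remedy (an assumption here, not something confirmed later in this thread) is to pass the stack version explicitly; the version string below is read off the spark-assembly jar name in the logs:

spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.4.0-121 \
  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.4.0-121 \
  --class scala1 \
  /path/to/application-jar-with-dependencies.jar

For the inner, programmatic submission the same settings would go onto the SparkConf, e.g. sparkConf.set("spark.driver.extraJavaOptions", "-Dhdp.version=2.5.4.0-121"), before the YARN Client is constructed.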

Thank you for your help.

Thanks, Ankush Reddy.

mahmoudparsian commented 7 years ago

Please provide more details: your script, its log and error messages.

ankushreddy commented 7 years ago

Hi @mahmoudparsian

Here are the logs:

Log Type: directory.info
Log Upload Time: Fri May 05 06:03:26 +0000 2017
Log Length: 5492

ls -l:
total 36
lrwxrwxrwx 1 yarn hadoop 95 May 5 06:03 __app__.jar -> /mnt/resource/hadoop/yarn/local/filecache/10/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar
-rw-r--r-- 1 yarn hadoop 74 May 5 06:03 container_tokens
-rwx------ 1 yarn hadoop 710 May 5 06:03 default_container_executor_session.sh
-rwx------ 1 yarn hadoop 764 May 5 06:03 default_container_executor.sh
-rwx------ 1 yarn hadoop 6433 May 5 06:03 launch_container.sh
lrwxrwxrwx 1 yarn hadoop 102 May 5 06:03 __spark_conf__ -> /mnt/resource/hadoop/yarn/local/usercache/helixuser/filecache/80/__spark_conf__6125877397366945561.zip
lrwxrwxrwx 1 yarn hadoop 125 May 5 06:03 __spark__.jar -> /mnt/resource/hadoop/yarn/local/usercache/helixuser/filecache/81/spark-assembly-1.6.3.2.5.4.0-121-hadoop2.7.3.2.5.4.0-121.jar
drwx--x--- 2 yarn hadoop 4096 May 5 06:03 tmp

find -L . -maxdepth 5 -ls:
3933556 4 drwx--x--- 3 yarn hadoop 4096 May 5 06:03 .
3933558 4 drwx--x--- 2 yarn hadoop 4096 May 5 06:03 ./tmp
3933562 4 -rw-r--r-- 1 yarn hadoop 60 May 5 06:03 ./.launch_container.sh.crc
3933517 185944 -r-x------ 1 yarn hadoop 190402950 May 5 06:03 ./__spark__.jar
3933564 4 -rw-r--r-- 1 yarn hadoop 16 May 5 06:03 ./.default_container_executor_session.sh.crc
3933518 4 drwx------ 2 yarn hadoop 4096 May 5 06:03 ./__spark_conf__
3933548 4 -r-x------ 1 yarn hadoop 945 May 5 06:03 ./__spark_conf__/taskcontroller.cfg
3933543 4 -r-x------ 1 yarn hadoop 249 May 5 06:03 ./__spark_conf__/slaves
3933541 4 -r-x------ 1 yarn hadoop 2316 May 5 06:03 ./__spark_conf__/ssl-client.xml.example
3933520 4 -r-x------ 1 yarn hadoop 1734 May 5 06:03 ./__spark_conf__/log4j.properties
3933526 4 -r-x------ 1 yarn hadoop 265 May 5 06:03 ./__spark_conf__/hadoop-metrics2-azure-file-system.properties
3933536 4 -r-x------ 1 yarn hadoop 1045 May 5 06:03 ./__spark_conf__/container-executor.cfg
3933519 8 -r-x------ 1 yarn hadoop 5685 May 5 06:03 ./__spark_conf__/hadoop-env.sh
3933531 4 -r-x------ 1 yarn hadoop 2358 May 5 06:03 ./__spark_conf__/topology_script.py
3933547 8 -r-x------ 1 yarn hadoop 4113 May 5 06:03 ./__spark_conf__/mapred-queues.xml.template
3933528 4 -r-x------ 1 yarn hadoop 744 May 5 06:03 ./__spark_conf__/ssl-client.xml
3933544 4 -r-x------ 1 yarn hadoop 417 May 5 06:03 ./__spark_conf__/topology_mappings.data
3933549 4 -r-x------ 1 yarn hadoop 342 May 5 06:03 ./__spark_conf__/__spark_conf__.properties
3933523 4 -r-x------ 1 yarn hadoop 247 May 5 06:03 ./__spark_conf__/hadoop-metrics2-adl-file-system.properties
3933535 4 -r-x------ 1 yarn hadoop 1020 May 5 06:03 ./__spark_conf__/commons-logging.properties
3933525 24 -r-x------ 1 yarn hadoop 22138 May 5 06:03 ./__spark_conf__/yarn-site.xml
3933529 4 -r-x------ 1 yarn hadoop 2450 May 5 06:03 ./__spark_conf__/capacity-scheduler.xml
3933538 4 -r-x------ 1 yarn hadoop 2490 May 5 06:03 ./__spark_conf__/hadoop-metrics.properties
3933534 12 -r-x------ 1 yarn hadoop 8754 May 5 06:03 ./__spark_conf__/hdfs-site.xml
3933533 8 -r-x------ 1 yarn hadoop 4261 May 5 06:03 ./__spark_conf__/yarn-env.sh
3933532 4 -r-x------ 1 yarn hadoop 1335 May 5 06:03 ./__spark_conf__/configuration.xsl
3933530 4 -r-x------ 1 yarn hadoop 758 May 5 06:03 ./__spark_conf__/mapred-site.xml.template
3933545 4 -r-x------ 1 yarn hadoop 1000 May 5 06:03 ./__spark_conf__/ssl-server.xml
3933527 8 -r-x------ 1 yarn hadoop 4680 May 5 06:03 ./__spark_conf__/core-site.xml
3933522 8 -r-x------ 1 yarn hadoop 5783 May 5 06:03 ./__spark_conf__/hadoop-metrics2.properties
3933542 4 -r-x------ 1 yarn hadoop 1308 May 5 06:03 ./__spark_conf__/hadoop-policy.xml
3933540 4 -r-x------ 1 yarn hadoop 1602 May 5 06:03 ./__spark_conf__/health_check
3933537 8 -r-x------ 1 yarn hadoop 4221 May 5 06:03 ./__spark_conf__/task-log4j.properties
3933521 8 -r-x------ 1 yarn hadoop 7596 May 5 06:03 ./__spark_conf__/mapred-site.xml
3933546 4 -r-x------ 1 yarn hadoop 2697 May 5 06:03 ./__spark_conf__/ssl-server.xml.example
3933539 4 -r-x------ 1 yarn hadoop 752 May 5 06:03 ./__spark_conf__/mapred-env.sh
3932820 135852 -r-xr-xr-x 1 yarn hadoop 139105807 May 4 22:53 ./__app__.jar
3933566 4 -rw-r--r-- 1 yarn hadoop 16 May 5 06:03 ./.default_container_executor.sh.crc
3933563 4 -rwx------ 1 yarn hadoop 710 May 5 06:03 ./default_container_executor_session.sh
3933559 4 -rw-r--r-- 1 yarn hadoop 74 May 5 06:03 ./container_tokens
3933565 4 -rwx------ 1 yarn hadoop 764 May 5 06:03 ./default_container_executor.sh
3933560 4 -rw-r--r-- 1 yarn hadoop 12 May 5 06:03 ./.container_tokens.crc
3933561 8 -rwx------ 1 yarn hadoop 6433 May 5 06:03 ./launch_container.sh

broken symlinks(find -L . -maxdepth 5 -type l -ls):

This is how my project is structured:

spark-application

--> scala1 class // I call the java class from this class.

--> java class // this submits another spark application to the yarn cluster.

Another spark-application

--> scala2 class

If I invoke the java class from scala1 with a spark-submit run locally, the inner spark-submit --class scala2 is triggered and works fine.

If I invoke the java class from scala1 with a spark-submit run on yarn, the inner spark-submit --class scala2 is triggered but fails with the error above.

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

public class CallingSparkJob {

    public void submitJob(String latestreceivedpitrL, String newPtr) throws Exception {
        System.out.println("In submit job method");
        try {
            System.out.println("Building a spark command");

            // prepare the arguments to be passed to
            // the org.apache.spark.deploy.yarn.Client object
            String[] args = new String[] {
                // the name of the application
                "--name",
                "name",

                // memory for the driver (optional)
                "--driver-memory",
                "1000M",

                "--num-executors",
                "2",
                "--executor-cores",
                "2",

                // path to the application's JAR file
                // (required in yarn-cluster mode)
                "--jar",
                "wasb://storage_account_container@storageaccount.blob.core.windows.net/user/ankushuser/sparkfiller/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar",

                // name of the application's main class (required)
                "--class",
                "com.test.SparkFiller",

                // optional: comma-separated list of local jars for
                // SparkContext.addJar to work with
                // "--addJars", "/path/to/extra1.jar,/path/to/extra2.jar",

                // application argument 1: latestreceivedpitrL
                "--arg",
                latestreceivedpitrL,

                // application argument 2: newPtr
                "--arg",
                newPtr,

                // application argument 3
                "--arg",
                "yarn-cluster"
            };

            System.out.println("create a Hadoop Configuration object");
            Configuration config = new Configuration();

            // indicate that Spark is running in YARN mode
            System.setProperty("SPARK_YARN_MODE", "true");

            // create an instance of SparkConf
            SparkConf sparkConf = new SparkConf();
            sparkConf.setSparkHome("/usr/hdp/current/spark-client");
            sparkConf.setMaster("yarn-cluster");
            // return from run() immediately instead of waiting for completion
            sparkConf.set("spark.yarn.submit.waitAppCompletion", "false");

            // create ClientArguments, which will be passed to Client
            ClientArguments cArgs = new ClientArguments(args, sparkConf);

            // create an instance of the YARN Client
            Client client = new Client(cArgs, config, sparkConf);

            // submit the Spark job to YARN
            client.run();
        } catch (Exception e) {
            System.out.println("Error submitting spark Job");
            System.out.println(e.getMessage());
        }
    }
}
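For reference, a minimal sketch of how this class might be driven from the outer application (the pointer values here are placeholders, not real pointers):

    // hypothetical invocation, e.g. from scala1 via plain Java interop
    CallingSparkJob job = new CallingSparkJob();
    job.submitJob("latestPitr-0001", "newPtr-0002");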

This is the spark-submit command I am using:

spark-submit --class scala1 --master yarn --deploy-mode cluster --num-executors 2 --executor-cores 2 --conf spark.yarn.executor.memoryOverhead=600 --conf spark.yarn.submit.waitAppCompletion=false /home/ankushuser/kafka_retry/kafka_retry_test/sparkflightaware/target/sparkflightaware-0.0.1-SNAPSHOT-jar-with-dependencies.jar

If I run this spark-submit command locally, it invokes the java class, and the spark-submit for the scala2 application works fine.

If I run it on yarn, I face the issue above.

mahmoudparsian commented 7 years ago

Can you also please include error messages you are getting? If possible, include your Scala classes (all of your classes) as well so that I can try it on my side.

Thanks, Mahmoud

ankushreddy commented 7 years ago

I am getting only the above error message. I have about 10 to 20 classes that I use in the application. It basically pulls data from an API and pushes it to a Kafka topic. Within this application I invoke a filler job to push the records to Document DB.

Thanks, Ankush Reddy.

JackDavidson commented 6 years ago

Same issue here. Everything submits without any warnings or errors, but then nothing happens, and the YARN logs report 'Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster' on two nodes in the cluster (nothing else on any other node). I have instead resorted to invoking spark-submit from my Java code, which is working.
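For anyone taking the same route: one supported way to invoke spark-submit from Java, rather than hand-building the YARN Client, is org.apache.spark.launcher.SparkLauncher (available since Spark 1.4). A minimal sketch, in which the Spark home, jar location, main class, and arguments are placeholder assumptions:

import org.apache.spark.launcher.SparkLauncher;

public class LauncherExample {
    public static void main(String[] args) throws Exception {
        // builds and forks a spark-submit child process with the given settings
        Process spark = new SparkLauncher()
                .setSparkHome("/usr/hdp/current/spark-client")    // assumed Spark home
                .setAppResource("hdfs:///apps/myapp/my-fat.jar")  // hypothetical fat jar
                .setMainClass("com.example.Scala2")               // hypothetical main class
                .setMaster("yarn")
                .setDeployMode("cluster")
                .addAppArgs("latestPitr-0001", "newPtr-0002")     // placeholder app arguments
                .launch();
        spark.waitFor(); // block until the spark-submit process exits
    }
}

Because SparkLauncher shells out to the cluster's own spark-submit, it picks up spark-env.sh and spark-defaults.conf (including HDP-specific settings) that a hand-rolled yarn.Client call can miss.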

ankushreddy commented 6 years ago

@JackDavidson hi, did you copy the jar files to the worker nodes? Our existing spark application, or the command we are running, might not run on the head node itself.

Workaround: use Livy and store your jar/war in HDFS or another accessible location; you can then use Livy to post the spark-submit command.
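For illustration, a Livy batch submission would look roughly like this (the Livy host below is a placeholder, Livy's default port 8998 is assumed, and the jar path and class name are reused from the code earlier in the thread):

curl -X POST http://livy-server:8998/batches \
  -H "Content-Type: application/json" \
  -d '{"file": "hdfs:///apps/sparkfiller/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar",
       "className": "com.test.SparkFiller",
       "args": ["latestPitr-0001", "newPtr-0002"]}'

Livy then runs the equivalent of spark-submit on the cluster side, so the client process never needs a local Spark installation.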

JackDavidson commented 6 years ago

@ankushreddy My project's jar file (there is only one, and it is a fat jar) is stored in HDFS. I haven't ever done any manual copying of jars anywhere. This works with spark-submit, but maybe submitting from Java is different in some way? The jar I am submitting does not contain the class that Spark can't find, though. The Spark libraries under SPARK_HOME do contain that class. But I wonder: if I add a dependency on spark-yarn to my submitted fat jar, would Spark then be able to find the missing class?
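A hedged note on that last question: in Spark 1.x, org.apache.spark.deploy.yarn.ApplicationMaster ships in the spark-assembly jar, and YARN locates it through the spark.yarn.jar property rather than through the application jar, so bundling spark-yarn into the fat jar may not be the missing piece. If the assembly location is unset or stale when submitting programmatically, it can be pinned explicitly; the HDFS path below is hypothetical, with the assembly file name taken from the logs above:

    // Spark 1.x: point YARN at a valid Spark assembly containing ApplicationMaster
    sparkConf.set("spark.yarn.jar",
        "hdfs:///apps/spark/spark-assembly-1.6.3.2.5.4.0-121-hadoop2.7.3.2.5.4.0-121.jar");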

Livy sounds like a great alternative. I'll start looking into that

ankushreddy commented 6 years ago

@JackDavidson in our case we included all the dependencies along with the jar, so we didn't face any issues with missing dependencies.

I would suggest you look at Livy; in most cases that should solve the problem.