mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book
http://mapreduce4hackers.com

Hi, I am facing an issue with submitting a job from Java. #16

Open ankushreddy opened 7 years ago

ankushreddy commented 7 years ago

Hi, when I invoke the class described in https://github.com/mahmoudparsian/data-algorithms-book/blob/master/misc/how-to-submit-spark-job-to-yarn-from-java-code.md from a spark-submit that runs locally, it is invoked correctly and successfully submits the job to the YARN cluster.

But when I invoke the class from a spark-submit that is itself submitted to YARN, the application launched by the how-to-submit-spark-job-to-yarn-from-java-code.md class is accepted but never moves to the RUNNING state, and it fails with the following error:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

Application application_1493671618562_0072 failed 5 times due to AM Container for appattempt_1493671618562_0072_000005 exited with exitCode: 1
For more detailed output, check the application tracking page: http://headnode.internal.cloudapp.net:8088/cluster/app/application_1493671618562_0072 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e02_1493671618562_0072_05_000001
Exit code: 1
Exception message: /mnt/resource/hadoop/yarn/local/usercache/helixuser/appcache/application_1493671618562_0072/container_e02_1493671618562_0072_05_000001/launch_container.sh: line 26: $PWD:$PWD/__spark_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: (the same launch_container.sh line 26 "bad substitution" message as above)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
    at org.apache.hadoop.util.Shell.run(Shell.java:844)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
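The "bad substitution" on line 26 of launch_container.sh comes from the unresolved ${hdp.version} placeholder in the container classpath, which the shell cannot expand. On HDP clusters a commonly suggested remedy (an assumption here, not something confirmed later in this thread) is to pass the stack version explicitly; the version string below is read off the spark-assembly jar name in the logs:

spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.driver.extraJavaOptions=-Dhdp.version=2.5.4.0-121 \
  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=2.5.4.0-121 \
  --class scala1 \
  /path/to/application-jar-with-dependencies.jar

For the inner, programmatic submission the same settings would go onto the SparkConf, e.g. sparkConf.set("spark.driver.extraJavaOptions", "-Dhdp.version=2.5.4.0-121"), before the YARN Client is constructed.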

Thank you for your help.

Thanks, Ankush Reddy.

mahmoudparsian commented 7 years ago

Please provide more details: your script, its log and error messages.

ankushreddy commented 7 years ago

Hi @mahmoudparsian

Here are the logs:

Log Type: directory.info
Log Upload Time: Fri May 05 06:03:26 +0000 2017
Log Length: 5492

ls -l:
total 36
lrwxrwxrwx 1 yarn hadoop 95 May 5 06:03 __app__.jar -> /mnt/resource/hadoop/yarn/local/filecache/10/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar
-rw-r--r-- 1 yarn hadoop 74 May 5 06:03 container_tokens
-rwx------ 1 yarn hadoop 710 May 5 06:03 default_container_executor_session.sh
-rwx------ 1 yarn hadoop 764 May 5 06:03 default_container_executor.sh
-rwx------ 1 yarn hadoop 6433 May 5 06:03 launch_container.sh
lrwxrwxrwx 1 yarn hadoop 102 May 5 06:03 __spark_conf__ -> /mnt/resource/hadoop/yarn/local/usercache/helixuser/filecache/80/__spark_conf__6125877397366945561.zip
lrwxrwxrwx 1 yarn hadoop 125 May 5 06:03 __spark__.jar -> /mnt/resource/hadoop/yarn/local/usercache/helixuser/filecache/81/spark-assembly-1.6.3.2.5.4.0-121-hadoop2.7.3.2.5.4.0-121.jar
drwx--x--- 2 yarn hadoop 4096 May 5 06:03 tmp

find -L . -maxdepth 5 -ls:
3933556 4 drwx--x--- 3 yarn hadoop 4096 May 5 06:03 .
3933558 4 drwx--x--- 2 yarn hadoop 4096 May 5 06:03 ./tmp
3933562 4 -rw-r--r-- 1 yarn hadoop 60 May 5 06:03 ./.launch_container.sh.crc
3933517 185944 -r-x------ 1 yarn hadoop 190402950 May 5 06:03 ./__spark__.jar
3933564 4 -rw-r--r-- 1 yarn hadoop 16 May 5 06:03 ./.default_container_executor_session.sh.crc
3933518 4 drwx------ 2 yarn hadoop 4096 May 5 06:03 ./__spark_conf__
3933548 4 -r-x------ 1 yarn hadoop 945 May 5 06:03 ./__spark_conf__/taskcontroller.cfg
3933543 4 -r-x------ 1 yarn hadoop 249 May 5 06:03 ./__spark_conf__/slaves
3933541 4 -r-x------ 1 yarn hadoop 2316 May 5 06:03 ./__spark_conf__/ssl-client.xml.example
3933520 4 -r-x------ 1 yarn hadoop 1734 May 5 06:03 ./__spark_conf__/log4j.properties
3933526 4 -r-x------ 1 yarn hadoop 265 May 5 06:03 ./__spark_conf__/hadoop-metrics2-azure-file-system.properties
3933536 4 -r-x------ 1 yarn hadoop 1045 May 5 06:03 ./__spark_conf__/container-executor.cfg
3933519 8 -r-x------ 1 yarn hadoop 5685 May 5 06:03 ./__spark_conf__/hadoop-env.sh
3933531 4 -r-x------ 1 yarn hadoop 2358 May 5 06:03 ./__spark_conf__/topology_script.py
3933547 8 -r-x------ 1 yarn hadoop 4113 May 5 06:03 ./__spark_conf__/mapred-queues.xml.template
3933528 4 -r-x------ 1 yarn hadoop 744 May 5 06:03 ./__spark_conf__/ssl-client.xml
3933544 4 -r-x------ 1 yarn hadoop 417 May 5 06:03 ./__spark_conf__/topology_mappings.data
3933549 4 -r-x------ 1 yarn hadoop 342 May 5 06:03 ./__spark_conf__/__spark_conf__.properties
3933523 4 -r-x------ 1 yarn hadoop 247 May 5 06:03 ./__spark_conf__/hadoop-metrics2-adl-file-system.properties
3933535 4 -r-x------ 1 yarn hadoop 1020 May 5 06:03 ./__spark_conf__/commons-logging.properties
3933525 24 -r-x------ 1 yarn hadoop 22138 May 5 06:03 ./__spark_conf__/yarn-site.xml
3933529 4 -r-x------ 1 yarn hadoop 2450 May 5 06:03 ./__spark_conf__/capacity-scheduler.xml
3933538 4 -r-x------ 1 yarn hadoop 2490 May 5 06:03 ./__spark_conf__/hadoop-metrics.properties
3933534 12 -r-x------ 1 yarn hadoop 8754 May 5 06:03 ./__spark_conf__/hdfs-site.xml
3933533 8 -r-x------ 1 yarn hadoop 4261 May 5 06:03 ./__spark_conf__/yarn-env.sh
3933532 4 -r-x------ 1 yarn hadoop 1335 May 5 06:03 ./__spark_conf__/configuration.xsl
3933530 4 -r-x------ 1 yarn hadoop 758 May 5 06:03 ./__spark_conf__/mapred-site.xml.template
3933545 4 -r-x------ 1 yarn hadoop 1000 May 5 06:03 ./__spark_conf__/ssl-server.xml
3933527 8 -r-x------ 1 yarn hadoop 4680 May 5 06:03 ./__spark_conf__/core-site.xml
3933522 8 -r-x------ 1 yarn hadoop 5783 May 5 06:03 ./__spark_conf__/hadoop-metrics2.properties
3933542 4 -r-x------ 1 yarn hadoop 1308 May 5 06:03 ./__spark_conf__/hadoop-policy.xml
3933540 4 -r-x------ 1 yarn hadoop 1602 May 5 06:03 ./__spark_conf__/health_check
3933537 8 -r-x------ 1 yarn hadoop 4221 May 5 06:03 ./__spark_conf__/task-log4j.properties
3933521 8 -r-x------ 1 yarn hadoop 7596 May 5 06:03 ./__spark_conf__/mapred-site.xml
3933546 4 -r-x------ 1 yarn hadoop 2697 May 5 06:03 ./__spark_conf__/ssl-server.xml.example
3933539 4 -r-x------ 1 yarn hadoop 752 May 5 06:03 ./__spark_conf__/mapred-env.sh
3932820 135852 -r-xr-xr-x 1 yarn hadoop 139105807 May 4 22:53 ./__app__.jar
3933566 4 -rw-r--r-- 1 yarn hadoop 16 May 5 06:03 ./.default_container_executor.sh.crc
3933563 4 -rwx------ 1 yarn hadoop 710 May 5 06:03 ./default_container_executor_session.sh
3933559 4 -rw-r--r-- 1 yarn hadoop 74 May 5 06:03 ./container_tokens
3933565 4 -rwx------ 1 yarn hadoop 764 May 5 06:03 ./default_container_executor.sh
3933560 4 -rw-r--r-- 1 yarn hadoop 12 May 5 06:03 ./.container_tokens.crc
3933561 8 -rwx------ 1 yarn hadoop 6433 May 5 06:03 ./launch_container.sh

broken symlinks(find -L . -maxdepth 5 -type l -ls):

This is how my project is structured:

spark-application

--> scala1 class // I call the java class from this class.

--> java class // this submits another spark application to the yarn cluster.

Another spark-application

--> scala2 class

If I invoke the java class from scala1 with a spark-submit run locally, the inner spark-submit --class scala2 is triggered and works fine.

If I invoke the java class from scala1 with a spark-submit run on yarn, the inner spark-submit --class scala2 is triggered but fails with the error above.

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

public class CallingSparkJob {

    public void submitJob(String latestreceivedpitrL, String newPtr) throws Exception {
        System.out.println("In submit job method");
        try {
            System.out.println("Building a spark command");

            // prepare the arguments to be passed to
            // the org.apache.spark.deploy.yarn.Client object
            String[] args = new String[] {
                // the name of the application
                "--name",
                "name",

                // memory for the driver (optional)
                "--driver-memory",
                "1000M",

                "--num-executors",
                "2",
                "--executor-cores",
                "2",

                // path to the application's JAR file
                // (required in yarn-cluster mode)
                "--jar",
                "wasb://storage_account_container@storageaccount.blob.core.windows.net/user/ankushuser/sparkfiller/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar",

                // name of the application's main class (required)
                "--class",
                "com.test.SparkFiller",

                // optional: comma-separated list of local jars for
                // SparkContext.addJar to work with
                // "--addJars", "/path/to/extra1.jar,/path/to/extra2.jar",

                // application argument 1: latestreceivedpitrL
                "--arg",
                latestreceivedpitrL,

                // application argument 2: newPtr
                "--arg",
                newPtr,

                // application argument 3
                "--arg",
                "yarn-cluster"
            };

            System.out.println("create a Hadoop Configuration object");
            Configuration config = new Configuration();

            // indicate that Spark is running in YARN mode
            System.setProperty("SPARK_YARN_MODE", "true");

            // create an instance of SparkConf
            SparkConf sparkConf = new SparkConf();
            sparkConf.setSparkHome("/usr/hdp/current/spark-client");
            sparkConf.setMaster("yarn-cluster");
            // return from run() immediately instead of waiting for completion
            sparkConf.set("spark.yarn.submit.waitAppCompletion", "false");

            // create ClientArguments, which will be passed to Client
            ClientArguments cArgs = new ClientArguments(args, sparkConf);

            // create an instance of the YARN Client
            Client client = new Client(cArgs, config, sparkConf);

            // submit the Spark job to YARN
            client.run();
        } catch (Exception e) {
            System.out.println("Error submitting spark Job");
            System.out.println(e.getMessage());
        }
    }
}
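For reference, a minimal sketch of how this class might be driven from the outer application (the pointer values here are placeholders, not real pointers):

    // hypothetical invocation, e.g. from scala1 via plain Java interop
    CallingSparkJob job = new CallingSparkJob();
    job.submitJob("latestPitr-0001", "newPtr-0002");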

This is the spark-submit command I am using:

spark-submit --class scala1 --master yarn --deploy-mode cluster --num-executors 2 --executor-cores 2 --conf spark.yarn.executor.memoryOverhead=600 --conf spark.yarn.submit.waitAppCompletion=false /home/ankushuser/kafka_retry/kafka_retry_test/sparkflightaware/target/sparkflightaware-0.0.1-SNAPSHOT-jar-with-dependencies.jar

If I run this spark-submit command locally, it invokes the java class, and the spark-submit for the scala2 application works fine.

If I run it on yarn, I face the issue above.

mahmoudparsian commented 7 years ago

Can you also please include error messages you are getting? If possible, include your Scala classes (all of your classes) as well so that I can try it on my side.

Thanks, Mahmoud

ankushreddy commented 7 years ago

I am getting only the above error message. I have about 10 to 20 classes that I use in the application. It basically pulls data from an API and pushes it to a Kafka topic. Within this application I invoke a filler job to push the records to Document DB.

Thanks, Ankush Reddy.

JackDavidson commented 6 years ago

Same issue here. Everything submits without any warnings or errors, but then nothing happens, and the YARN logs report 'Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster' on two nodes in the cluster (nothing else on any other node). I have instead resorted to invoking spark-submit from my Java code, which is working.
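For anyone taking the same route: one supported way to invoke spark-submit from Java, rather than hand-building the YARN Client, is org.apache.spark.launcher.SparkLauncher (available since Spark 1.4). A minimal sketch, in which the Spark home, jar location, main class, and arguments are placeholder assumptions:

import org.apache.spark.launcher.SparkLauncher;

public class LauncherExample {
    public static void main(String[] args) throws Exception {
        // builds and forks a spark-submit child process with the given settings
        Process spark = new SparkLauncher()
                .setSparkHome("/usr/hdp/current/spark-client")    // assumed Spark home
                .setAppResource("hdfs:///apps/myapp/my-fat.jar")  // hypothetical fat jar
                .setMainClass("com.example.Scala2")               // hypothetical main class
                .setMaster("yarn")
                .setDeployMode("cluster")
                .addAppArgs("latestPitr-0001", "newPtr-0002")     // placeholder app arguments
                .launch();
        spark.waitFor(); // block until the spark-submit process exits
    }
}

Because SparkLauncher shells out to the cluster's own spark-submit, it picks up spark-env.sh and spark-defaults.conf (including HDP-specific settings) that a hand-rolled yarn.Client call can miss.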

ankushreddy commented 6 years ago

@JackDavidson hi, did you copy the jar files to the worker nodes? Our existing spark application, or the command we are running, might not run on the head node itself.

Workaround: use Livy and store your jar/war in HDFS or another accessible location; you can then use Livy to post the spark-submit command.
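For illustration, a Livy batch submission would look roughly like this (the Livy host below is a placeholder, Livy's default port 8998 is assumed, and the jar path and class name are reused from the code earlier in the thread):

curl -X POST http://livy-server:8998/batches \
  -H "Content-Type: application/json" \
  -d '{"file": "hdfs:///apps/sparkfiller/sparkfiller-1.0-SNAPSHOT-jar-with-dependencies.jar",
       "className": "com.test.SparkFiller",
       "args": ["latestPitr-0001", "newPtr-0002"]}'

Livy then runs the equivalent of spark-submit on the cluster side, so the client process never needs a local Spark installation.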

JackDavidson commented 6 years ago

@ankushreddy My project's jar file (there is only one, and it is a fat jar) is stored in HDFS. I haven't ever done any manual copying of jars anywhere. This works with spark-submit, but maybe submitting from Java is different in some way? The jar I am submitting does not contain the class that Spark can't find, though. The Spark libraries under SPARK_HOME do contain that class. But I wonder: if I add a dependency on spark-yarn to my submitted fat jar, would Spark then be able to find the missing class?
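A hedged note on that last question: in Spark 1.x, org.apache.spark.deploy.yarn.ApplicationMaster ships in the spark-assembly jar, and YARN locates it through the spark.yarn.jar property rather than through the application jar, so bundling spark-yarn into the fat jar may not be the missing piece. If the assembly location is unset or stale when submitting programmatically, it can be pinned explicitly; the HDFS path below is hypothetical, with the assembly file name taken from the logs above:

    // Spark 1.x: point YARN at a valid Spark assembly containing ApplicationMaster
    sparkConf.set("spark.yarn.jar",
        "hdfs:///apps/spark/spark-assembly-1.6.3.2.5.4.0-121-hadoop2.7.3.2.5.4.0-121.jar");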

Livy sounds like a great alternative. I'll start looking into that

ankushreddy commented 6 years ago

@JackDavidson in our case we included all the dependencies along with the jar, so we didn't face any issues with missing dependencies.

I would suggest you look at Livy; in most cases that should solve the problem.