spring-attic / spring-hadoop-samples

Spring Hadoop Samples
Apache License 2.0

yarn-examples-simple-command failed on cdh5@centos #22

Closed chang-chao closed 9 years ago

chang-chao commented 9 years ago

I'm using the Cloudera CDH 5.0 QuickStart VM as my development Hadoop cluster. When I run yarn-examples-simple-command following the README.md, the application fails with the message below. Any ideas on this?

2014-10-20 18:24:46,286 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1413848891726_0001_02_000001 transitioned from LOCALIZED to RUNNING
2014-10-20 18:24:46,296 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [nice, -n, 0, bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/cloudera/appcache/application_1413848891726_0001/container_1413848891726_0001_02_000001/default_container_executor.sh]
2014-10-20 18:24:46,516 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1413848891726_0001_02_000001 is : 1
2014-10-20 18:24:46,516 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1413848891726_0001_02_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
        at org.apache.hadoop.util.Shell.run(Shell.java:424)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2014-10-20 18:24:46,516 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2014-10-20 18:24:46,516 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1

jvalkeal commented 9 years ago

This might have something to do with the build, because the samples are still built against vanilla Hadoop 2.2.0 and CDH5 might be newer. Did you find the container logs (the stdout and stderr logs for the application master)? They might tell more. You could also try our spring.io guides: https://spring.io/guides?filter=yarn. The "Building Spring YARN Projects with Gradle/Maven" guides have instructions on how to build against different distros, and CDH5 is one of the supported ones.

chang-chao commented 9 years ago

Finally, I made it work (though there is still something I cannot understand).

There were ClassNotFound exceptions (log4j, org.apache.hadoop.conf.Configuration) in the application master's stderr log (Appmaster.stderr), so I commented out the "hadoopruntime.exclude" declaration in build.gradle. After that, the app finished and succeeded.
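
To check this kind of failure without waiting for a full application run, a small probe can report whether the classes in question are visible on a given classpath. This is only a minimal sketch: `ClassProbe` is a hypothetical helper (not part of Hadoop or Spring YARN), and `org.apache.log4j.Logger` is my assumption for the log4j class that was missing, since the stderr output only pointed at log4j in general.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical diagnostic helper; not part of Hadoop or Spring YARN. */
final class ClassProbe {

    /** Returns true if the named class can be resolved by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only resolve the class, do not run its static initializers
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError for missing dependencies
            return false;
        }
    }

    /** Probes several class names and keeps them in the order given. */
    static Map<String, Boolean> probe(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isLoadable(name));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> result = probe(
                "org.apache.hadoop.conf.Configuration", // named in Appmaster.stderr
                "org.apache.log4j.Logger");             // assumed log4j 1.x class name
        for (Map.Entry<String, Boolean> entry : result.entrySet()) {
            System.out.println(entry.getKey() + " -> " + (entry.getValue() ? "found" : "MISSING"));
        }
    }
}
```

Running it with the same classpath the application master gets should show whether the Hadoop and logging jars are actually visible there; running it with only the application jar shows whether those classes were bundled or excluded by the build.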

By the way, two things should be noted.

1. The following env variables, which seem to be used when the master is started, are not set automatically when CDH is installed; I had to set them manually.

2. As described in the Cloudera issue "Container erases temporary file and shell script immediately after execution", you have to monitor the files while the container runs in order to get the log file.

chang-chao commented 9 years ago

After digging for nearly two days, I think I found the reason why the ClassNotFound exception (for a class in the commons-logging lib) occurred in the simple YARN app.

First, we should definitely set the yarn.application.classpath value in the app client according to the cluster environment (by adding yarn-site.xml to the client's classpath). Otherwise the default value (e.g. $HADOOP_COMMON_HOME/share/hadoop/common/*,...) is used, which is not the right path in many distros (at least CDH5).
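
A quick way to see whether a classpath value like the default can resolve at all is to list the environment variables it refers to and check which ones are unset. The following is a minimal sketch under my own assumptions: `ClasspathEnvCheck` is a hypothetical helper, not a Hadoop API, and it only looks at the environment of the process it runs in, so it would need to be run as the user and on the node that launches the container.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Hadoop or Spring YARN. */
final class ClasspathEnvCheck {

    // Matches $NAME and ${NAME} references inside a classpath value
    private static final Pattern VARIABLE =
            Pattern.compile("\\$\\{?([A-Za-z_][A-Za-z0-9_]*)\\}?");

    /** Returns the names of referenced variables that are unset or empty in env. */
    static Set<String> missingVariables(String classpathValue, Map<String, String> env) {
        Set<String> missing = new TreeSet<>();
        Matcher matcher = VARIABLE.matcher(classpathValue);
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Only the first default entry is quoted in this thread; the rest is elided
        String defaultEntry = "$HADOOP_COMMON_HOME/share/hadoop/common/*";
        Set<String> missing = missingVariables(defaultEntry, System.getenv());
        System.out.println("Unset variables: " + missing);
    }
}
```

If a variable is reported as unset, the corresponding classpath entry cannot point at any real directory, which would be consistent with the ClassNotFound errors seen in the application master.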

Second, don't use multi-line values for "yarn.application.classpath" in yarn-site.xml. In my cluster (CDH5, Java 7, CentOS), the classpath could not be fully and correctly parsed when the value spanned several lines. Unfortunately, the yarn-site.xml in the CDH5 cluster contains a multi-line value, which I think is a bug.
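
As a workaround on the client side, a multi-line value can be collapsed into a single line before it is used. The sketch below assumes that the parsing problem comes from the newlines and indentation that end up around each comma-separated entry; I have not confirmed that this is the actual cause. `ClasspathValueNormalizer` is a hypothetical helper, and the entries in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Hadoop or Spring YARN. */
final class ClasspathValueNormalizer {

    /** Splits on commas, trims surrounding whitespace and newlines, drops empty entries. */
    static List<String> splitEntries(String rawValue) {
        List<String> entries = new ArrayList<>();
        for (String part : rawValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    /** Re-joins the cleaned entries into one comma-separated line. */
    static String toSingleLine(String rawValue) {
        return String.join(",", splitEntries(rawValue));
    }

    public static void main(String[] args) {
        // Illustrative multi-line value, shaped like an indented XML <value> body
        String multiLine = "\n      $HADOOP_CONF_DIR,\n      $HADOOP_COMMON_HOME/*,\n      $HADOOP_COMMON_HOME/lib/*\n    ";
        System.out.println(toSingleLine(multiLine));
    }
}
```

The same cleanup can of course be done by hand: put all entries of the value on one line in the yarn-site.xml that the client picks up.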