spring-attic / spring-hadoop-samples

Spring Hadoop Samples
Apache License 2.0

Mapreduce Example Fails #6

Closed pooleja closed 10 years ago

pooleja commented 10 years ago

I have been trying to get the MapReduce example to run against a cluster with the following version:

Hadoop 2.2.0.2.0.6.0-101

I built the example with the following command:

mvn clean package -Phadoop22

The first error I encountered was seen on the logs for the Map task:

java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster

To get around this error I added the following config item to the application-context.xml for the config object:

yarn.application.classpath=$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
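
For context, with the Spring for Apache Hadoop XML namespace, properties like this are typically declared in the body of the configuration element, which is parsed in java.util.Properties format. A minimal sketch of what that might look like in application-context.xml (the namespace boilerplate is standard; only the classpath property comes from this issue):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <!-- Properties in the element body are read as java.util.Properties entries -->
    <hdp:configuration>
        yarn.application.classpath=$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </hdp:configuration>
</beans>
```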

Now the job gets submitted and starts to execute properly; however, it only succeeds if the task is executed on the same node that the Resource Manager is running on. It is a 3-node cluster, where Node1 runs the Resource Manager. When any of the jobs get submitted to Node2 or Node3, they fail with (repeating):

2014-02-06 12:05:52,135 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

If I run a sample map/reduce job outside of Spring Hadoop, it executes as expected on any of the nodes, so I don't think it is a problem with the cluster setup. It seems like the Spring Hadoop libraries are picking up a default setting that makes the task think the Resource Manager is running on the local node.

Please let me know if you have any suggestions.

trisberg commented 10 years ago

Could you try setting this property in your config:

yarn.resourcemanager.hostname

Set that to the hostname where the RM is running.
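
For anyone hitting this later, a sketch of what that suggestion could look like in the hdp:configuration element (rm-host.example.com is a placeholder, not a value from this issue):

```xml
<!-- Point the client at the node running the Resource Manager,
     so it does not fall back to the default 0.0.0.0 -->
<hdp:configuration>
    yarn.resourcemanager.hostname=rm-host.example.com
</hdp:configuration>
```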

pooleja commented 10 years ago

Yep, that was it. Thanks!

Is there any guidance from the Spring Hadoop team on the best way to ensure all the properties are correct? For example, would it make sense to copy a yarn-site.xml and mapred-site.xml to the client machine and pull them into the Spring config? Or could that cause other types of problems?

trisberg commented 10 years ago

I've used both ways for providing config options and the net effect is the same, so pick the one that you are more comfortable using. I tend to prefer to collect my configurations in a properties file that's part of my application.
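
A sketch of both approaches mentioned above; the file names and property keys here are illustrative, not taken from the sample:

```xml
<!-- Option 1: keep values in a properties file bundled with the application.
     hadoop.properties (illustrative) might contain:
         hd.fs=hdfs://namenode.example.com:8020
         hd.rm=rm-host.example.com
-->
<context:property-placeholder location="classpath:hadoop.properties"/>

<hdp:configuration>
    fs.defaultFS=${hd.fs}
    yarn.resourcemanager.hostname=${hd.rm}
</hdp:configuration>

<!-- Option 2: pull in site files copied from the cluster via the
     resources attribute of the configuration element -->
<hdp:configuration resources="classpath:/yarn-site.xml, classpath:/mapred-site.xml"/>
```

Option 1 requires the context namespace (xmlns:context="http://www.springframework.org/schema/context") to be declared; as noted above, the net effect of the two approaches is the same.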

SaiPrasannaAnnamalai commented 10 years ago

Hi, I have the same problem but couldn't solve it. Please help. I have a 5-node cluster with one master and 4 slaves. I have set the IP address of the master node (in fact, I even tried hard-coding it) for 'yarn.resourcemanager.hostname' in the yarn-site.xml file. But even then I get the following in the log files:

ERROR: Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-03-05 20:15:50,597 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-03-05 20:15:50,603 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2014-03-05 20:15:56,632 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

What could be the reason? Why is Hadoop not picking up the parameter that I set?