infochimps-labs / wukong-hadoop

Execute Wukong code within the Hadoop framework.

`require <filename>` gives Error: # of failed Map Tasks exceeded allowed limit #3

Open sagarlekar opened 11 years ago

sagarlekar commented 11 years ago

Hello Team Wukong-Hadoop,

I am experimenting with Wukong-Hadoop. Whenever I use `require './lib/facebook.rb'` in my Wukong script (the file that contains the mapper and reducer processors), I get `ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201308081248_0020_m_000000`.
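Roughly, the script looks like the sketch below; the processor bodies are simplified placeholders, and the real logic lives in lib/facebook.rb:

```ruby
require 'wukong'
require './lib/facebook.rb'   # <-- the require that fails in hadoop mode

# Simplified placeholders; the real processors use helpers
# defined in lib/facebook.rb.
Wukong.processor(:mapper) do
  def process(line)
    yield line
  end
end

Wukong.processor(:reducer) do
  def process(line)
    yield line
  end
end
```

I run it with something like `wu-hadoop ./script.rb --mode=hadoop --input=... --output=...`.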

The script works fine in local mode but not in Hadoop mode.

I am using Ruby 1.9.3p392 and Hadoop 1.1.2.

I am new to Hadoop and Wukong. Am I doing something obviously wrong?

Kindly help.

Sagar

dhruvbansal commented 11 years ago

It sounds like something is different between some of the machines in your Hadoop cluster and your local environment, where you are able to run your code without a problem. Every time a map task tries to run your code it hits some error and fails; this has happened so many times that Hadoop has now failed your job.

You should look at the logs for each map task that failed (they're available by drilling down through the JobTracker UI) and see what is showing up. Is Ruby complaining about a missing gem? Is there some path you're leaving relative when it should be absolute? The logs will tell you.

Another thing to try is using an NFS mount on your Hadoop cluster and doing `bundle install --standalone` or similar there, to ensure that all machines have their Ruby dependencies automatically satisfied by working within that local bundle on the shared NFS.
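As a minimal sketch (the /mnt/nfs/myproject path is hypothetical; substitute wherever your shared bundle actually lives):

```ruby
# Assumption: `bundle install --standalone` was run once in
# /mnt/nfs/myproject, and that directory is NFS-mounted on every
# Hadoop node. The generated setup file puts each bundled gem on the
# load path without needing Bundler itself at runtime.
require '/mnt/nfs/myproject/bundle/bundler/setup'

require 'wukong'
```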

You can also look at wukong-deploy, which captures some of these patterns.

sagarlekar commented 11 years ago

Thanks for the hint, Dhruv. I have resolved this.

I used an absolute path in the require; it works well for now.
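In case it helps anyone else, the change was essentially the following (the absolute path here is hypothetical; use whatever path is valid on every node in your cluster):

```ruby
# Before: relative to the working directory Hadoop streaming gives each
# map task on the remote node, where ./lib/facebook.rb does not exist,
# so every task died on this require.
# require './lib/facebook.rb'

# After: an absolute path that resolves the same way on every node
# (e.g. on shared storage, or an identical deploy on each machine).
require '/home/sagar/wukong-project/lib/facebook.rb'
```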