infochimps-labs / wukong-hadoop

Execute Wukong code within the Hadoop framework.

local mode works but not hadoop mode #2

Open ScotterC opened 11 years ago

ScotterC commented 11 years ago

Not sure if this is a full-fledged issue, but I'm posting it here because the Google group doesn't appear to be very active.

In short, the examples work fine in local mode but not hadoop mode.

This works as expected:

wu-hadoop examples/word_count.rb --mode=local --input=examples/sonnet_18.txt

However, when I switch it over to the single-node Hadoop cluster running locally, it fails.

I've put sonnet_18.txt into HDFS, and normal hadoop jar examples such as the pi calculation work fine.
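
For reference, staging the input amounts to something like this (target path as in the command below):

hadoop fs -mkdir /user/ScotterC
hadoop fs -put examples/sonnet_18.txt /user/ScotterC/sonnet_18.txt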

Command:

wu-hadoop examples/word_count.rb --mode=hadoop --input=/user/ScotterC/sonnet_18.txt --output=/user/ScotterC/word_count.tsv --rm

I get

Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1.

and the job details page shows:

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

This is with Hadoop 1.1.1, wukong 3.0.0.pre3, and wukong-hadoop 0.0.2.

If anyone has pointers for debugging Java it would be highly appreciated.

1st EDIT:

Found that further down the stack trace there is:

Caused by: java.io.IOException: Cannot run program "wu-local": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)

I'm using RVM, so my wu-local is located at /Users/ScotterC/.rvm/gems/ruby-1.9.3-p194/bin/wu-local.
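
For reference, the mismatch is easy to see from a shell (the env -i check is just a rough stand-in for the bare environment a Hadoop child task gets):

which wu-local
  # => /Users/ScotterC/.rvm/gems/ruby-1.9.3-p194/bin/wu-local
env -i /bin/sh -c 'command -v wu-local'
  # => nothing: it isn't on a default PATH, so Hadoop's child process can't exec it either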

Writing out full paths for wu-local got me to my next error

wu-hadoop examples/word_count.rb --mode=hadoop --input=sonnet_18.txt --output=word_count.tsv --rm --map_command='/Users/ScotterC/.rvm/gems/ruby-1.9.3-p194/bin/wu-local /Users/ScotterC/code/github/wukong-hadoop/examples/word_count.rb --run=mapper' --reduce_command='/Users/ScotterC/.rvm/gems/ruby-1.9.3-p194/bin/wu-local /Users/ScotterC/code/github/wukong-hadoop/examples/word_count.rb --run=reducer'

Error:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

2nd EDIT:

Digging through Hadoop job logs gives me

env: ruby_noexec_wrapper: No such file or directory

So I'm wondering why it can't find my Ruby installation. I assume that's what's causing the subprocess failure.
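
For context, RVM rewrites its gem binstubs with a wrapper shebang, so on a setup like this the first line of wu-local typically looks like:

head -1 /Users/ScotterC/.rvm/gems/ruby-1.9.3-p194/bin/wu-local
  # => #!/usr/bin/env ruby_noexec_wrapper

ruby_noexec_wrapper lives in RVM's own bin directories, so unless those are on the PATH Hadoop hands its child tasks, env can't find it and the subprocess exits with code 127, matching the error above.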

ScotterC commented 11 years ago

I believe this is related to https://github.com/infochimps-labs/wukong/issues/11.

Wukong::Hadoop::HadoopInvocation#ruby_interpreter_path would appear to be setting the right ruby path but I can't find where that method is called.

The only other clue I have is that my stderr job logs also show "Unable to load realm info from SCDynamicStore", which I believe was fixed with the right export in hadoop-env.sh and has stopped showing in my stdout, so maybe the Hadoop being launched is loading a different environment.
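
(For reference, the usual hadoop-env.sh workaround for that SCDynamicStore warning on OS X is an export along these lines:)

export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= -Djava.security.krb5.kdc="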

ScotterC commented 11 years ago

I'm becoming more convinced that the issue lies somewhere in how wu-hadoop loads the environment. I wrote some plain Ruby scripts and ran them through Hadoop streaming:

hadoop jar /usr/local/Cellar/hadoop/1.1.1/libexec/contrib/streaming/hadoop-*streaming*.jar \
  -file /Users/ScotterC/disco/hadoop-ruby/mapper.rb    -mapper /Users/ScotterC/disco/hadoop-ruby/mapper.rb \
  -file /Users/ScotterC/disco/hadoop-ruby/reducer.rb   -reducer /Users/ScotterC/disco/hadoop-ruby/reducer.rb \
  -input sonnet_18.txt -output word_count_ruby.tsv

These worked fine, so I'm guessing that with normal wu-hadoop the Gemfile is somehow not getting loaded or is getting unset.
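
Worth noting: for streaming scripts invoked directly like this, each file generally needs a ruby shebang and the executable bit, e.g.:

head -1 /Users/ScotterC/disco/hadoop-ruby/mapper.rb
  # => #!/usr/bin/env ruby
chmod +x /Users/ScotterC/disco/hadoop-ruby/mapper.rb /Users/ScotterC/disco/hadoop-ruby/reducer.rb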

BTW, sorry for the stream-of-consciousness issue posting, but I'm hoping this will be useful to others hitting the same issues and will make good Google fodder.

mrflip commented 11 years ago

Are you running on the 1.0-ish branch of Hadoop (CDH3 / 0.20, etc.) or the 2.0-ish branch (CDH4)?

Does Hadoop streaming work for you at all?

What do the log files from the child process say? (You get those by clicking through the tasks in the JobTracker UI, or by drilling into the non-world-readable dirs in /var/log/hadoop.)
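
On a single-node 1.x install the task attempt logs usually land somewhere like the path below (job and attempt ids are placeholders):

ls /var/log/hadoop/userlogs/job_201302071234_0001/attempt_201302071234_0001_m_000000_0/
  # stderr  stdout  syslog  (stderr is where the Ruby/RVM errors show up)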

Sent from my iPad


ScotterC commented 11 years ago

Just figured it out! Of course in the end it was simple.

Using Hadoop 1.1.1. I had [[ -s "/Users/ScotterC/.rvm/scripts/rvm" ]] && source "/Users/ScotterC/.rvm/scripts/rvm" in my hadoop-env.sh, but I guess I must not have restarted Hadoop after adding it. The other factor was needing a Gemfile in the folder so that wukong would pick it up and set cmdenv.
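
Concretely, the combination that worked comes down to something like this (RVM path as above; the restart uses the stock Hadoop 1.x scripts):

# appended to $HADOOP_HOME/conf/hadoop-env.sh
[[ -s "/Users/ScotterC/.rvm/scripts/rvm" ]] && source "/Users/ScotterC/.rvm/scripts/rvm"

# restart so the daemons pick up the new environment
stop-all.sh && start-all.sh

# plus a Gemfile sitting next to the script, so wukong picks it up and sets cmdenv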

After adding the RVM script to the end of my Hadoop bash environment, it appears to be working. What are your thoughts on throwing a warning if the Gemfile isn't there? I guess it wouldn't make sense if wukong were installed globally. Not sure if you guys use RVM much, but I'd be happy to add a note to the docs saying 'if you're using RVM, make sure to add it to hadoop-env.sh and restart the cluster.'

mrflip commented 11 years ago

tl;dr: The -file stuff is really dangerous, and I am going to check why it was made the default. It will only work for an utterly trivial example and fail mysteriously otherwise.

Design parameters as I see them:

  1. Somehow or another, you have to put all files the script will use in a place where the hadoop runner will see them:
    • in a fixed location on each computer, set up identically
    • in the job-local directory, by enumerating with the -file flag
    • I think you can also specify a jar file, using I-forget-what hadoop commandline flag, to be un-jar'ed
    • on a shared network drive eg NFS
  2. If you are developing a script:
    • you must not have to manually re-enumerate every inclusion for the runner to find. That is, I must be free to say "require_relative '../../crazy/unexpected/path/to/thing'" or "File.open(File.expand_path('../../data/table.tsv', __FILE__))" and not have to ALSO go list those paths in a separate manifest.
    • you must not have to manually synchronize things across computers in order to launch
    • you should not have to watch as a time-consuming package/upload/eventually unpackage step is added to each job launch
    • you should be allowed to have paths that lie outside the local app directory tree
  3. If you are running in production:
    • you must be confident that all and only what you expect to be running is running.

The only two workable solutions I'm aware of are:

Specifying -file fails as soon as anything besides my script is launched (in this case, the Gemfile, but any file I required, or data file I load, or local checkout of a library, or...).
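
To sketch the failure mode (the streaming jar path and the require below are illustrative): -file copies exactly the files you enumerate into the task's working directory, and nothing else.

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar \
  -file word_count.rb \
  -mapper 'wu-local word_count.rb --run=mapper' \
  -reducer 'wu-local word_count.rb --run=reducer' \
  -input sonnet_18.txt -output word_count.tsv
# word_count.rb shows up in the task's working directory, but a require_relative '../lib/helpers'
# inside it, or the Gemfile sitting next to it, never does; the task dies at load time.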

@dhruvbansal would you please revert out that behavior?

ScotterC commented 11 years ago

Thanks for the write-up. Since I was quite new to map/reduce, I wanted to get it all working locally on my MacBook before moving to a distributed setting, but of course that means dealing with the idiosyncrasies of my workstation, as opposed to using Chef recipes and getting it right the first time in a controlled environment. I did learn a lot the hard way, though :)

If it's your intention to let more developers get up and running with wukong quickly, then designing and implementing an opinionated packager could be a priority, or possibly just some very comprehensive tutorial docs. An alternative could be a short doc on how to use ironfan to set up a Vagrant instance, so there would at least be a controlled environment. However, considering all the work involved in just making an awesome Ruby wrapper to quickly code data flows and deploy them to a cluster, making sure it works perfectly on a local Hadoop cluster should probably not be a priority.

dhruvbansal commented 11 years ago

@mrflip I'd love to have the opinionated packager option working but at present only the network drive approach is really feasible. wu-hadoop should work "as expected" if

@ScotterC did one of the above "supported" use-cases not work for you?

I'd like a more robust model, but this is what we've got today. Agreed with @ScotterC that these constraints deserve better documentation.

ScotterC commented 11 years ago

@dhruvbansal I guess the supported case that ended up working for me was hadoop mode without an NFS, munging the path loading to make sure everything needed was there. It was just very confusing, thanks to the finicky nature of Hadoop configuration and the difficulty of fully understanding how it loads its environment. I believe the process will be much smoother when set up over a network.

Local mode, however, is a breeze, and that's really the strength of the library: it lets you test code before putting it out there and wasting compute cycles. I'm now going to move on to deploying a deploy-pack with ironfan, which appears to be fairly well documented here. I'll let you know if I run into difficulties.