ScotterC opened this issue 11 years ago
I believe this is related to https://github.com/infochimps-labs/wukong/issues/11.
Wukong::Hadoop::HadoopInvocation#ruby_interpreter_path appears to set the right ruby path, but I can't find where that method is called.
The only other clue I have is that my stderr job logs also show "Unable to load realm info from SCDynamicStore", which I believe was fixed with the right export in hadoop-env.sh and has stopped showing in my stdout, so maybe the Hadoop being loaded is loading a different environment.
I'm starting to become a bit more convinced that the issue lies somewhere within how wu-hadoop is loading the environment. I wrote some straight Ruby scripts and ran them through Hadoop streaming:
hadoop jar /usr/local/Cellar/hadoop/1.1.1/libexec/contrib/streaming/hadoop-*streaming*.jar \
-file /Users/ScotterC/disco/hadoop-ruby/mapper.rb -mapper /Users/ScotterC/disco/hadoop-ruby/mapper.rb \
-file /Users/ScotterC/disco/hadoop-ruby/reducer.rb -reducer /Users/ScotterC/disco/hadoop-ruby/reducer.rb \
-input sonnet_18.txt -output word_count_ruby.tsv
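For context, the mapper and reducer in that command were plain Ruby scripts along these lines. This is a minimal word-count sketch: the actual mapper.rb/reducer.rb aren't shown in the thread, so the names and tokenization here are assumptions.

```ruby
#!/usr/bin/env ruby
# Hypothetical minimal word-count mapper/reducer for Hadoop streaming;
# a reconstruction, not the scripts from the thread.

# Mapper: emit "word\t1" for each word on the line.
def map_line(line)
  line.downcase.scan(/[a-z']+/).map { |word| "#{word}\t1" }
end

# Reducer: sum counts per word (streaming sorts pairs by key first).
def reduce_pairs(pairs)
  counts = Hash.new(0)
  pairs.each do |pair|
    word, n = pair.split("\t")
    counts[word] += n.to_i
  end
  counts
end

# Only stream when actually run against piped input.
if __FILE__ == $0 && !STDIN.tty?
  if ARGV.first == 'reduce'
    reduce_pairs(STDIN.each_line.map(&:chomp)).each { |w, c| puts "#{w}\t#{c}" }
  else
    STDIN.each_line { |line| map_line(line.chomp).each { |pair| puts pair } }
  end
end
```

Streaming invokes each script once per task and pipes the input split through STDIN, which is why plain scripts like these work without any framework support.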
These worked fine, so I'm guessing that the Gemfile is somehow not getting loaded, or is getting unset, with normal wu-hadoop.
BTW, sorry for the stream-of-consciousness type of issue posting, but I'm hoping this will be useful to others who are having the same issues and will be useful Google fodder.
Are you running on the 1.0-ish branch of Hadoop (CDH3 / 0.20, etc.) or the 2.0-ish branch (CDH4)?
Does Hadoop streaming work for you at all?
What do the log files from the child process say? (You get those by clicking through the tasks on the job tracker UI, or by drilling into the non-world-readable dirs in /var/log/hadoop.)
On Feb 7, 2013, at 11:25 AM, Scott Carleton notifications@github.com wrote:
Not sure if this is a full-fledged issue, but I'm posting it here because the Google group doesn't appear to be very active.
In short, the examples work fine in local mode but not hadoop mode.
This works as expected:
wu-hadoop examples/word_count.rb --mode=local --input=examples/sonnet_18.txt
However, when I switch it over to the single-node Hadoop cluster which is running locally, it fails.
I've put sonnet_18.txt into HDFS, and normal hadoop jar examples such as the pi calculation work fine.
Command:
wu-hadoop examples/word_count.rb --mode=hadoop --input=/user/ScotterC/sonnet_18.txt --output=/user/ScotterC/word_count.tsv --rm
I get
Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1.
and the job details print:
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
This is with hadoop 1.1.1, wukong 3.0.0.pre3 and wukong-hadoop 0.0.2
If anyone has pointers for debugging the Java side, it would be highly appreciated.
Just figured it out! Of course in the end it was simple.
Using Hadoop 1.1.1. I had
[[ -s "/Users/ScotterC/.rvm/scripts/rvm" ]] && source "/Users/ScotterC/.rvm/scripts/rvm"
in my hadoop-env.sh, but I guess I must not have restarted Hadoop after adding it. The other factor was needing a Gemfile in the folder so that wukong would pick it up and set -cmdenv.
After adding the rvm script to the end of my Hadoop bash environment, it appears to be working. What are your thoughts on throwing a warning if the Gemfile isn't there? I guess it wouldn't make sense if wukong was installed globally. Not sure if you guys use rvm much, but I'd be happy to add a note to the docs saying "if you're using rvm, make sure to add it to hadoop-env.sh and restart the cluster."
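For reference, the "Gemfile in the folder" can be as small as the following sketch; the gem versions here are simply the ones reported in this thread, not a recommendation.

```ruby
# Hypothetical minimal Gemfile for the project directory; its presence is
# what lets wukong pick up Bundler and set -cmdenv for the task nodes.
source 'https://rubygems.org'

gem 'wukong',        '3.0.0.pre3'
gem 'wukong-hadoop', '0.0.2'
```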
tl;dr: Bundler.setup(...), where ... is the minimal set of Gemfile groups for script-running. The -file functionality is dangerous and should be reverted out.
The -file stuff is really dangerous, and I am going to check as to why it was made the default. It will only work for an utterly trivial example, and fail mysteriously otherwise.
Design parameters, as I see them, revolve around the -file flag. The only two workable solutions I'm aware of are an opinionated packager or a shared network drive.
Specifying -file fails as soon as anything besides my script is launched (in this case, the Gemfile, but any file I required, or data file I load, or local checkout of a library, or ...).
@dhruvbansal would you please revert out that behavior?
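The Bundler.setup(...) idea above might look like the following. This is only a sketch: the :script_running group name is a placeholder, not wukong's actual configuration, and the warn-on-missing-Gemfile behavior is the suggestion from earlier in the thread, not existing code.

```ruby
require 'bundler'

# Set up only the Gemfile groups a running script actually needs, instead
# of everything in the Gemfile. :script_running is a hypothetical group
# name used for illustration.
def setup_minimal(groups = [:default])
  Bundler.setup(*groups)
  true
rescue Bundler::BundlerError => e
  # e.g. no Gemfile present (wukong installed globally); warn instead of
  # crashing the Hadoop task with a cryptic subprocess failure.
  warn "Bundler setup skipped: #{e.message}"
  false
end
```

The rescue branch is what would make a globally installed wukong keep working while still surfacing the missing-Gemfile case in the task logs.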
Thanks for the write-up. Since I was quite new to map/reduce et al., I wanted to get it all working locally on my MacBook before moving into a distributed setting, but of course that makes dealing with the idiosyncrasies of my workstation a pain, as opposed to using Chef recipes and getting it right the first time in a controlled environment. However, I did learn a lot the hard way :)
If it's your intention to allow more developers to get up and running with wukong quickly, then designing/implementing an opinionated packager could be a priority, or possibly just some very comprehensive tutorial docs. An alternative could be a short doc on how to use ironfan to set up a Vagrant instance, so it would at least be a controlled environment. However, considering all the work involved in just making an awesome Ruby wrapper to quickly code data flows and deploy them to a cluster, making sure it works perfectly in a local Hadoop cluster should probably not be a priority.
@mrflip I'd love to have the opinionated packager option working, but at present only the network drive approach is really feasible. wu-hadoop should work "as expected" if the -file functionality supports this. Multiple files and code loading are brittle because the default Hadoop "packager" does some insane path munging that is difficult to work around (see @kornypoet's complaints...).
@ScotterC, did one of the above "supported" use-cases not work for you?
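The path munging can be stated concretely: streaming ships each -file into the task's working directory under its basename alone, so any relative location the script relied on locally stops resolving on the cluster. A small illustration, using the path from the streaming command earlier in the thread:

```ruby
# What -file roughly does to a shipped file's path: on the task node the
# file appears in the working directory under its basename only.
local_path  = '/Users/ScotterC/disco/hadoop-ruby/mapper.rb'
remote_path = File.basename(local_path)  # the name the task actually sees

# Any sibling file the script expects at a relative location (a Gemfile,
# a required helper, a data file) must be shipped explicitly too, or the
# require/load fails only on the cluster.
```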
I'd like a more robust model, but this is what we've got today. Agreed with @ScotterC that these constraints deserve better documentation.
@dhruvbansal I guess the supported case that ended up working for me was hadoop mode without an NFS, munging the path loading to make sure everything needed was there. It was just very confusing due to the finicky nature of Hadoop configuration and fully comprehending how it loads up. I believe the process will be much smoother when set up over a network.
Local mode is a breeze, however, which is really the strength of the library: it allows you to test code before putting it out there and wasting compute cycles. I'm now going to proceed towards deploying a deploy-pack with ironfan, which appears to be fairly well documented here. I'll let you know if I run into difficulties.
1st EDIT:
Found that further down the stack trace there is:
I'm using RVM, so my wu-local is located at
/Users/ScotterC/.rvm/gems/ruby-1.9.3-p194/bin/wu-local
Writing out full paths for wu-local got me to my next error.
Error:
2nd Edit:
Digging through Hadoop job logs gives me
env: ruby_noexec_wrapper: No such file or directory
So I'm wondering why it can't find my Ruby implementation. I would assume that's what is producing the failed subprocess.
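One quick diagnostic for this class of failure is to have the mapper report which interpreter it actually resolved before doing any work. This is a sketch, not part of wukong; printed to STDERR, it would show up in the task logs on the job tracker UI.

```ruby
require 'rbconfig'

# Report the interpreter a Hadoop child task is actually running under,
# to catch cases where RVM (and hence ruby_noexec_wrapper) is not on the
# task's PATH even though it is on yours.
def interpreter_info
  {
    ruby:       File.join(RbConfig::CONFIG['bindir'],
                          RbConfig::CONFIG['ruby_install_name']),
    path:       ENV['PATH'],
    rvm_loaded: ENV.key?('rvm_path')
  }
end

# In a mapper you would do: STDERR.puts interpreter_info.inspect
```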