The last "Job not successful failure" is closely related to the following post according to the hadoop job log:
http://www.curiousattemptbunny.com/2009/10/hadoop-streaming-javalangruntimeexcepti.html
Smells like virtualenv is causing trouble when running dumbo (it cannot find the right version of Python on the worker nodes?).
I've tried hardcoding the shebang as the post suggests, but it didn't help :-S
The actual Hadoop exception is different from the one in the post:
ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201102242242_0018_r_000000" TASK_ATTEMPT_ID="attempt_201102242242_0018_r_000000_0" TASK_STATUS="FAILED" FINISH_TIME="1298749520679" HOSTNAME="$HOST" ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads(): subprocess failed with code 2 at org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:362) at org\.apache\.hadoop\.streaming\.PipeMapRed\.mapRedFinished(PipeMapRed\.java:572) at org\.apache\.hadoop\.streaming\.PipeReducer\.close(PipeReducer\.java:137) at org\.apache\.hadoop\.mapred\.ReduceTask\.runOldReducer(ReduceTask\.java:478) at org\.apache\.hadoop\.mapred\.ReduceTask\.run(ReduceTask\.java:416) at org\.apache\.hadoop\.mapred\.Child$4\.run(Child\.java:240) at java\.security\.AccessController\.doPrivileged(Native Method) at javax\.security\.auth\.Subject\.doAs(Subject\.java:396) at org\.apache\.hadoop\.security\.UserGroupInformation\.doAs(UserGroupInformation\.java:1115) at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:234)
I am running this clustered Hadoop environment without root privileges (just a regular user).
The problem really is the way in which you install Dumbo -- it has to be installed as an egg (one that hasn't been unzipped into a directory). Commenting out the fileopt stuff hides the symptoms somewhat, but it definitely won't fix anything; it actually makes things even worse.
When you start a Dumbo job, Dumbo sends itself along with the job by adding the option "-file path_to_egg" internally, which won't work when it's not installed as an egg or when you disable the -file option (but the latter might indeed lead to less explicit errors, as you discovered).
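Roughly, the self-shipping logic works like this (an illustrative sketch with made-up names, not Dumbo's actual code):

    # Sketch: locate the installed dumbo egg and ship it with the job.
    # Assumes dumbo is importable; "streaming_opts" stands in for the
    # real option list that ends up on the streaming command line.
    import os
    import dumbo

    streaming_opts = []

    # For a zipped egg, dumbo.__file__ looks something like
    # .../site-packages/dumbo-0.21.30-py2.6.egg/dumbo/__init__.pyc
    eggpath = os.path.dirname(os.path.dirname(os.path.abspath(dumbo.__file__)))

    if eggpath.endswith('.egg'):
        # A single zipped file can be shipped to every task's working
        # directory and put on the tasks' PYTHONPATH.
        streaming_opts.append(('file', eggpath))
    else:
        # An unzipped install is a directory tree, which "-file" cannot ship.
        print 'dumbo is not installed as a zipped egg; self-shipping will fail'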
Thanks indeed! I just ran "python setup.py install" to generate an egg, and it works without commenting out the code, but it fails the same way on the Hadoop side:
ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201102262100_0002_r_000000" TASK_ATTEMPT_ID="attempt_201102262100_0002_r_000000_0" TASK_STATUS="FAILED" FINISH_TIME="1298885499654" HOSTNAME="$HOSTNAME" ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads(): subprocess failed with code 2 at org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:362) at org\.apache\.hadoop\.streaming\.PipeMapRed\.mapRedFinished(PipeMapRed\.java:572) at org\.apache\.hadoop\.streaming\.PipeReducer\.close(PipeReducer\.java:137) at org\.apache\.hadoop\.mapred\.ReduceTask\.runOldReducer(ReduceTask\.java:478) at org\.apache\.hadoop\.mapred\.ReduceTask\.run(ReduceTask\.java:416) at org\.apache\.hadoop\.mapred\.Child$4\.run(Child\.java:240) at java\.security\.AccessController\.doPrivileged(Native Method) at javax\.security\.auth\.Subject\.doAs(Subject\.java:396) at org\.apache\.hadoop\.security\.UserGroupInformation\.doAs(UserGroupInformation\.java:1115) at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:234) " .
Any ideas why? Other Hadoop examples (the pi estimator) work fine :-S
Sounds like a bug in your Dumbo script. The Hadoop Java exceptions are rarely useful in that case; you need to check the stderr logs instead (in the web UI, click the job ID -> the failed-task count -> "last 4KB" under "Logs").
Yes, here it is:
stderr logs:
/usr/bin/python: module ipcount not found
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
(...)
I'm running dumbo as the tutorial states:
dumbo start ipcount.py -hadoop $HADOOP_HOME -input access.log -output ipcounts
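(For reference, ipcount.py is essentially the example from the short tutorial -- something along these lines:)

    # ipcount.py -- count occurrences of each IP address in an access log.
    def mapper(key, value):
        # the IP address is the first whitespace-separated field of the line
        yield value.split(" ")[0], 1

    def reducer(key, values):
        yield key, sum(values)

    if __name__ == "__main__":
        import dumbo
        dumbo.run(mapper, reducer, combiner=reducer)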
Could it be that my Python virtual environment isn't able to import ipcount.py and the dumbo egg, then? Is ipcount.py supposed to be bundled into the Hadoop job somehow, like the dumbo egg is?
The ipcount.py script should be submitted along with the job as well (by adding "-file ipcount.py" under the hood). Are you sure you enabled all of the fileopt code again?
I removed every dumbo/typedbytes file/lib lying around in site-packages and re-installed the egg via "python setup.py install" (rolling back the commented lines), and it seems that the eggs and ipcount.py are passed to the job:
(...) -cmdenv 'PYTHONPATH=dumbo-0.21.30-py2.6.egg:typedbytes-0.3.6-py2.6.egg'
-file 'PATH_TO/ipcount.py'
-file 'PATH_TO/.virtualenv/devel/lib/python2.6/site-packages/dumbo-0.21.30-py2.6.egg'
-file 'PATH_TO/.virtualenv/devel/lib/python2.6/site-packages/typedbytes-0.3.6-py2.6.egg'
Same result though:
/usr/bin/python: module ipcount not found
Tried to hardcode the shebang, as the post suggests, to my virtualenv's python:
PATH_TO/.virtualenvs/devel/bin/python
But same effect on the Hadoop job:
/usr/bin/python: module ipcount not found
:-(
Thanks for your support!
I've been trying to adjust sys.path inside ipcount.py, but it still cannot find the ipcount(.py) module when running:
2011-03-01 13:26:07,241 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/python, -m, ipcount, red, 0, 262144000]
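The tweak itself was roughly this, at the top of ipcount.py (illustrative):

    # Attempt to make the task's working directory -- where "-file" drops
    # the script -- importable:
    import sys, os
    sys.path.insert(0, os.getcwd())

Although, seeing that the task is started as "/usr/bin/python -m ipcount", I suppose the interpreter has to be able to locate the module before any of its code runs, so an in-script sys.path tweak probably comes too late anyway.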
stderr logs:
/usr/bin/python: module ipcount not found
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
(...)
Any further ideas?
Now I've tried passing the "-pypath" and "-python" flags explicitly to dumbo:
$ dumbo start ipcount.py -hadoop $HADOOP_HOME -input access.log -output ipcounts -pypath '.:path/to/.virtualenvs/devel/lib/python2.6/site-packages' -python '/path/to/.virtualenvs/devel/bin/python'
The "." on pypath allows dumbo to find the ipcounts "module".
But now the error refers to the python importer:
'import site' failed; use -v for traceback
Could not import runpy module
I added the -v flag in dumbo/backends/common.py, but I couldn't see any clear clues as to why "site" doesn't get imported correctly...
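As a sanity check, I suppose one could test whether the virtualenv's interpreter even starts up on a worker node, since a failing "import site" usually means the interpreter cannot locate its standard library on that machine. An illustrative snippet ("workernode" is a placeholder hostname):

    import subprocess

    # If the virtualenv only exists on the submitting host, this interpreter
    # path won't resolve (or won't start cleanly) on the workers.
    subprocess.call(['ssh', 'workernode',
                     '/path/to/.virtualenvs/devel/bin/python', '-c',
                     'import site; print "ok"'])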
Did you manage to get dumbo flying on virtualenv in -hadoop mode? From your post, it seems that this has only been tested in local mode:
http://dumbotics.com/2009/05/24/virtual-pythonenvironments/
What am I doing wrong? :-S
Moved the issue to the dumbo-user mailing list:
http://groups.google.com/group/dumbo-user/t/c9d368625daa2629
I fixed this issue with the following patch:
--- a/dumbo/backends/streaming.py
+++ b/dumbo/backends/streaming.py
@@ -76,7 +76,7 @@ class StreamingIteration(Iteration):
         if modpath.endswith('.egg'):
             addedopts.add('libegg', modpath)
         else:
-            opts.add('file', modpath)
+            opts.add('file', 'file://' + modpath)
         opts.add('jobconf', 'stream.map.input=typedbytes')
         opts.add('jobconf', 'stream.reduce.input=typedbytes')
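My guess is that this version of Hadoop streaming resolves bare -file paths against the default filesystem (hence the complaints about the "tmpfiles" jobconf), so prefixing the path with "file://" forces it to be read from the local filesystem -- e.g. the generated option becomes -file 'file:///path/to/ipcount.py' instead of -file '/path/to/ipcount.py'. I haven't verified the exact mechanism, though.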
Hello,
I'm having the same issue described in this thread, but with the more recent CDH3B4:
http://groups.google.com/group/dumbo-user/browse_thread/thread/d5440880a5588278
Namely:
easy_install does not seem to be the reason: I've installed it via pip, easy_install, and now a git clone. It seems to me that the "tmpfiles" jobconf is what causes the problem.
Commenting out the offending code allows the MapReduce job to start, but it fails shortly after on the Hadoop side (not in Dumbo):
The result of the above modification is:
11/02/26 20:23:32 INFO streaming.StreamJob: map 0% reduce 0%
11/02/26 20:23:40 INFO streaming.StreamJob: map 50% reduce 0%
11/02/26 20:23:41 INFO streaming.StreamJob: map 100% reduce 0%
11/02/26 20:24:02 INFO streaming.StreamJob: map 100% reduce 17%
11/02/26 20:24:06 INFO streaming.StreamJob: map 100% reduce 0%
11/02/26 20:24:13 INFO streaming.StreamJob: map 100% reduce 33%
11/02/26 20:24:17 INFO streaming.StreamJob: map 100% reduce 0%
11/02/26 20:24:26 INFO streaming.StreamJob: map 100% reduce 17%
11/02/26 20:24:29 INFO streaming.StreamJob: map 100% reduce 0%
11/02/26 20:24:32 INFO streaming.StreamJob: map 100% reduce 100%
(...)
11/02/26 20:24:32 ERROR streaming.StreamJob: Job not successful. Error: NA