peterknife / boto

Automatically exported from code.google.com/p/boto

Problem trying to create a map-only job #486

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a StreamingStep object with argument reducer=None
2. Run the job...
3. Apparently boto tells Hadoop to execute a script called "None" as the 
reducer instead of omitting the reducer step

What is the expected output? What do you see instead?
I would expect to create a map-only job. The proper parameters should be sent 
to hadoop so no reducer step is performed.

What version of the product are you using? On what operating system?
2.0b1 and 2.0b3

Please provide any additional information below.
First of all, the documentation on the boto.cloudhackers.com website says that 
StreamingStep is declared with reducer=None. That contradicts what we see in 
the internal help from the library, where reducer appears as a mandatory 
argument. If this is the case, and setting reducer=None in the object 
declaration is not the way to do it, what should I do then?

It seems that in Hadoop itself the correct way to create a map-only job is to 
set the number of reduce tasks to zero. Is there any way I could do that 
with the StreamingStep object?
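
For illustration, here is a minimal sketch of the argument list a map-only 
streaming step would need. It is not boto's code: build_streaming_args is a 
hypothetical helper, and the bucket paths are placeholders. It assumes the 
generic "-D mapred.reduce.tasks=0" option is accepted by the Hadoop release in 
use (older releases took "-jobconf" instead, and newer ones name the property 
mapreduce.job.reduces).

```python
def build_streaming_args(mapper, reducer=None, input_uri=None, output_uri=None):
    """Hypothetical helper: build a Hadoop streaming argument list.

    When reducer is None, no "-reducer" option is emitted at all. Instead the
    number of reduce tasks is set to zero, which asks Hadoop for a map-only job.
    """
    args = []
    if reducer is None:
        # Generic options such as -D go before the streaming-specific options.
        args.extend(["-D", "mapred.reduce.tasks=0"])
    args.extend(["-mapper", mapper])
    if reducer is not None:
        # Only pass a reducer when one was actually supplied, so the literal
        # string "None" can never reach Hadoop.
        args.extend(["-reducer", reducer])
    if input_uri is not None:
        args.extend(["-input", input_uri])
    if output_uri is not None:
        args.extend(["-output", output_uri])
    return args


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(build_streaming_args(
        "s3://example-bucket/mapper.py",
        input_uri="s3://example-bucket/input",
        output_uri="s3://example-bucket/output",
    ))
```

If StreamingStep accepts extra step arguments in the installed version, passing 
the "-D" pair that way may be a workaround, but I have not verified that 
against 2.0b1 or 2.0b3.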

Here is part of a log file from an execution of a job with reducer=None:

2011-01-05 03:33:04,431 INFO org.apache.hadoop.metrics.jvm.JvmMetrics (main): 
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2011-01-05 03:33:04,530 INFO org.apache.hadoop.mapred.ReduceTask (main): Host 
name: ip-10-100-182-182.ec2.internal
2011-01-05 03:33:04,593 INFO org.apache.hadoop.streaming.PipeMapRed (main): 
PipeMapRed exec [None]
2011-01-05 03:33:04,623 ERROR org.apache.hadoop.streaming.PipeMapRed (main): 
configuration exception
java.io.IOException: Cannot run program "None": java.io.IOException: error=2, 
No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:237)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:243)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or 
directory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 5 more

Original issue reported on code.google.com by nwerneck@gmail.com on 5 Jan 2011 at 4:08

GoogleCodeExporter commented 9 years ago
I'm afraid I'm not actively using EMR so I'm not sure how best to address this. 
I think it would be best to get some feedback from the boto community on this. 
I'll post a note on boto-users.

Original comment by Mitch.Ga...@gmail.com on 6 Jan 2011 at 12:44