klbostee / dumbo

Python module that allows one to easily write and run Hadoop programs.
http://projects.dumbotics.com/dumbo

Setting the number of reducers fails #88

Open kitein9t opened 9 years ago

kitein9t commented 9 years ago

I am using Hadoop streaming with -io typedbytes and set mapred.reduce.tasks=2, but I still get only one output file. If I set mapred.reduce.tasks=0 instead, I get many output files. I am very confused.

So my question is: how can I make mapred.reduce.tasks=num (num > 1) take effect when using -io typedbytes in streaming?

PS: my mapper's output is (key: Python string, value: NumPy array). My .sh file:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.2.1.jar \
        -D mapred.reduce.tasks=2 \
        -fs local -jt local \
        -io typedbytes \
        -inputformat org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat \
        -input FFT_SequenceFile \
        -output pinvoutput \
        -mapper 'pinvmap.py' \
        -file pinvmap.py
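
For reference, a minimal sketch of how the number of output files could be counted after a run. It assumes the output lands in the local directory pinvoutput (since -fs local is used in the command above) and that the files follow Hadoop's usual part-NNNNN naming. The helper name list_part_files is hypothetical and is not part of dumbo or Hadoop.

```python
import os
import re

# Hadoop names task outputs part-00000, part-00001, ... (old mapred API)
# or part-m-00000 / part-r-00000 (new mapreduce API).
PART_FILE = re.compile(r"^part-(?:[mr]-)?\d{5}$")


def list_part_files(output_dir):
    """Return the sorted names of the part files found in output_dir.

    Hypothetical helper for illustration; marker files such as _SUCCESS
    and hidden checksum files such as .part-00000.crc are ignored.
    """
    return sorted(
        name for name in os.listdir(output_dir) if PART_FILE.match(name)
    )


if __name__ == "__main__":
    # "pinvoutput" is the -output path from the command above.
    if os.path.isdir("pinvoutput"):
        parts = list_part_files("pinvoutput")
        print(len(parts), "part file(s):", parts)
```

With mapred.reduce.tasks=2 actually in effect, two part files would be expected in the listing; a single part file matches the behaviour described above.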