I am using Hadoop Streaming with -io typedbytes and set mapred.reduce.tasks=2, but I end up with only one output file. If I set mapred.reduce.tasks=0 instead, I get many output files. I am very confused.
So my question is:
How can I make the setting mapred.reduce.tasks=num (num > 1) take effect when I use -io typedbytes in streaming?
PS: my mapper's output is (key: Python string, value: NumPy array).
And my .sh file:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.2.1.jar \
    -D mapred.reduce.tasks=2 \
    -fs local \
    -jt local \
    -io typedbytes \
    -inputformat org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat \
    -input FFT_SequenceFile \
    -output pinvoutput \
    -mapper 'pinvmap.py' \
    -file pinvmap.py