When I run any dumbo script (I have release-0.21.28) with "-addpath yes" in the arguments, my map jobs fail with the following error: "KeyError: 'map_input_file'"
It appears that the environment variable map_input_file is no longer used in hadoop 0.21.0, and has been replaced with mapreduce_map_input_file.
This diagnosis is supported by a comment on HADOOP-5973 (https://issues.apache.org/jira/browse/HADOOP-5973) that mentions that map.input.file is only available in the older (deprecated) version of the MapReduce API in Hadoop 0.20.0.
I was able to make it work by replacing all instances of "map_input_file" with "mapreduce_map_input_file" in dumbo/core.py, but perhaps a longer-term solution would be to check both variables to see which one exists.
When I run any dumbo script (I have release-0.21.28) with "-addpath yes" in the arguments, my map jobs fail with the following error: "KeyError: 'map_input_file'"
It appears that the environment variable map_input_file is no longer used in hadoop 0.21.0, and has been replaced with mapreduce_map_input_file.
This diagnosis is supported by a comment on HADOOP-5973 (https://issues.apache.org/jira/browse/HADOOP-5973) that mentions that map.input.file is only available in the older (deprecated) version of the MapReduce API in Hadoop 0.20.0.
I was able to make it work by replacing all instances of "map_input_file" with "mapreduce_map_input_file" in dumbo/core.py, but perhaps a longer-term solution would be to check both variables to see which one exists.