klbostee / dumbo

Python module that allows one to easily write and run Hadoop programs.
http://projects.dumbotics.com/dumbo

addpath broken under hadoop-0.21.0 #20

Closed. jso closed this issue 13 years ago

jso commented 14 years ago

When I run any dumbo script (I have release-0.21.28) with "-addpath yes" in the arguments, my map jobs fail with the following error: "KeyError: 'map_input_file'".

It appears that the environment variable map_input_file is no longer used in hadoop 0.21.0, and has been replaced with mapreduce_map_input_file.

This diagnosis is supported by a comment on HADOOP-5973 (https://issues.apache.org/jira/browse/HADOOP-5973), which notes that map.input.file is only available in the older (deprecated) version of the MapReduce API in Hadoop 0.20.0.

I was able to make it work by replacing all instances of "map_input_file" with "mapreduce_map_input_file" in dumbo/core.py, but perhaps a longer-term solution would be to check both variables to see which one exists.
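
A minimal sketch of the "check both variables" idea follows. The helper name `get_input_file` and the `environ` parameter are hypothetical and are not part of dumbo; the only names taken from this report are the two environment variables. The lookup order (new name first, then old) is an assumption.

```python
import os

# Variable names from this report: the Hadoop 0.21.0 name first,
# then the older name as a fallback (ordering is an assumption).
INPUT_FILE_VARS = ("mapreduce_map_input_file", "map_input_file")


def get_input_file(environ=None):
    """Return the current map input file from whichever variable is set.

    Hypothetical helper, not dumbo's actual code. Pass a dict as
    `environ` to test without touching the real environment.
    """
    if environ is None:
        environ = os.environ
    for name in INPUT_FILE_VARS:
        value = environ.get(name)
        if value:
            return value
    # Neither variable is set: fail with a message naming both candidates.
    raise KeyError("none of %s is set" % ", ".join(INPUT_FILE_VARS))
```

With something like this, the lookups in dumbo/core.py that currently read "map_input_file" directly could call the helper instead, so the same script would run on either Hadoop version.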

klbostee commented 13 years ago

fix incompatibilities with hadoop 0.20 (closed by d143163dd26670621a44a3a0294b1e2e115f981d)