klbostee / dumbo

Python module that allows one to easily write and run Hadoop programs.
http://projects.dumbotics.com/dumbo

Dumbo doesn't work with CDH4b2 nightlies #53

Closed by aripollak 12 years ago

aripollak commented 12 years ago

Starting with CDH4b2 (nightlies at http://nightly.cloudera.com/cdh4/), /usr/lib/hadoop has been split out into a bunch of different directories, like hadoop-hdfs and hadoop-mapreduce, so a lot of the assumptions made in dumbo no longer work. A few examples:

I might be able to fix this if I find some free time, but it might require some structural changes.

klbostee commented 12 years ago

Right, sounds annoying.

Could we fix this by extending the -hadoop option to take a comma-separated list of directories, maybe? You can define aliases in /etc/dumbo.conf, so we could fairly easily avoid having to give it that list every time...

aripollak commented 12 years ago

That could be an option, at least for the immediate problem of looking for JARs in the right place. Or it could try automatically looking in -mapreduce instead of just hadoop_dir?

klbostee commented 12 years ago

Guess that could work too, yeah. It might even be worth implementing both, so that people with other distributions have enough flexibility to get things to work as well...
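For illustration, here is a rough sketch of how the two ideas might combine: a comma-separated -hadoop value, plus an automatic look into the split-out sibling directories. This is not dumbo's actual code. The helper names `candidate_dirs` and `find_jar`, the suffix list, and the JAR glob pattern in the usage note are all assumptions made for the example.

```python
import glob
import os


def candidate_dirs(hadoop_opt, suffixes=("-mapreduce", "-hdfs")):
    """Expand a comma-separated -hadoop value into directories to search.

    Each listed directory is followed by its assumed split-out siblings,
    e.g. /usr/lib/hadoop -> /usr/lib/hadoop-mapreduce, /usr/lib/hadoop-hdfs.
    """
    dirs = []
    for entry in hadoop_opt.split(","):
        base = entry.strip().rstrip("/")
        if not base:
            continue  # skip empty entries such as a trailing comma
        for d in [base] + [base + s for s in suffixes]:
            if d not in dirs:  # keep order, drop duplicates
                dirs.append(d)
    return dirs


def find_jar(hadoop_opt, pattern):
    """Return the first JAR matching a glob pattern, or None if not found."""
    for d in candidate_dirs(hadoop_opt):
        matches = sorted(glob.glob(os.path.join(d, pattern)))
        if matches:
            return matches[0]
    return None
```

With something like this, `find_jar("/usr/lib/hadoop", "hadoop-streaming*.jar")` would check /usr/lib/hadoop first and then the -mapreduce and -hdfs siblings, while an explicit list such as `"/opt/a,/opt/b"` would still work for other distributions. The sketch only looks at the top level of each directory; a real fix would probably also need to search subdirectories.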

mikeinertia commented 12 years ago

Is there a workaround available for this? I have a fresh installation of the latest Dumbo and Hadoop Cloudera 2.0.0 CDH4 and am still seeing this error. I also tried linking all the JARs from the various folders into one folder and using that, with no luck.

Any hints would be very helpful.

klbostee commented 12 years ago

See comments on the pull request...