Closed: aripollak closed this issue 12 years ago
Right, sounds annoying.
Could we fix this by extending the -hadoop option to take a (comma-separated) list of directories, maybe? You can define aliases in /etc/dumbo.conf, so we could fairly easily avoid having to give it that list all the time...
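For what it's worth, here's a sketch of what that might look like in /etc/dumbo.conf, assuming the -hadoop option were extended to accept a comma-separated list (the cdh4 alias name and the exact set of directories are just illustrative):

```
[hadoops]
cdh4 = /usr/lib/hadoop,/usr/lib/hadoop-mapreduce,/usr/lib/hadoop-hdfs
```

Then something like `dumbo start wordcount.py -input foo -output bar -hadoop cdh4` could expand to that list behind the scenes.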
That could be an option, at least for the immediate problem of looking for JARs in the right place. Or it could try automatically looking in the new split-out directories, like /usr/lib/hadoop-mapreduce.
Guess that could work too, yeah. It might even be worth implementing both, so that people with other distributions have enough flexibility to get things to work as well...
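A rough sketch of how both ideas might combine; the function name, the fallback directories, and the glob patterns here are illustrative guesses, not dumbo's actual code:

```python
import glob
import os

# Directories to fall back to when the -hadoop dir itself has no streaming jar
# (assumed from the CDH4 layout described in this issue).
FALLBACK_DIRS = ["/usr/lib/hadoop-mapreduce", "/usr/lib/hadoop-hdfs"]

def find_streaming_jar(hadoop_opt):
    """Search for a Hadoop streaming jar given a comma-separated -hadoop value."""
    dirs = hadoop_opt.split(",") + FALLBACK_DIRS
    for hadoopdir in dirs:
        # Check both the flat CDH4-style layout and the old contrib/streaming/ layout.
        for pattern in ("hadoop-streaming*.jar",
                        os.path.join("contrib", "streaming", "hadoop-streaming*.jar")):
            matches = glob.glob(os.path.join(hadoopdir, pattern))
            if matches:
                return matches[0]
    return None
```

With something like that, plain `-hadoop /usr/lib/hadoop` would still find the CDH4 jar via the fallback list, and people on other distributions could pass an explicit list instead.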
Is there a workaround available for this? I have a fresh installation of the latest Dumbo and Cloudera Hadoop 2.0.0 CDH4 and am still seeing this error. I also tried linking all the jars from the various folders into one folder and using that, with no luck.
Any hints would be very helpful.
See comments on the pull request...
Starting with CDH4b2 (nightlies at http://nightly.cloudera.com/cdh4/), /usr/lib/hadoop has been split out into a bunch of different directories, like hadoop-hdfs and hadoop-mapreduce, so a lot of the assumptions made in dumbo no longer work. A few examples:
With JAVA_HOME=/usr/lib/jvm/default-java and HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec, I can't even run `dumbo ls / -hadoop /usr/lib/hadoop`, since it will complain about JAVA_HOME not being set or about not being able to find libexec.

`dumbo start wordcount.py -input foo -output bar -hadoop /usr/lib/hadoop` results in "ERROR: Streaming jar not found", since the streaming jar is now under /usr/lib/hadoop-mapreduce.

I might be able to fix this if I find some free time, but it might require some structural changes.
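For reference, a quick way to see the move that breaks the jar lookup; the first path is the pre-CDH4 contrib layout, the second is where CDH4 puts the jar according to the report above:

```python
import glob

# Old layout: streaming jar lived under the main hadoop directory.
print(glob.glob("/usr/lib/hadoop/contrib/streaming/hadoop-streaming*.jar"))
# CDH4 layout: the jar moved into the hadoop-mapreduce package.
print(glob.glob("/usr/lib/hadoop-mapreduce/hadoop-streaming*.jar"))
```

On a CDH4 box the first list comes back empty, which is exactly why dumbo reports "Streaming jar not found".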