nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive
UDF is trying to load its RData file from wrong directory #79

Open alexvorobiev opened 9 years ago

alexvorobiev commented 9 years ago

I am trying to run the examples from the RHive manual (I have just installed the latest version from GitHub). The connection works and the UDF gets installed into the correct location (/rhive/udf/myuserid/sumCrimes.RData), but rhive.query fails:

rhive.query("SELECT urbanpop, R('sumCrimes', murder, assault, rape, 0.0) FROM usarrests")
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask 

The job's log file shows that it tries to load the file from /rhive/udf rather than from my user's subdirectory:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: Requested file /rhive/udf/sumCrimes.RData does not exist.

When I copied the sumCrimes.RData manually to that directory, everything worked.

Here is what rhive.env() returns:

> rhive.env()
hadoop home: /opt/mapr/hadoop/hadoop-0.20.2
hadoop conf: /opt/mapr/hadoop/hadoop-0.20.2/conf
fs: maprfs:///
hive home: /opt/mapr/hive/hive-0.13
user name: myuserid
user home: /u/myuserid
temp dir: /tmp/myuserid

Here is the value of mapred.child.env on the nodes (from the Job Configuration page): RHIVE_UDF_DIR=/rhive/udf/myuserid
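To make the mismatch concrete, here is a minimal Python sketch (not RHive code) that parses the mapred.child.env value above and compares the path it implies with the one in the exception. The helper names `parse_child_env` and `udf_path` are hypothetical, and the assumption that the UDF file is named `<function>.RData` inside RHIVE_UDF_DIR is taken only from the paths quoted in this report.

```python
import posixpath


def parse_child_env(value):
    """Parse a mapred.child.env string (comma-separated KEY=VALUE pairs)."""
    env = {}
    for item in value.split(","):
        if "=" in item:
            key, val = item.split("=", 1)
            env[key.strip()] = val.strip()
    return env


def udf_path(udf_dir, name):
    """Hypothetical helper: assumed location of a UDF's RData file."""
    return posixpath.join(udf_dir, name + ".RData")


# Value copied from the Job Configuration page.
child_env = parse_child_env("RHIVE_UDF_DIR=/rhive/udf/myuserid")

# Path implied by the task environment.
expected = udf_path(child_env["RHIVE_UDF_DIR"], "sumCrimes")

# Path quoted in the FileNotFoundException.
requested = "/rhive/udf/sumCrimes.RData"

print("expected: ", expected)
print("requested:", requested)
print("match:    ", expected == requested)
```

The two paths differ only by the user subdirectory, which suggests the UDF side is not picking up RHIVE_UDF_DIR from the task environment, or is falling back to a default.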

ssshow16 commented 9 years ago

I fixed it and released nexr-rhive-2.0.9. Please try it again.

alexvorobiev commented 9 years ago

I just tried installing the latest from GitHub and still see the same problem. Is there anything I can do to help with testing?

ssshow16 commented 9 years ago

First, please remove all RHive files and directories on HDFS, such as /rhive/tmp, /rhive/lib, /rhive/udf, etc. After that, try it again.
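As a rough illustration of that cleanup, the Python sketch below only builds and prints the `hadoop fs` commands; it does not run them. It assumes the directories meant are /rhive/tmp, /rhive/lib and /rhive/udf, and that the `hadoop` CLI is on the PATH. The helper `cleanup_commands` is hypothetical. Older Hadoop releases such as 0.20 use `-rmr`, while newer ones use `-rm -r`.

```python
def cleanup_commands(dirs, legacy=True):
    """Build (but do not run) recursive-delete commands for the given dirs.

    legacy=True uses `-rmr` (Hadoop 0.20-era syntax);
    legacy=False uses `-rm -r` (newer releases).
    """
    flag = ["-rmr"] if legacy else ["-rm", "-r"]
    return [["hadoop", "fs"] + flag + [d] for d in dirs]


# Assumed RHive directories, based on the list in the comment above.
RHIVE_DIRS = ["/rhive/tmp", "/rhive/lib", "/rhive/udf"]

# Dry run: print the commands so they can be reviewed before executing.
for cmd in cleanup_commands(RHIVE_DIRS):
    print(" ".join(cmd))
```

Since these deletes are recursive and affect every user's UDFs under /rhive/udf, it is worth reviewing the printed commands before running any of them by hand.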