nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive

Only simple queries work, but not complex queries or aggregate functions #88

Open pratikdhamanekar opened 9 years ago

pratikdhamanekar commented 9 years ago

When I run a simple query like the one below, it works: rhive.query("select * from tablename where name ='xxx'");

But when I try an aggregate query like the one below, it doesn't work: rhive.query("select count(*) from tablename where name ='xxx'");

I'm getting the following error: Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Does anyone have any input on this?

Thank you

ghost commented 9 years ago

Hi,

We had the same issue, and it turned out to be a problem with the cluster configuration and access rights for R. I wasn't the one who solved it, but if I remember correctly it had to do with HDFS permissions.

TinoSM commented 9 years ago

In case it helps people who hit this problem in the future: complex queries require launching MapReduce (Hadoop) jobs, while simple ones can be executed by only reading and filtering data "locally" (I guess), so you probably have a problem with permissions or login.

On my system, initializing RHive without a user/password allows viewing data (maybe we have to check this :) ), but does not allow enqueuing MR jobs (which are used for complex queries).
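
A minimal sketch of the suggestion above: connect with explicit credentials so the session is allowed to submit MapReduce jobs, then run the aggregate query from the original report. The helper names `build_count_query` and `run_count_query` are hypothetical, all argument values are placeholders, and the `user` and `password` arguments of `rhive.connect` are written from memory of the RHive documentation, so check them against your installed version.

```r
# Hypothetical helper: build the aggregate query from the original report.
build_count_query <- function(table, name) {
  sprintf("select count(*) from %s where name = '%s'", table, name)
}

# Hypothetical helper: connect with explicit credentials, run the query,
# and always close the connection afterwards.
# Assumes HIVE_HOME and HADOOP_HOME are set so that rhive.init() can find them.
run_count_query <- function(host, user, password, table, name) {
  library(RHive)
  rhive.init()
  # Assumption: rhive.connect accepts `user` and `password` arguments.
  rhive.connect(host = host, user = user, password = password)
  on.exit(rhive.close(), add = TRUE)
  rhive.query(build_count_query(table, name))
}
```

If the same query fails without credentials and succeeds with them, the problem is most likely the permission issue described above rather than the query itself.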

vidanimegh commented 8 years ago

@TinoSM Thank you so much! Your suggestion solved my problem. Cheers! :)

Lxmnkmr commented 8 years ago

I seem to be facing a similar problem: simple queries run, but queries involving MapReduce tasks give me a slightly different error: "Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask"

I checked the logs further and found the following exception:

org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File /rhive/udf/user_name/hsum.RData does not exist
at com.nexr.rhive.hive.udf.RUDAF$GenericRUDAF.loadRObjects(RUDAF.java:525)

However, I also checked the HDFS directory /rhive/udf/user_name/ and found that the hsum.RData file exists there (hsum is my UDF). I gave it all permissions, but I still get the same error.

Any input on this?

Lxmnkmr commented 8 years ago

As I understand it, the .RData file is created in and read from the /rhive/udf/user_name directory in the Hadoop file system, and the error I get says the file was not found in that directory. Just to try it, I created the /rhive/udf/user_name directory on the local file system of all my datanodes and copied the .RData file to them manually, and then the query worked without any error!

Any input on why it looks for the .RData file on the local file system of the datanodes instead of the Hadoop file system? Also, this happens only when I use UDAFs. With UDFs I don't get any error, and the .RData file is created in and read from the Hadoop file system only.
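
To confirm which file system a node can actually see the file on, something like the following sketch may help. `check_rdata_locations` is a hypothetical helper, not part of RHive. The local check only reflects the machine it runs on, so it would have to be run on each datanode. The HDFS check assumes `rhive.hdfs.exists` is available in your RHive version and that a connection has already been opened with `rhive.connect`.

```r
# Hypothetical helper: report where a UDF/UDAF .RData file is visible.
# `local` is checked on the machine running this code only.
# `hdfs` stays NA unless check_hdfs = TRUE.
check_rdata_locations <- function(path, check_hdfs = FALSE) {
  result <- list(path = path, local = file.exists(path), hdfs = NA)
  if (check_hdfs) {
    # Assumption: rhive.hdfs.exists() returns TRUE/FALSE for an HDFS path
    # and an RHive connection is already open in this session.
    result$hdfs <- RHive::rhive.hdfs.exists(path)
  }
  result
}
```

For example, `check_rdata_locations("/rhive/udf/user_name/hsum.RData", check_hdfs = TRUE)` on a datanode would show whether the file is present locally, on HDFS, or both.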