nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive

How to pass an object to RUDF in hive #73

Open rajasekhariitbbs opened 9 years ago

rajasekhariitbbs commented 9 years ago

Working Function

Minimum = function(column1,column2){ min(column1,column2) }

Not Working

a=-10000
Minimum = function(column1,column2){ min(column1,column2,a) }

How do I pass an object/dataframe/function into the RHive-UDF?

Thanks, Raja Sekhar

ssshow16 commented 9 years ago

If you only need to pass a value into the RHive-UDF, you can do it like this:

Minimum = function(column1,column2, a){ min(column1,column2,a) }

rhive.query("select R('Minimum',col1,col2,-10000,0.0) from table_name")

However, RHive only exports the UDF function itself, and the RHive-UDF is executed on each DataNode, so it cannot reference other functions, objects, or data frames from your R session. If you need another function, you have to define it as an inner function, like this:

Minimum = function(column1,column2, a){
  min(column1,column2,a)
  sub_func <- function(a,b){ .... }
  sub_func(column1, column2)
}
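To illustrate the point about the DataNode not seeing your session, here is a minimal base R sketch that needs no cluster. It does not use RHive at all: `ship_to_worker` is a hypothetical helper that mimics a worker by re-homing a function in an environment containing only `min`, so session variables such as `a` are out of reach. The names `minimum_global` and `minimum_arg` are also made up for this example.

```r
# Hypothetical helper (not part of RHive): mimic a worker that received only
# the function, by giving it an environment that knows `min` and nothing else.
ship_to_worker <- function(f) {
  worker <- new.env(parent = emptyenv())
  worker$min <- base::min
  environment(f) <- worker
  f
}

a <- -10000

# Relies on `a` from the calling session.
minimum_global <- function(column1, column2) min(column1, column2, a)

# Receives `a` as an explicit argument instead.
minimum_arg <- function(column1, column2, a) min(column1, column2, a)

# The first call fails because `a` cannot be found on the simulated worker;
# the error message is captured as a string instead of stopping the script.
res_global <- tryCatch(
  ship_to_worker(minimum_global)(3, 5),
  error = function(e) conditionMessage(e)
)

# The second call works because everything it needs arrives as an argument.
res_arg <- ship_to_worker(minimum_arg)(3, 5, -10000)

print(res_global)
print(res_arg)
```

The version that takes `a` as a parameter is the same pattern as the `R('Minimum',col1,col2,-10000,0.0)` query above: the value travels with the call rather than with the session.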

Thanks.

rajasekhariitbbs commented 9 years ago

This is the output of rhive.env():

rhive.env()
hadoop home: /home/training/hadoop-2.4.0/
hadoop conf: /home/training/hadoop-2.4.0/etc/hadoop
fs: hdfs://localhost:9000
hive home: /home/training/hive/
user name: training
user home: /home/training
temp dir: /tmp/training

Select queries are working fine.

######### User Define Functions ########
coefficient <- 1.1
scoring <- function(sal) { coefficient * sal }
rhive.assign('coefficient',coefficient)
rhive.assign('scoring',scoring)
rhive.export('scoring')
rhive.export('coefficient')

Is it correct that the rhive.export calls above save the files in the filesystem under /rhive/udf/training?

ssshow16 commented 9 years ago

You cannot assign and export a variable 'coefficient'.

The exported file is saved in HDFS under /rhive/udf/{user}.

rajasekhariitbbs commented 9 years ago

In my case the file is being saved in the local file system. I'm confused and don't know why this is happening. For now, if I manually copy the .RDA file from the local system to HDFS, the code works.

The example code at the URL below also exports the coefficient: https://github.com/nexr/RHive/wiki/RHive-example-code

ssshow16 commented 9 years ago

The RHive-UDF references the R object file (*.RData) named by the first parameter of R() in the query:

rhive.query("select R('scoring',col_sal,0.0) from emp")

In your case, two R objects are saved in HDFS because you call rhive.export() once for each R object.

So the scoring function in scoring.RData cannot reference the coefficient value in coefficient.RData. In this case, you have to call rhive.exportAll('scoring') instead. rhive.exportAll() saves all R objects into scoring.RData.

Please try again.
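As a rough local analogy for why one bundled file works where two separate files do not, the sketch below uses only base R `save()` and `load()`. It is not RHive's implementation, and the file names are temporary placeholders. It assumes it is run as a top-level script (for example with Rscript), so that `scoring` lives in the global environment and `rm()` can mimic a fresh R process on a worker.

```r
coefficient <- 1.1
scoring <- function(sal) coefficient * sal

fn_only <- tempfile(fileext = ".RData")      # stands in for scoring.RData from rhive.export
fn_and_data <- tempfile(fileext = ".RData")  # stands in for scoring.RData from rhive.exportAll

save(scoring, file = fn_only)                   # function alone
save(scoring, coefficient, file = fn_and_data)  # function plus the value it needs

# Clear the workspace to mimic a fresh R process that has none of our objects.
rm(scoring, coefficient)

# Loading the function-only file restores `scoring` but not `coefficient`,
# so the call fails; the error message is captured as a string.
load(fn_only)
res_fn_only <- tryCatch(scoring(100), error = function(e) conditionMessage(e))

# Loading the bundled file restores both objects, so the call succeeds.
rm(scoring)
load(fn_and_data)
res_bundled <- scoring(100)

print(res_fn_only)
print(res_bundled)
```

With the bundled file, everything `scoring` depends on arrives together, which is the behavior described above for rhive.exportAll('scoring').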