nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive

rhive failing on udf #76

Open ghost opened 10 years ago

ghost commented 10 years ago

I am running the following code against a table with one column (elements in column are 4 alpha characters long):

myFilter <- function(x){ return(x) }

rhive.execute(sprintf("CREATE TEMPORARY FUNCTION %s AS \"%s\"", "R", "com.nexr.rhive.hive.udf.RUDF"))

rhive.query("select R('myfilter',term,0.0) from rhiveterm_bsp")

here is the exception I am seeing in map output:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"term":"fsrfx"} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"term":"fsrfx"} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157) ... 8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.exec.UDFArgumentException: org.rosuda.REngine.Rserve.RserveException: Cannot connect: Connection refused at org.rosuda.REngine.Rserve.RConnection.<init>(RConnection.java:88) at org.rosuda.REngine.Rserve.RConnection.<init>(RConnection.java:60) at org.rosuda.REngine.Rserve.RConnection.<init>(RConnection.java:51) at com.nexr.rhive.hive.udf.RUDF.getConnection(RUDF.java:261) at com.nexr.rhive.hive.udf.RUDF.loadRObjects(RUDF.java:243) at com.nexr.rhive.hive.udf.RUDF.evaluate(RUDF.java:73) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) at com.nexr.rhive.hive.udf.RUDF.loadRObjects(RUDF.java:246) at com.nexr.rhive.hive.udf.RUDF.evaluate(RUDF.java:73) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519) ... 9 more Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: org.rosuda.REngine.Rserve.RserveException: Cannot connect: Connection refused at org.rosuda.REngine.Rserve.RConnection.<init>(RConnection.java:88) at org.rosuda.REngine.Rserve.RConnection.<init>(RConnection.java:60) at org.rosuda.REngine.Rserve.RConnection.<init>(RConnection.java:51) at com.nexr.rhive.hive.udf.RUDF.getConnection(RUDF.java:261) at com.nexr.rhive.hive.udf.RUDF.loadRObjects(RUDF.java:243) at com.nexr.rhive.hive.udf.RUDF.evaluate(RUDF.java:73) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) at com.nexr.rhive.hive.udf.RUDF.getConnection(RUDF.java:265) at com.nexr.rhive.hive.udf.RUDF.loadRObjects(RUDF.java:243) ... 20 more

ssshow16 commented 10 years ago

Please check whether Rserve is running on each DataNode.
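As a rough way to run that check from an R session, the sketch below tries to open a TCP connection to the Rserve port using only base R. It assumes Rserve is listening on its default port, 6311, and the helper name `rserve_reachable` is made up for this example. It only shows that something accepts connections on that port, not that the listener is actually Rserve. Since Rserve may be configured to accept only local connections, it is safest to run it on each DataNode against `localhost`.

```r
# Sketch: TRUE if something accepts TCP connections on the given host/port.
# 6311 is Rserve's default port; adjust if your nodes use another one.
rserve_reachable <- function(host = "localhost", port = 6311L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# Usage, run on a DataNode:
# rserve_reachable("localhost")
```

A `FALSE` result here would be consistent with the "Cannot connect: Connection refused" message in the trace above.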

ghost commented 10 years ago

Thanks. I ran 'R CMD RServe' on each of the data nodes, and now I get a slightly different error:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"term":"fsrfx"} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"term":"fsrfx"} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157) ... 
8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.rosuda.REngine.Rserve.RserveException: eval failed, request status: error code: 127 at com.nexr.rhive.hive.udf.RUDF.loadRObjects(RUDF.java:246) at com.nexr.rhive.hive.udf.RUDF.evaluate(RUDF.java:73) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519) ... 9 more Caused by: org.rosuda.REngine.Rserve.RserveException: eval failed, request status: error code: 127 at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:234) at com.nexr.rhive.hive.udf.RUDF.loadRObjects(RUDF.java:243) ... 20 more Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143

ssshow16 commented 10 years ago

Did you assign and export your function, as in the following?

myFilter <- function(x){ return(x) }

rhive.assign("myFilter",myFilter)
rhive.export("myFilter")
rhive.query("select R('myFilter',term,0.0) from rhiveterm_bsp")

Also, you don't need to run the following, because RHive creates the function when you call rhive.connect():

rhive.execute(sprintf("CREATE TEMPORARY FUNCTION %s AS \"%s\"", "R", "com.nexr.rhive.hive.udf.RUDF"))

Thanks.

ghost commented 10 years ago

Thanks - I gave it a try:

myFilter <- function(x){
  return(x)
}

rhive.assign("myFilter",myFilter)
[1] TRUE
rhive.export("myFilter")
No encryption was performed by peer.
No encryption was performed by peer.
[1] TRUE
rhive.query("select R('myFilter',term,0.0) from rhiveterm_bps")
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

The mapper logs still show error code 127:

2014-11-07 09:12:35,447 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"term":"fsrfx"} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.rosuda.REngine.Rserve.RserveException: eval failed, request status: error code: 127 at com.nexr.rhive.hive.udf.RUDF.loadRObjects(RUDF.java:247) at com.nexr.rhive.hive.udf.RUDF.evaluate(RUDF.java:73) at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519) ... 
9 more Caused by: org.rosuda.REngine.Rserve.RserveException: eval failed, request status: error code: 127 at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:234) at com.nexr.rhive.hive.udf.RUDF.loadRObjects(RUDF.java:243) ... 20 more

ssshow16 commented 10 years ago

Please check if there is /rhive/udf/{user}/myFilter.RData in HDFS.

ghost commented 10 years ago

Yes, I confirmed that early on, when I was getting a different exception.

ssshow16 commented 10 years ago

RHive tries to download myFilter.RData from HDFS into a tmp dir (e.g. /tmp/{user}) on each DataNode.

Please check whether myFilter.RData exists in the tmp dir and whether the tmp dir is writable.
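A minimal base-R sketch of those two checks, to be run on a DataNode, might look like the following. The helper name `check_udf_tmp` is made up for this example, and the directory you pass in is an assumption based on the /tmp/{user} pattern mentioned above. Write permission depends on the account, so the result is only meaningful when run as the user the Hadoop tasks run as.

```r
# Sketch: report whether the local tmp dir exists, is writable, and
# already contains the exported .RData file.
check_udf_tmp <- function(tmp_dir, file_name = "myFilter.RData") {
  c(
    dir_exists   = dir.exists(tmp_dir),
    # file.access() returns 0 when access is permitted; mode 2 = write
    dir_writable = unname(file.access(tmp_dir, mode = 2) == 0),
    file_exists  = file.exists(file.path(tmp_dir, file_name))
  )
}

# Usage (the path is an assumption; substitute the actual {user}):
# check_udf_tmp(file.path("/tmp", Sys.getenv("USER")))
```

If `dir_writable` comes back `FALSE`, or `file_exists` stays `FALSE` after a query attempt, the download step described above would be the first thing to look at.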