nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive

How to install RHive UDF in Cloudera Stack #95

Closed Worvast closed 8 years ago

Worvast commented 8 years ago

Hi, we have a problem with RImpala and the UDF JAR. I will try to explain it:

The problem was already reported in another issue, where a solution was given: https://github.com/nexr/RHive/issues/90

We receive 'Insufficient privileges to execute ADD' when connecting, and the posted solution is to add the following to hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>/path/to/rhive_udf.jar,/other/aux/jars.jar</value>
</property>

However, we have a managed Cloudera cluster, so we get support for some things but not for others: the support team will install the UDF, but they will not touch 'hive-site.xml'. They use this procedure to install UDF JARs:

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cm_mc_hive_udf.html

To add the UDF following this procedure, they asked us one question: which name to put in the "CREATE FUNCTION" statement. To create the function with the JAR, they need to know the following:

CREATE FUNCTION java_function_name AS 'bla.bla.bla.name_of_function' USING JAR 'hdfs:///user/hive/lib/rhive_udf.jar';

And I have no idea what I need to create here. One function name? Which name? One function for each function in the JAR?

I hope you can help me. Sorry for my poor English ^^U

ghost commented 8 years ago

Hi, @Worvast. First of all, make sure you use the ranger branch. The ranger branch does not use ADD JAR, so you should not hit the privilege problem.

Thanks.

Worvast commented 8 years ago

Hi @DrakeMin, this change fixed the error, thanks, but now we get another one :P

Required privileges for this query: Server=server1->URI=file:///opt/local/hive/lib/rhive_udf.jar->action=*;

Would it be enough to give users privileges on that JAR? I would request it right away, but it is an annoying bureaucratic process, so I prefer to ask first.

Worvast commented 8 years ago

Mmmm, OK @DrakeMin, they finally added privileges for the JAR, but now we get another error when connecting.

Error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Error retrieving udf class

I will look for more information, but I wanted to post this here too.

ghost commented 8 years ago

@Worvast IMO, more specific messages (usually Caused by ...) are somewhere in the hive-server2.log file. Please share them if possible.

Worvast commented 8 years ago

Hi @DrakeMin, thank you very much for the help. I think this is the important part of the log:

2015-11-05 11:32:49,683 INFO org.apache.sentry.binding.hive.conf.HiveAuthzConf: DefaultFS: hdfs://nameservice1
2015-11-05 11:32:49,696 INFO org.apache.sentry.binding.hive.conf.HiveAuthzConf: DefaultFS: hdfs://nameservice1
2015-11-05 11:32:49,696 WARN org.apache.sentry.binding.hive.conf.HiveAuthzConf: Using the deprecated config setting hive.sentry.server instead of sentry.hive.server
2015-11-05 11:32:49,696 WARN org.apache.sentry.binding.hive.conf.HiveAuthzConf: Using the deprecated config setting hive.sentry.provider instead of sentry.provider
2015-11-05 11:32:49,704 ERROR org.apache.hadoop.hive.ql.Driver: FAILED: SemanticException Error retrieving udf class
org.apache.hadoop.hive.ql.parse.SemanticException: Error retrieving udf class
    at org.apache.sentry.binding.hive.HiveAuthzBindingHook.preAnalyze(HiveAuthzBindingHook.java:232)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1111)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1105)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:171)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:398)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:385)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:258)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:490)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.nexr.rhive.hive.udf.RUDF
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.sentry.binding.hive.HiveAuthzBindingHook.preAnalyze(HiveAuthzBindingHook.java:221)
    ... 20 more

2015-11-05 11:32:49,704 INFO org.apache.hadoop.hive.ql.log.PerfLogger: </PERFLOG method=compile start=1446719569669 end=1446719569704 duration=35 from=org.apache.hadoop.hive.ql.Driver>
2015-11-05 11:32:49,704 INFO org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-11-05 11:32:49,704 INFO org.apache.hadoop.hive.ql.log.PerfLogger: </PERFLOG method=releaseLocks start=1446719569704 end=1446719569704 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2015-11-05 11:32:49,704 INFO org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-11-05 11:32:49,704 INFO org.apache.hadoop.hive.ql.log.PerfLogger: </PERFLOG method=releaseLocks start=1446719569704 end=1446719569704 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2015-11-05 11:32:49,704 WARN org.apache.hive.service.cli.thrift.ThriftCLIService: Error executing statement: 
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Error retrieving udf class
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:102)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:171)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:398)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:385)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:258)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:490)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Error retrieving udf class
    at org.apache.sentry.binding.hive.HiveAuthzBindingHook.preAnalyze(HiveAuthzBindingHook.java:232)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1111)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1105)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100)
... 15 more
Caused by: java.lang.ClassNotFoundException: com.nexr.rhive.hive.udf.RUDF
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.sentry.binding.hive.HiveAuthzBindingHook.preAnalyze(HiveAuthzBindingHook.java:221)
    ... 20 more
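
The ClassNotFoundException in the trace above is raised while HiveServer2 tries to load com.nexr.rhive.hive.udf.RUDF. One quick thing to rule out is that the JAR on disk simply does not contain that class. The following is a minimal sketch in R, not part of RHive: the helper names class_to_entry and jar_has_class are hypothetical, and the local path is an assumption taken from the URI in the Sentry privilege message earlier in the thread. It relies only on the fact that a JAR is a zip archive, so utils::unzip can list its entries.

```r
# Hypothetical helper: map a Java class name to its entry path inside a JAR.
class_to_entry <- function(class_name) {
  paste0(gsub(".", "/", class_name, fixed = TRUE), ".class")
}

# Hypothetical helper: TRUE if the JAR (a zip archive) lists the class file.
jar_has_class <- function(jar_path, class_name) {
  entries <- utils::unzip(jar_path, list = TRUE)$Name
  class_to_entry(class_name) %in% entries
}

# Assumption: local JAR path taken from the Sentry URI message in this thread.
jar_path <- "/opt/local/hive/lib/rhive_udf.jar"
if (file.exists(jar_path)) {
  print(jar_has_class(jar_path, "com.nexr.rhive.hive.udf.RUDF"))
} else {
  message("JAR not found at ", jar_path)
}
```

If the class is present in the JAR, the remaining suspect is that the JAR is not visible to the HiveServer2 process that runs the Sentry hook, which this check cannot confirm on its own.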

ghost commented 8 years ago

Hi, @Worvast. I think you should rewrite the lines below in RHive/R/rhive.R:

.registerUDFs <- function(hiveClient) {
  hiveClient$execute(sprintf("CREATE TEMPORARY FUNCTION %s AS \"%s\"", "R", "com.nexr.rhive.hive.udf.RUDF"))
  hiveClient$execute(sprintf("CREATE TEMPORARY FUNCTION %s AS \"%s\"", "RA", "com.nexr.rhive.hive.udf.RUDAF"))
  hiveClient$execute(sprintf("CREATE TEMPORARY FUNCTION %s AS \"%s\"", "unfold", "com.nexr.rhive.hive.udf.GenericUDTFUnFold"))
  hiveClient$execute(sprintf("CREATE TEMPORARY FUNCTION %s AS \"%s\"", "expand", "com.nexr.rhive.hive.udf.GenericUDTFExpand"))
  hiveClient$execute(sprintf("CREATE TEMPORARY FUNCTION %s AS \"%s\"", "rkey", "com.nexr.rhive.hive.udf.RangeKeyUDF"))
  hiveClient$execute(sprintf("CREATE TEMPORARY FUNCTION %s AS \"%s\"", "scale", "com.nexr.rhive.hive.udf.ScaleUDF"))
  hiveClient$execute(sprintf("CREATE TEMPORARY FUNCTION %s AS \"%s\"", "array2String", "com.nexr.rhive.hive.udf.GenericUDFArrayToString"))
}
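
For the question at the top of the thread (which names to give in "CREATE FUNCTION ... USING JAR"), the function names and classes in .registerUDFs above suggest one statement per class. The following is only a sketch under assumptions: the helper permanent_udf_statements is hypothetical, the JAR URI is the one quoted in the original question, and whether RHive works with permanent functions in place of the temporary ones it registers itself has not been verified here.

```r
# Function names and classes copied from .registerUDFs above.
rhive_udfs <- c(
  R            = "com.nexr.rhive.hive.udf.RUDF",
  RA           = "com.nexr.rhive.hive.udf.RUDAF",
  unfold       = "com.nexr.rhive.hive.udf.GenericUDTFUnFold",
  expand       = "com.nexr.rhive.hive.udf.GenericUDTFExpand",
  rkey         = "com.nexr.rhive.hive.udf.RangeKeyUDF",
  scale        = "com.nexr.rhive.hive.udf.ScaleUDF",
  array2String = "com.nexr.rhive.hive.udf.GenericUDFArrayToString"
)

# Assumption: the HDFS location quoted in the original question.
jar_uri <- "hdfs:///user/hive/lib/rhive_udf.jar"

# Hypothetical helper: build one permanent CREATE FUNCTION statement per class.
permanent_udf_statements <- function(udfs, jar) {
  sprintf("CREATE FUNCTION %s AS '%s' USING JAR '%s'", names(udfs), udfs, jar)
}

statements <- permanent_udf_statements(rhive_udfs, jar_uri)

# Print the statements so they can be handed to whoever administers Hive.
writeLines(paste0(statements, ";"))
```

The output is seven statements, one for each of R, RA, unfold, expand, rkey, scale and array2String, which could be passed to the cluster administrators as the answer to their question.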

Thanks.

Worvast commented 8 years ago

Done. We did not have permission to create temporary functions in Hive. Once it was granted, the problem was solved.