nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive

Error while processing statement: Failed to read external resource hdfs://xxx:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar using rhive.connect #96

Closed: phoenixhadoop closed this issue 8 years ago

phoenixhadoop commented 8 years ago

My steps:

1) Load the package:

```
> library(RHive)
Loading required package: rJava
Loading required package: Rserve
```

2) Connect:

```
> rhive.connect("192.1.1.01", 10000, hiveServer2=TRUE, user='tom', password='123')
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Failed to read external resource hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar
```

Actually, hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar already exists.

My environment is as follows:

- Hadoop version: 2.5.0-cdh5.3.2
- Hive version: 2.1.0-SNAPSHOT
- RHive version: 2.0-0.10

- 192.1.1.01: remote HiveServer2 host
- 192.0.0.20: local host

The two servers can communicate with each other (verified via ping).

In Hadoop's core-site.xml, the error did not disappear no matter whether I set "fs.default.name" or "fs.defaultFS".
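For reference, a minimal sketch of the property in question (the value is taken from the namenode address in the jar path above; "fs.default.name" is just the deprecated alias of "fs.defaultFS"):

```xml
<!-- core-site.xml on 192.0.0.20 (sketch): point the default filesystem
     at the local namenode -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.0.0.20:9000</value>
</property>
```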

Then I debugged RHive/inst/javasrc/src/com/nexr/rhive/hive/HiveJdbcClient.java through the jar:

```java
public class HiveJdbcClient implements HiveOperations {
    // ...

    public static void main(String[] args) {
        HiveJdbcClient jc = new HiveJdbcClient(true); // true = HiveServer2
        // paraphrased from my R call; connect as user 'tom'
        jc.connect("192.1.1.01", 10000, "tom", "123");
        jc.addJar("hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar");
    }

    protected boolean execute(String query, boolean reconnect) throws SQLException {
        Connection connection = getConnection(reconnect);
        Statement statement = null;
        try {
            statement = connection.createStatement();
            System.out.println("======> 1: query = " + query);
            return statement.execute(query);
        } catch (SQLException e) {
            System.out.println("======> 2: enter catch exception...");
            if (!reconnect) {
                System.out.println("======> 3: enter if (!reconnect)...");
                if (isThriftTransportException(e)) {
                    System.out.println("======> 4: enter if (isThriftTransportException(e))...");
                    return reexecute(query);
                }
            }
            throw e;
        }
    }

    // ...
}
```

According to the output:

```
======> 1: query = hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar
======> 2: enter catch exception...
======> 3: enter if (!reconnect)...
Exception in thread "main" java.sql.SQLException: Error while processing statement: Failed to read external resource hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar
```

I don't know why statement.execute(query) throws an exception. Meanwhile, it works fine when I execute `add jar hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar` in the Hive CLI on 192.1.1.01 (the remote HiveServer2 host).

Any help? Thanks a lot ~

Worvast commented 8 years ago

Are you using the 'ranger' branch for this test?

phoenixhadoop commented 8 years ago

I used the source from this branch for the test: https://github.com/nexr/RHive

ghost commented 8 years ago

@phoenixhadoop,

> Meanwhile, it works fine when I execute `add jar hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar` in the Hive CLI on 192.1.1.01 (the remote HiveServer2 host).

Does this mean you used the hive command or beeline against the remote HiveServer2? To emulate the environment, you should run beeline on the local host, connect to the remote HiveServer2, and then do `add jar hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar`.

phoenixhadoop commented 8 years ago

@DrakeMin

```
beeline> !connect jdbc:hive2://192.1.1.01:10000 tom 123 org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://192.1.1.01:10000
Connected to: Apache Hive (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 2.1.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.1.1.01:10000> add jar hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar;
Error: Error while processing statement: Failed to read external resource hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar (state=,code=1)
```

It throws a similar exception to the original one...

ghost commented 8 years ago

@phoenixhadoop How about hive-server2.log on the remote HiveServer2 host, or the Hadoop NameNode log? Maybe there are some detailed exceptions or info there.

phoenixhadoop commented 8 years ago

@DrakeMin I got the hive-server2.log and pasted part of it below:

```
2016-01-26T10:59:38,391 WARN  [4916fa36-cb18-41ec-925b-f104b1119048HiveServer2-Handler-Pool: Thread-54751]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1506)) - No groups available for user tom
2016-01-26T10:59:38,391 WARN  [4916fa36-cb18-41ec-925b-f104b1119048HiveServer2-Handler-Pool: Thread-54751]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1506)) - No groups available for user tom
2016-01-26T10:59:38,398 ERROR [4916fa36-cb18-41ec-925b-f104b1119048HiveServer2-Handler-Pool: Thread-54751]: SessionState (SessionState.java:printError(1010)) - Failed to read external resource hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar
java.lang.RuntimeException: Failed to read external resource hdfs://192.0.0.20:9000/rhive/rhive/lib/2.0-0.10/rhive_udf.jar
	at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1333)
	at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1289)
	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1213)
	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1199)
	at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:74)
	at org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:115)
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:309)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:455)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:431)
	at sun.reflect.GeneratedMethodAccessor93.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at com.sun.proxy.$Proxy34.executeStatementAsync(Unknown Source)
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:259)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hive is not allowed to impersonate tom
	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy27.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:744)
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy28.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1912)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1089)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1970)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1939)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1915)
	at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1319)
	... 30 more
```

Because the editor could not display the whole configuration, I uploaded it as an attachment:
hadoop_configuration.txt

From this, it seems the issue is related to the hive user's privileges: the root cause line is "User: hive is not allowed to impersonate tom".

phoenixhadoop commented 8 years ago

This issue has been fixed. The root cause was an incorrect configuration in core-site.xml on the local host (192.0.0.20). I added the properties below to core-site.xml on the local host, matching the ones on the remote HiveServer2 host (192.1.1.01); see the attachment and the sketch after it.

hadoop.proxyuser.hive.hosts.txt
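Judging by the attachment name and the impersonation error above, these are presumably Hadoop's proxy-user properties; a minimal sketch (the wildcard values are an assumption and can be narrowed to specific hosts and groups):

```xml
<!-- core-site.xml (sketch): allow the hive service user to impersonate
     end users such as tom, matching the remote HiveServer2 host's settings -->
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
```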

Many thanks to @DrakeMin.

ghost commented 8 years ago

@phoenixhadoop Sounds good! Thanks for using RHive!