nexr / RHive

RHive is an R extension facilitating distributed computing via Apache Hive.
http://nexr.github.io/RHive

RHive kerberos issue on CDH #94

Open ghost opened 8 years ago

ghost commented 8 years ago

I am using Hive 1.1.0-cdh5.4.3 with R 3.2.2. My typical JDBC string looks something like the one below when connecting via SQuirreL SQL, for instance:

jdbc:hive2://hive.server.com:10000/default;AuthMech=1;principal=hive/_HOST@SOME.DOMAIN.COM

I see the following error after connecting. Note that I had a valid credential before starting the R REPL:

rhive.connect(host = "hive.server.com", port = "10001", db = "default", user = "bayroot", password = "XXXXX", defaultFS="hdfs://nameservice1/rhive", properties="hive.principal=hive/_HOST@SOME.DOMAIN.COM")

Warning:
+----------------------------------------------------------+
+ / hiveServer2 argument has not been provided correctly. +
+ / RHive will use a default value: hiveServer2=TRUE. +
+----------------------------------------------------------+
15/10/21 11:17:03 INFO jdbc.Utils: Supplied authorities: hive.server.com:10001
15/10/21 11:17:03 INFO jdbc.Utils: Resolved authority: hive.server.com:10001
15/10/21 11:17:03 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://hive.server.com:10001/default;principal=hive/_HOST@SOME.DOMAIN.COM
15/10/21 11:20:05 ERROR transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Fail to create credential. (63) - No service creds)]

Please note that this code was working fine on CDH 5.3.2, which I believe shipped Hive 0.13:

Package: RHive
Type: Package
Title: R and Hive
Version: 2.0-1.0

Worvast commented 8 years ago

Hi @bayroot22, I can think of two options:

1.- Check whether you have a Kerberos ticket with 'klist' (in the shell, not in R); if not, create one with the 'kinit' command.

2.- Check whether HiveServer2 is configured correctly, with the RHive UDF file included. This is the only information I have on how to do that:

In hive-site.xml add this property and restart HiveServer2:

<property>
    <name>hive.aux.jars.path</name>
    <value> ----/path/to/rhive_udf.jar----, ----/other/aux/jars.jar----  </value>
</property>

If you use Cloudera CDH, they have a tutorial on including UDF files for use by HiveServer:

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cm_mc_hive_udf.html
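
For anyone who wants to do the ticket check from option 1 without leaving the R session, here is a minimal sketch. It assumes an MIT Kerberos style `klist` on the PATH, where `klist -s` prints nothing and exits with status 0 only when the credential cache holds a valid ticket. The function name `has_kerberos_ticket` is made up for this example and is not part of RHive.

```r
# Hypothetical helper (not part of RHive): returns TRUE when `klist -s`
# exits with status 0, i.e. the credential cache holds a valid ticket.
# Assumes an MIT Kerberos style klist; other implementations may differ.
has_kerberos_ticket <- function(klist = unname(Sys.which("klist"))) {
  # Sys.which() returns "" when the program is not found on the PATH.
  if (!nzchar(klist)) {
    return(FALSE)
  }
  # Discard klist's output; only the exit status matters here.
  status <- suppressWarnings(
    system2(klist, "-s", stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

if (!has_kerberos_ticket()) {
  message("No valid Kerberos ticket found; run 'kinit' in a shell first.")
}
```

A TRUE result only says that a ticket exists for the current user. It does not show that the JVM started by RHive can see the same credential cache.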

Good luck

ghost commented 8 years ago

Thanks for the response...

1) Yes, a ticket was created.
2) I added the rhive_udf.jar to my aux path and restarted Hive, but I get the same issue.

ghost commented 8 years ago

I added the property hive.keytab=/home/bayroot/hive.keytab to the connection string, and now I get the following:

Exception in thread "Thread-6" java.lang.RuntimeException: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://hive.server.com:10001/default: Peer indicated failure: Unsupported mechanism type PLAIN

A couple of thoughts/questions:

1 - I manage a multi-tenant environment, so exposing the hive keytab is a serious security concern. Even if this worked, it wouldn't be a solution I could implement.

2 - In the loginUserFromKeytab method, how does the authentication type get set? I don't see a call to the setAuthenticationMethod method; possibly it gets set inside loginUserFromKeytab? As indicated in the message above, it looks like RHive is sending an auth type of PLAIN and not KERBEROS.

Worvast commented 8 years ago

As I understand it, if you have already created the ticket, you should not pass the user and password arguments. The 'plaintext password' message may come from trying to log in with those parameters. If a Kerberos ticket is available in the environment R runs in, R should try to use it automatically when setting up the connection, so you should only need:

rhive.connect(host = "hive.server.com", port = "10001", db = "default",
defaultFS="hdfs://nameservice1/rhive", 
properties="hive.principal=hive/_HOST@SOME.DOMAIN.COM")

That said, I could not connect with this format either. I used the following format for the connection (example):

rhive.connect(host="hive.server.com:10000/DEFAULTDB;principal=hive/KERBEROSPRINCIPAL;AuthMech=1;KrbHostFQDN=KERBEROSHOSTURL;KrbServiceName=hive;KrbRealm=KERBEROSREALM",
defaultFS="hdfs://nameservice1/rhive", 
hiveServer2=TRUE,
updateJar=FALSE)
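
To avoid typos when filling in the placeholders of the example format above, the host string can be assembled with a small helper. This is only a sketch: `build_kerberos_host` and its argument names are made up for illustration, the parameter keys are simply the ones from the example, and nothing in this thread confirms that this format works on every CDH release.

```r
# Hypothetical helper (not part of RHive): builds the
# "host:port/db;key=value;..." string used in the example above.
build_kerberos_host <- function(host, port, db, principal,
                                krb_host_fqdn, krb_realm,
                                krb_service = "hive") {
  # Keys and their order are copied from the example; AuthMech=1 is the
  # value used there and in the JDBC string at the top of the thread.
  params <- c(
    principal      = principal,
    AuthMech       = "1",
    KrbHostFQDN    = krb_host_fqdn,
    KrbServiceName = krb_service,
    KrbRealm       = krb_realm
  )
  paste0(
    host, ":", port, "/", db, ";",
    paste(names(params), params, sep = "=", collapse = ";")
  )
}

host_string <- build_kerberos_host(
  host          = "hive.server.com",
  port          = 10000,
  db            = "default",
  principal     = "hive/_HOST@SOME.DOMAIN.COM",
  krb_host_fqdn = "hive.server.com",
  krb_realm     = "SOME.DOMAIN.COM"
)

# The result would then be passed as in the example (requires RHive):
# rhive.connect(host = host_string,
#               defaultFS = "hdfs://nameservice1/rhive",
#               hiveServer2 = TRUE,
#               updateJar = FALSE)
```
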

arundoss commented 8 years ago

Hi Team,

Could you please confirm whether RHive supports Kerberos? If it does, please tell me the correct format for rhive.connect.

Prussia commented 5 years ago

any ideas?