Closed paf91 closed 8 months ago
We had a similar problem trying to connect to Hive over HTTPS with Kerberos authentication. After many attempts we fell back on using SSL with a custom patch (described in #10108).
As a suggestion, please consider using impyla instead of PyHive; it should cover both SSL and HTTP/HTTPS, with or without Kerberos authentication.
Needless to say, we are very interested in a definitive and stable solution to these problems. Keep up the good work!
@gmedici we added impyla (https://github.com/cloudera/impyla) as a possible scheme. Could you try it out? Thanks
hi,
How can I test the impyla driver instead of pyhive? Is it the 'impala' scheme?
When I try 'hive+http', it looks like it still uses pyhive by default:
  File "/home/at/user/.conda/envs/omcli/lib/python3.10/site-packages/pyhive/hive.py", line 104, in connect
    return Connection(*args, **kwargs)
  File "/home/at/user/.conda/envs/omcli/lib/python3.10/site-packages/pyhive/hive.py", line 249, in __init__
    response = self._client.OpenSession(open_session_req)
  File "/home/at/user/.conda/envs/omcli/lib/python3.10/site-packages/TCLIService/TCLIService.py", line 187, in OpenSession
    return self.recv_OpenSession()
  File "/home/at/user/.conda/envs/omcli/lib/python3.10/site-packages/TCLIService/TCLIService.py", line 199, in recv_OpenSession
    (fname, mtype, rseqid) = iprot.readMessageBegin()
  File "/home/at/user/.conda/envs/omcli/lib/python3.10/site-packages/thrift/protocol/TBinaryProtocol.py", line 148, in readMessageBegin
    name = self.trans.readAll(sz)
  File "/home/at/user/.conda/envs/omcli/lib/python3.10/site-packages/thrift/transport/TTransport.py", line 68, in readAll
    raise EOFError()
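One way to test impyla on its own, outside the ingestion framework, is to open a DB-API connection directly with HTTP transport enabled. The following is only a sketch under assumptions: the helper names `impyla_http_kwargs` and `open_connection` are hypothetical, the host, port, and HTTP path are taken from the reproduction steps in this issue, and the `kerberos_service_name` value of `hive` and the choice of `auth_mechanism` are guesses that depend on how the Knox endpoint is configured. It does not show which scheme the ingestion framework maps to impyla.

```python
def impyla_http_kwargs(host, port, http_path, auth_mechanism="GSSAPI", ca_cert=None):
    """Build keyword arguments for impala.dbapi.connect() over HTTPS transport.

    Hypothetical helper: it only assembles a dict, so it can be inspected
    without a running HiveServer2 or an installed impyla.
    """
    kwargs = {
        "host": host,
        "port": port,
        "use_ssl": True,              # HTTPS endpoint
        "use_http_transport": True,   # server runs with transport mode "http"
        "http_path": http_path,
        "auth_mechanism": auth_mechanism,
    }
    if auth_mechanism == "GSSAPI":
        # Assumption: the Hive service principal is named "hive".
        kwargs["kerberos_service_name"] = "hive"
    if ca_cert is not None:
        kwargs["ca_cert"] = ca_cert   # path to a CA bundle, if one is needed
    return kwargs


def open_connection(**kwargs):
    """Open the connection; requires impyla to be installed."""
    from impala.dbapi import connect  # imported lazily on purpose
    return connect(**kwargs)


if __name__ == "__main__":
    # Values copied from the reproduction steps in this issue.
    print(impyla_http_kwargs("server", 8443, "gateway/cdp-proxy-api/hive"))
```

If `open_connection(**impyla_http_kwargs(...))` succeeds where the `hive+http` scheme fails with `EOFError`, that would suggest the problem is the binary Thrift client rather than the server or the proxy.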
Affected module
Ingestion Framework
Describe the bug
When hive.server2.transport.mode is set to http instead of binary, the connection to Hive fails with a Thrift error because the client expects the binary transport. Setting the transport mode to http is necessary to work with the Apache Knox proxy. More info: https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/securing-hive/topics/hive_secure_knox.html
To Reproduce
Screenshots or steps to reproduce
Connect to the Airflow scheduler pod using
kubectl exec -it <pod> -- bash
then run python and execute:
from sqlalchemy import *
from sqlalchemy.engine import create_engine
engine = create_engine('hive+https://server:8443/;ssl=true;transportMode=http;httpPath=gateway/cdp-proxy-api/hive', connect_args={'ssl_cert': 'none', 'check_hostname': False})
or
engine = create_engine('hive+https://server:8443/;ssl=true;transportMode=http;httpPath=gateway/cdp-proxy-api/hive', connect_args={'ssl_cert': '<cert>', 'check_hostname': False})
engine.connect()
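The URL above carries its HiveServer2 session parameters JDBC-style, as semicolon-separated key=value pairs after the path. When debugging, it can help to split them out and confirm which transport mode and HTTP path the URL actually requests. The sketch below uses only the standard library; `split_hive_url` is a hypothetical helper for inspection, and it is not how SQLAlchemy or PyHive parse the URL internally.

```python
from urllib.parse import urlsplit


def split_hive_url(url):
    """Split a Hive URL with JDBC-style ';key=value' parameters into its parts."""
    parts = urlsplit(url)
    # Everything after the first ';' in the path is treated as session parameters.
    path, _, param_str = parts.path.partition(";")
    params = {}
    for item in param_str.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            params[key] = value
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": path.strip("/") or None,
        "params": params,
    }


if __name__ == "__main__":
    info = split_hive_url(
        "hive+https://server:8443/;ssl=true;transportMode=http;"
        "httpPath=gateway/cdp-proxy-api/hive"
    )
    print(info["params"])
```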
Expected behavior
engine.connect() is expected to run without error.
Version: