prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

What's the correct configuration for the Delta Lake connector when the files are stored in Kerberos-secured HDFS? #21979

Closed orthoxerox closed 4 months ago

orthoxerox commented 9 months ago

Describe the problem you faced

When querying a Delta Lake table stored on Kerberos-secured HDFS, Presto fails while reading the Delta transaction log with a GSS "No valid credentials provided" error (full stacktrace below), even though a Kerberos keytab and principal are configured.

Environment Description

Steps To Reproduce

Steps to reproduce the behavior:

  1. Have an on-prem Hadoop cluster with Kerberos authentication
  2. Create Hive and Delta Lake connectors (see the configuration sketch after this list)
  3. Create a Delta Lake table on the cluster
  4. Try querying the created table
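
The catalog files referenced in step 2 were not captured in this thread. A minimal sketch of what a Kerberized pair might look like, reusing the Presto Hive connector's metastore and HDFS security properties (the metastore host is a placeholder; the principal and keytab follow the keytab listing below, and whether the Delta connector actually honors the hive.hdfs.* properties is exactly what this issue asks):

etc/catalog/hive.properties:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.intranet:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/_HOST@INTRANET
hive.metastore.client.principal=hive/presto-coordinator.intranet@INTRANET
hive.metastore.client.keytab=/etc/security/keytabs/hive.service.keytab
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.presto.principal=hive/presto-coordinator.intranet@INTRANET
hive.hdfs.presto.keytab=/etc/security/keytabs/hive.service.keytab

etc/catalog/delta.properties:

connector.name=delta
hive.metastore.uri=thrift://metastore.intranet:9083
# same Kerberos properties as in hive.properties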

Expected behavior

Presto returns the contents of the table

Additional context

The keytab we use looks like this:

ktutil:  read_kt /etc/security/keytabs/hive.service.keytab
ktutil:  l -e
slot KVNO Principal
---- ---- ---------------------------------------------------------------------
   1    1 hive/presto-coordinator.intranet@INTRANET (aes128-cts-hmac-sha1-96)
   2    1 hive/presto-coordinator.intranet@INTRANET (aes256-cts-hmac-sha1-96)
   3    1 hive/presto-coordinator.intranet@INTRANET (arcfour-hmac)
   4    1 hive/presto-coordinator.intranet@INTRANET (camellia256-cts-cmac)
   5    1 hive/presto-coordinator.intranet@INTRANET (camellia128-cts-cmac)
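
As a sanity check (not part of the original report), the keytab can be exercised outside Presto; if kinit succeeds, the key material itself is valid and the failure lies in how Presto performs the login:

kinit -kt /etc/security/keytabs/hive.service.keytab hive/presto-coordinator.intranet@INTRANET
klist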

Stacktrace

2024-02-21T14:51:35.552+0300    INFO    Query-20240221_115132_00000_cy3fc-235   io.delta.standalone.internal.storage.DelegatingLogStore LogStore io.delta.storage.HDFSLogStore is used for scheme hdfs
2024-02-21T14:51:35.593+0300    WARN    Query-20240221_115132_00000_cy3fc-235   org.apache.hadoop.ipc.Client    Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2024-02-21T14:51:35.600+0300    WARN    Query-20240221_115132_00000_cy3fc-235   org.apache.hadoop.ipc.Client    Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2024-02-21T14:51:35.601+0300    INFO    Query-20240221_115132_00000_cy3fc-235   org.apache.hadoop.io.retry.RetryInvocationHandler   Exception while invoking getBlockLocations of class ClientNamenodeProtocolTranslatorPB over namenode2.intranet/10.226.49.204:8020 after 1 fail over attempts. Trying to fail over immediately.
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "presto-coordinator.intranet/10.<redacted>"; destination host is: "namenode2.intranet":8020; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1413)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy258.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy259.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
    at org.apache.hadoop.fs.HadoopExtendedFileSystem.open(HadoopExtendedFileSystem.java:134)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
    at io.delta.storage.HadoopFileSystemLogStore.read(HadoopFileSystemLogStore.java:46)
    at io.delta.standalone.internal.storage.DelegatingLogStore.read(DelegatingLogStore.scala:83)
    at io.delta.standalone.internal.Checkpoints.loadMetadataFromFile(Checkpoints.scala:136)
    at io.delta.standalone.internal.Checkpoints.lastCheckpoint(Checkpoints.scala:110)
    at io.delta.standalone.internal.Checkpoints.lastCheckpoint$(Checkpoints.scala:109)
    at io.delta.standalone.internal.DeltaLogImpl.lastCheckpoint(DeltaLogImpl.scala:42)
    at io.delta.standalone.internal.SnapshotManagement.getSnapshotAtInit(SnapshotManagement.scala:218)
    at io.delta.standalone.internal.SnapshotManagement.$init$(SnapshotManagement.scala:37)
    at io.delta.standalone.internal.DeltaLogImpl.<init>(DeltaLogImpl.scala:47)
    at io.delta.standalone.internal.DeltaLogImpl$.apply(DeltaLogImpl.scala:263)
    at io.delta.standalone.internal.DeltaLogImpl$.forTable(DeltaLogImpl.scala:245)
    at io.delta.standalone.internal.DeltaLogImpl.forTable(DeltaLogImpl.scala)
    at io.delta.standalone.DeltaLog.forTable(DeltaLog.java:176)
    at com.facebook.presto.delta.DeltaClient.loadDeltaTableLog(DeltaClient.java:151)
    at com.facebook.presto.delta.DeltaClient.getTable(DeltaClient.java:79)
    at com.facebook.presto.delta.DeltaMetadata.getTableHandle(DeltaMetadata.java:220)
    at com.facebook.presto.delta.DeltaMetadata.getTableHandle(DeltaMetadata.java:73)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.getTableHandle(ClassLoaderSafeConnectorMetadata.java:220)
    at com.facebook.presto.metadata.MetadataUtil.getOptionalTableHandle(MetadataUtil.java:180)
    at com.facebook.presto.metadata.MetadataManager$1.getTableHandle(MetadataManager.java:1331)
    at com.facebook.presto.util.MetadataUtils.lambda$getTableColumnMetadata$2(MetadataUtils.java:83)
    at com.facebook.presto.common.RuntimeStats.profileNanos(RuntimeStats.java:136)
    at com.facebook.presto.util.MetadataUtils.getTableColumnMetadata(MetadataUtils.java:81)
    at com.facebook.presto.util.MetadataUtils.getTableColumnsMetadata(MetadataUtils.java:54)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:1282)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:338)
    at com.facebook.presto.sql.tree.Table.accept(Table.java:53)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:352)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:2600)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:1615)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:338)
    at com.facebook.presto.sql.tree.QuerySpecification.accept(QuerySpecification.java:138)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:352)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:360)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:1116)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:338)
    at com.facebook.presto.sql.tree.Query.accept(Query.java:105)
    at com.facebook.presto.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:352)
    at com.facebook.presto.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:330)
    at com.facebook.presto.sql.analyzer.Analyzer.analyzeSemantic(Analyzer.java:117)
    at com.facebook.presto.sql.analyzer.BuiltInQueryAnalyzer.analyze(BuiltInQueryAnalyzer.java:93)
    at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:203)
    at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:107)
    at com.facebook.presto.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:941)
    at com.facebook.presto.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:167)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:688)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:651)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
    at org.apache.hadoop.ipc.Client.call(Client.java:1452)
    ... 78 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:561)
    at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:376)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:730)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:726)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:726)
    ... 81 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:162)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:189)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
    ... 90 more
imjalpreet commented 9 months ago

Hi @orthoxerox, to confirm: are both the hive and delta catalogs connecting to the same Hadoop cluster?

orthoxerox commented 9 months ago

Hi @imjalpreet, yes, it's the same Hadoop cluster.

zhongtiancai commented 4 months ago

I got the same exception when using the Hudi connector in 0.286.

connector.name=hudi
hive.config.resources=/etc/hadoop/conf/hive-site.xml,/etc/hadoop/conf/hdfs-site.xml
hive.metastore.uri=thrift://host:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/_HOST@A.COM
hive.metastore.client.principal=user1
hive.metastore.client.keytab=/etc/security/keytabs/user.keytab
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.impersonation.enabled=true
hive.hdfs.presto.principal=user1
hive.hdfs.presto.keytab=/etc/security/keytabs/user.keytab

agrawalreetika commented 4 months ago

@zhongtiancai Can you check whether the above configuration works when you connect through the Hive connector? And to confirm: have you placed the required /etc/krb5.conf on the coordinator host as well?
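
For example, a quick smoke test through the Hive catalog that exercises both the metastore and HDFS reads (schema and table names here are placeholders):

SHOW SCHEMAS FROM hive;
SELECT * FROM hive.some_db.some_table LIMIT 1;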

zhongtiancai commented 4 months ago

@agrawalreetika It can display Hive metadata, such as table information, but it cannot access HDFS. /etc/krb5.conf is also added to the JVM config.
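
That is, the flag being described would be a line like this in Presto's etc/jvm.config (java.security.krb5.conf is the standard JVM system property for pointing Kerberos at a config file):

-Djava.security.krb5.conf=/etc/krb5.conf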

agrawalreetika commented 4 months ago

@zhongtiancai Could you please reach out to me over Slack? I need some more setup information.

zhongtiancai commented 4 months ago

@agrawalreetika The deployment is inside the company network and cannot connect to the Internet. I tried changing the connector from hudi to hive-hadoop2, and it can connect to HDFS normally. But the hudi connector first warns that GSS initiation failed, and then reports 'Could not check if {path} is a valid table'.
I will try reading the source code and more documentation to see if I have configured something incorrectly, and then provide feedback here.

zhongtiancai commented 4 months ago

@agrawalreetika I solved the problem by deleting the #includedir /etc/krb5.conf.d/ line from the krb5.conf file. default_ccache_name is configured in that directory to use KCM, and that configuration can cause errors when Hadoop uses Kerberos.
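
For reference, the offending setup looks roughly like this (a sketch; the exact drop-in file name varies by distribution, e.g. kcm_default_ccache on RHEL/Fedora):

# /etc/krb5.conf
#includedir /etc/krb5.conf.d/        <- deleting this include fixed the problem

# /etc/krb5.conf.d/kcm_default_ccache
[libdefaults]
    default_ccache_name = KCM:

Java's Kerberos implementation cannot read KCM: credential caches, so with this include in place the Hadoop client finds no usable TGT and fails with "Failed to find any Kerberos tgt".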