Open nesat opened 4 months ago
Same problem on version 450.
I've just tried enabling this feature with a Kerberized HDFS instance, but to no avail.
I'm really surprised it's not configurable in the same manner as the Hive connector:
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.trino.principal=<user>@<domain>
hive.hdfs.trino.keytab=/etc/trino/keytabs/<user>.<endpoint>.keytab
We tried the settings you mentioned, but they didn't work. We plan to try with MinIO instead.
We wanted to enable fault-tolerant execution and use the HDFS exchange manager, one of the recommended spooling storage types. Our HDFS cluster is Kerberized, and the Hive and Delta connectors already use it. However, with the HDFS exchange manager, we receive the following error:
java.io.UncheckedIOException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
Did anybody face a similar issue? Does anybody use the HDFS exchange manager with Kerberos authentication?
Configuration Details
Here is how we set up the HDFS exchange manager and enabled task retry:
/etc/trino/config.properties
/etc/trino/exchange-manager.properties
Here are our resource files:
/etc/hadoop/conf/core-site.xml
/etc/hadoop/conf/hdfs-site.xml
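The error above is what a Kerberized NameNode returns when a client connects with SIMPLE authentication, so one quick sanity check is whether the resource files actually declare Kerberos. Below is a minimal Python sketch, assuming the standard Hadoop `configuration`/`property`/`name`/`value` XML layout. The helper names `load_hadoop_properties` and `auth_type` and the sample XML are hypothetical, not the reporter's actual files. It only inspects file contents; it says nothing about whether the exchange manager plugin performs a Kerberos login.

```python
import xml.etree.ElementTree as ET


def load_hadoop_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def auth_type(props):
    """Return the declared authentication type; Hadoop falls back to 'simple'."""
    return props.get("hadoop.security.authentication", "simple").lower()


# Illustrative content only (assumption), not the reporter's core-site.xml.
SAMPLE = """
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://dev-a</value></property>
  <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
</configuration>
"""

if __name__ == "__main__":
    print(auth_type(load_hadoop_properties(SAMPLE)))
```

To check a real file, read it with `open("/etc/hadoop/conf/core-site.xml").read()` and pass the text to `load_hadoop_properties`.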
Additional details:
/etc/krb5.conf
krb5cc_1000
Valid starting     Expires            Service principal
05/16/24 07:45:04  05/17/24 07:45:04  krbtgt/DEV-A.MYSERVER.COM@DEV-A.MYSERVER.COM
        renew until 05/23/24 07:45:04
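To rule out an expired ticket, the klist times can be compared against the moment the server started. This is a small Python sketch, assuming the `MM/DD/YY HH:MM:SS` format shown in the klist output and assuming both timestamps are in the same time zone (the Trino log is in UTC, and klist normally prints local time, so this may not hold on every host). The function name `ticket_valid_at` is hypothetical.

```python
from datetime import datetime

# Format of the klist columns above (assumption: locale-dependent on other hosts).
KLIST_FORMAT = "%m/%d/%y %H:%M:%S"


def ticket_valid_at(valid_from, expires, moment):
    """Return True if `moment` falls inside the ticket's validity window."""
    start = datetime.strptime(valid_from, KLIST_FORMAT)
    end = datetime.strptime(expires, KLIST_FORMAT)
    return start <= moment < end


if __name__ == "__main__":
    # Server startup time taken from the log (2024-05-16T08:20:39Z).
    startup = datetime(2024, 5, 16, 8, 20, 39)
    print(ticket_valid_at("05/16/24 07:45:04", "05/17/24 07:45:04", startup))
```

A valid ticket in the cache only helps if the HDFS client inside the exchange manager actually uses it, which is the open question in this issue.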
java.io.UncheckedIOException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
    at io.trino.plugin.exchange.filesystem.FileSystemExchange.instantiateSink(FileSystemExchange.java:166)
    at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$StageExecution.getExchangeSinkInstanceHandle(EventDrivenFaultTolerantQueryScheduler.java:2129)
    at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.processNodeAcquisitions(EventDrivenFaultTolerantQueryScheduler.java:1606)
    at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.schedule(EventDrivenFaultTolerantQueryScheduler.java:1012)
    at io.trino.execution.scheduler.faulttolerant.EventDrivenFaultTolerantQueryScheduler$Scheduler.run(EventDrivenFaultTolerantQueryScheduler.java:832)
    at io.trino.$gen.Trino_442____20240516_113953_2.run(Unknown Source)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
    at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2509)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2483)
    at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1485)
    at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1482)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1499)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1474)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2464)
    at io.trino.plugin.exchange.hdfs.HadoopFileSystemExchangeStorage.createDirectories(HadoopFileSystemExchangeStorage.java:76)
    at io.trino.plugin.exchange.filesystem.FileSystemExchange.instantiateSink(FileSystemExchange.java:163)
    ... 12 more
Caused by: org.apache.hadoop.ipc.RemoteException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1584)
    at org.apache.hadoop.ipc.Client.call(Client.java:1530)
    at org.apache.hadoop.ipc.Client.call(Client.java:1427)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
    at jdk.proxy6/jdk.proxy6.$Proxy280.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:675)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
    at jdk.proxy6/jdk.proxy6.$Proxy281.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2507)
    ... 21 more
...
2024-05-16T08:20:14.492Z INFO main Bootstrap retry-policy NONE TASK
...
2024-05-16T08:20:36.806Z INFO main io.trino.security.GroupProviderManager -- Loaded group provider file --
2024-05-16T08:20:36.807Z INFO main io.trino.exchange.ExchangeManagerRegistry -- Loading exchange manager hdfs --
2024-05-16T08:20:37.094Z INFO main org.hibernate.validator.internal.util.Version HV000001: Hibernate Validator 8.0.1.Final
2024-05-16T08:20:37.918Z INFO main Bootstrap PROPERTY DEFAULT RUNTIME DESCRIPTION
2024-05-16T08:20:37.918Z INFO main Bootstrap jmx.base-name ---- ----
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.base-directories [] [hdfs://dev-a/tmp/trino/] List of base directories separated by commas
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.file-listing-parallelism 50 50 Max parallelism of file listing calls when enumerating spooling files. The actual parallelism will depend on implementation
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.sink-buffer-pool-min-size 10 10
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.sink-buffers-per-partition 2 2
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.sink-max-file-size 1GB 1GB Max size of files written by exchange sinks
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.source-concurrent-readers 4 4
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.source-handle-target-data-size 256MB 256MB Target size of the data referenced by a single source handle
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.source-max-files-per-reader 25 25
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.max-output-partition-count 50 50
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.max-page-storage-size 16MB 16MB Max storage size of a page written to a sink, including the page itself and its size represented as an int
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.hdfs.block-size 4MB 4MB Block size for HDFS storage
2024-05-16T08:20:37.918Z INFO main Bootstrap hdfs.config.resources [] [/etc/hadoop/conf/core-site.xml, /etc/hadoop/conf/hdfs-site.xml]
2024-05-16T08:20:39.210Z INFO main io.trino.exchange.ExchangeManagerRegistry -- Loaded exchange manager hdfs --
2024-05-16T08:20:39.304Z INFO main io.trino.server.Server Server startup completed in 31.60s
2024-05-16T08:20:39.304Z INFO main io.trino.server.Server ======== SERVER STARTED ========
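The Bootstrap lines in this startup log list the properties the exchange manager registered. A rough way to see whether any of them relate to authentication is to scan the property names. The Python sketch below assumes the `Bootstrap <property> <default> <runtime> <description>` layout visible in the log. The hint keywords and the function names are hypothetical choices for illustration, and an empty result is only a hint, not proof that the plugin lacks Kerberos support.

```python
import re

# Property names start with a lowercase letter; the "PROPERTY" header row does not match.
PROPERTY_RE = re.compile(r"Bootstrap\s+([a-z][a-z0-9.-]*)\s")

# Keywords we guess an authentication-related property would contain (assumption).
AUTH_HINTS = ("kerberos", "principal", "keytab", "authentication")


def logged_properties(log_text):
    """Return property names from Bootstrap log lines, in order of appearance."""
    return PROPERTY_RE.findall(log_text)


def auth_related(names):
    """Filter property names that look authentication-related."""
    return [n for n in names if any(hint in n for hint in AUTH_HINTS)]


# A few lines copied from the startup log above.
SAMPLE_LOG = """\
2024-05-16T08:20:14.492Z INFO main Bootstrap retry-policy NONE TASK
2024-05-16T08:20:37.918Z INFO main Bootstrap PROPERTY DEFAULT RUNTIME DESCRIPTION
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.base-directories [] [hdfs://dev-a/tmp/trino/] List of base directories separated by commas
2024-05-16T08:20:37.918Z INFO main Bootstrap exchange.hdfs.block-size 4MB 4MB Block size for HDFS storage
2024-05-16T08:20:37.918Z INFO main Bootstrap hdfs.config.resources [] [/etc/hadoop/conf/core-site.xml, /etc/hadoop/conf/hdfs-site.xml]
"""

if __name__ == "__main__":
    names = logged_properties(SAMPLE_LOG)
    print(names)
    print(auth_related(names))
```

Run against the full log above, none of the listed property names contains any of these keywords.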
Making sure core-site.xml is used
We made some changes to verify that core-site.xml was accessible and used.
Method 1) Editing exchange.base-directories=hdfs://dev-a/tmp/trino in exchange-manager.properties as exchange.base-directories=hdfs://namenode1.myserver.com:8020/tmp/trino:
Method 2) Removing hdfs-site.xml and replacing dev-a with namenode1.myserver.com:50070 in both exchange.base-directories and fs.defaultFS:
Verifying HDFS Kerberos authentication with other methods
Method 1) Curl webhdfs HTTP endpoint which uses kerberos ticket:
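The exact command is not shown here, so the following is only a sketch of what a WebHDFS check with SPNEGO typically looks like. It builds a `curl --negotiate -u :` invocation in Python. The host and port come from the thread, while the `http` scheme, the `/tmp/trino` path, the `LISTSTATUS` operation, and the function names are assumptions.

```python
import shlex
from urllib.parse import quote, urlencode


def webhdfs_url(namenode_http, path, op="LISTSTATUS"):
    """Build a WebHDFS REST URL for the given NameNode HTTP address and path."""
    return f"http://{namenode_http}/webhdfs/v1{quote(path)}?{urlencode({'op': op})}"


def curl_command(url):
    """Return an argv list for curl using SPNEGO with the current ticket cache."""
    # --negotiate enables SPNEGO; "-u :" makes curl take the identity from the ticket.
    return ["curl", "--negotiate", "-u", ":", url]


if __name__ == "__main__":
    url = webhdfs_url("namenode1.myserver.com:50070", "/tmp/trino")
    # Print a shell-safe command line to copy into a terminal.
    print(shlex.join(curl_command(url)))
```

If this returns a directory listing while the exchange manager still fails, the ticket and the cluster are fine and the problem lies in how the exchange manager's HDFS client authenticates.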
klist after it:
Method 2) hdfs in another pod with similar configurations but a different image: