prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Support kerberos authentication for Hudi tables #17703 (labels: gss, hoodie_partition_metadata)

Open YongjinZhou opened 2 years ago

YongjinZhou commented 2 years ago

user_hudi/.hoodie_partition_metadata, java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed

2022-04-28T15:48:49.022+0800 DEBUG dispatcher-query-8 com.facebook.presto.execution.QueryStateMachine Query 20220428_074848_00007_6p5p9 is FAILED
2022-04-28T15:48:49.022+0800 DEBUG query-execution-8 com.facebook.presto.execution.QueryStateMachine Query 20220428_074848_00007_6p5p9 failed
com.facebook.presto.spi.PrestoException: Error checking path :hdfs://cluster1/apps/spark/warehouse/xxx.db/user_hudi/.hoodie_partition_metadata, under folder: hdfs://cluster1/apps/spark/warehouse/xxx.db/user_hudi
    at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:128)
    at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
    at com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
    at com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
    at com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: Error checking path :hdfs://cluster1/apps/spark/warehouse/xxx.db/user_hudi/.hoodie_partition_metadata, under folder: hdfs://cluster1/apps/spark/warehouse/xxx.db/user_hudi
    at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:221)
    at com.facebook.presto.hive.util.HiveFileIterator.lambda$getLocatedFileStatusRemoteIterator$0(HiveFileIterator.java:103)
    at com.google.common.collect.Iterators$5.computeNext(Iterators.java:639)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
    at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:69)
    at com.facebook.presto.hive.util.HiveFileIterator.computeNext(HiveFileIterator.java:40)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
    at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1811)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
    at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
    at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
    at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:195)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:40)
    at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:121)
    ... 7 more
Caused by: org.apache.hudi.exception.HoodieException: Error checking Hoodie partition metadata for hdfs://cluster1/apps/spark/warehouse/xxx.db/user_hudi
    at org.apache.hudi.common.model.HoodiePartitionMetadata.hasPartitionMetadata(HoodiePartitionMetadata.java:143)
    at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:161)
    ... 24 more

rohanpednekar commented 2 years ago

@pratyakshsharma, do you think you can take a look? @YongjinZhou, which versions of Presto and Hudi are you using?

YongjinZhou commented 2 years ago

Presto 0.265.1 and Hudi 0.9.

pratyakshsharma commented 2 years ago

I do not have any experience with Kerberos. @codope, can you help here?

imjalpreet commented 2 years ago

@YongjinZhou Just a suggestion: could you also try Presto version 0.272 and let us know if you still face the same issue? To be clear, please try 0.272 and not 0.272.1.

houlingchen commented 2 years ago

> @YongjinZhou Just a suggestion: could you also try Presto version 0.272 and let us know if you still face the same issue? To be clear, please try 0.272 and not 0.272.1.

We tried version 0.272 and hit the same issue. In a Kerberos-enabled cluster, if Presto is started with no ticket cache (after kdestroy, or once the cache expires) and presto-cli-0.272-executable.jar is used as the client, querying Hudi tables fails with this error. It seems that in this situation the Kerberos settings in hive.properties are not used (see the example configuration below). Querying Hive tables works fine.
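For context, the Kerberos settings referred to here are the Hive connector properties in etc/catalog/hive.properties. A typical Kerberos setup looks roughly like the following; the principals, realm, and keytab paths are illustrative placeholders, not the reporter's actual values:

    # Kerberos authentication for the Hive metastore (example values)
    hive.metastore.authentication.type=KERBEROS
    hive.metastore.service.principal=hive/_HOST@EXAMPLE.COM
    hive.metastore.client.principal=presto@EXAMPLE.COM
    hive.metastore.client.keytab=/etc/presto/presto.keytab
    # Kerberos authentication for HDFS access (example values)
    hive.hdfs.authentication.type=KERBEROS
    hive.hdfs.impersonation.enabled=true
    hive.hdfs.presto.principal=presto@EXAMPLE.COM
    hive.hdfs.presto.keytab=/etc/presto/presto.keytab

The failure above suggests that the HDFS calls made by the Hudi path filter during split loading do not pick up these connector credentials and instead fall back to the server's local ticket cache.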

houlingchen commented 2 years ago

As mentioned in https://github.com/trinodb/trino/pull/5478/files, we changed BackgroundHiveSplitLoader.java line 125 to: future = hdfsEnvironment.doAs(hdfsContext.getIdentity().getUser(), () -> loadSplits());. After replacing $PRESTO_HOME/plugin/hive-hadoop2/presto-hive-{$prestoversion}.jar with the repackaged jar, the issue appears to be fixed (see the sketch below).
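For anyone applying the same workaround, here is a diff-style sketch of that change. The removed line is presumed from the comment above (line 125, inside HiveSplitLoaderTask.process()); the field and method names are taken from the snippet in the previous paragraph and may differ slightly between Presto versions.

    --- a/presto-hive/src/main/java/com/facebook/presto/hive/BackgroundHiveSplitLoader.java
    +++ b/presto-hive/src/main/java/com/facebook/presto/hive/BackgroundHiveSplitLoader.java
    @@ // inside HiveSplitLoaderTask.process(), around line 125 @@
    -                future = loadSplits();
    +                // Run split loading under the querying user's Kerberos identity, so the
    +                // HDFS accesses made by HoodieROTablePathFilter use the connector's
    +                // credentials instead of relying on the server's local ticket cache.
    +                future = hdfsEnvironment.doAs(hdfsContext.getIdentity().getUser(), () -> loadSplits());

After rebuilding presto-hive with this change, swap the resulting jar into $PRESTO_HOME/plugin/hive-hadoop2/ as described above.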