Open findepi opened 2 years ago
The throwing code is a safety check that prevents an infinite loop in which the same request is sent over and over.
We assumed batchGetPartitionAsync will always process at least one partition.
@pettyjamesm was the assumption wrong, or is it a Glue service's fault?
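For readers without the code at hand, the safety check described above can be sketched roughly as follows. This is a simplified illustration, not the actual GlueHiveMetastore code: `BatchGetSketch`, `BatchResult`, and the `client` function are hypothetical stand-ins for the Glue batch call, and partitions are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class BatchGetSketch {
    // Hypothetical stand-in for one batch response: the keys that were
    // returned and the keys the service reported as unprocessed.
    record BatchResult(List<String> processed, List<String> unprocessed) {}

    // Keeps re-requesting the unprocessed keys until none are left.
    // If a response returns no partitions at all, resending the same
    // request could loop forever, so the sketch fails fast instead.
    static List<String> fetchAll(List<String> keys, Function<List<String>, BatchResult> client) {
        List<String> results = new ArrayList<>();
        List<String> pending = new ArrayList<>(keys);
        while (!pending.isEmpty()) {
            BatchResult response = client.apply(pending);
            if (response.processed().isEmpty()) {
                // The assumption under discussion: every round processes
                // at least one partition. This branch is where it breaks.
                throw new IllegalStateException(
                        "Cannot make progress retrieving partitions. Unable to retrieve partitions: "
                                + response.unprocessed());
            }
            results.addAll(response.processed());
            pending = new ArrayList<>(response.unprocessed());
        }
        return results;
    }
}
```

With a client that processes one key per round, the loop terminates normally; with a client that returns everything as unprocessed, the very first round throws.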
I'm not sure about that, actually. I know that the unprocessed keys list can be non-empty based on undocumented characteristics of the response, but my understanding is that the mechanism exists to prevent overly large response payloads. Since this is a CI run, I doubt that the partition in question was so large that not even a single partition could be returned, so I wonder whether this is actually a case of "partition does not exist", with the Glue API choosing to return requested partitions that do not exist as "unprocessed" instead of throwing an EntityNotFound exception like you might expect for a single get partition request.
so I wonder whether this is actually a case of "partition does not exist"
The CI failed at TestHiveGlueMetastore>AbstractTestHive.testPartitionStatisticsSampling:3417.
The table has two partitions, and I don't think the behavior can vary from run to run.
I think we should assume the partition existed, unless there is some other bug and the table wasn't created correctly.
@pettyjamesm can this be rate-limiting related?
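If throttling does turn out to be involved (this is unconfirmed), one conceivable way to make the loop more tolerant would be to allow a few zero-progress responses with a pause between them before giving up. The following is only a sketch under that assumption, not a proposed fix: `BoundedRetrySketch`, `BatchResult`, `maxEmptyRounds`, and `backoffMillis` are all hypothetical names, and partitions are again modeled as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class BoundedRetrySketch {
    // Hypothetical stand-in for one batch response.
    record BatchResult(List<String> processed, List<String> unprocessed) {}

    // Like the fail-fast loop, but tolerates up to maxEmptyRounds
    // consecutive responses that return no partitions, sleeping a little
    // longer after each one. The loop is still bounded, so it cannot spin
    // forever on the same request.
    static List<String> fetchAll(
            List<String> keys,
            Function<List<String>, BatchResult> client,
            int maxEmptyRounds,
            long backoffMillis) {
        List<String> results = new ArrayList<>();
        List<String> pending = new ArrayList<>(keys);
        int emptyRounds = 0;
        while (!pending.isEmpty()) {
            BatchResult response = client.apply(pending);
            if (response.processed().isEmpty()) {
                emptyRounds++;
                if (emptyRounds > maxEmptyRounds) {
                    throw new IllegalStateException(
                            "Cannot make progress retrieving partitions after "
                                    + emptyRounds + " attempts: " + pending);
                }
                try {
                    // Linear backoff: wait longer after each empty round.
                    Thread.sleep(backoffMillis * emptyRounds);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while retrying", e);
                }
                continue; // resend the same pending keys
            }
            emptyRounds = 0; // progress was made, reset the counter
            results.addAll(response.processed());
            pending = new ArrayList<>(response.unprocessed());
        }
        return results;
    }
}
```

Whether retrying is appropriate at all depends on the root cause: if the service reports nonexistent partitions as unprocessed, retrying would only delay the same failure.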
it seems we're seeing this internally on CI (so more than just one occasion).
@pettyjamesm is it a Glue bug? when will it be fixed?
I have some more context now about how this might happen after talking with someone on the Glue team. That said, it would be great if we could nail down how exactly this was being triggered before trying to put together a fix to address this. Can you provide, for any occurrence of this issue, the following data points so that we can see what additional details we might be able to get before I attempt a code change?
Shared the info offline.
cc @ppalucha
Also encountered by Rodrigo as discussed here https://trinodb.slack.com/archives/CGB0QHWSW/p1662650934614339
io.trino.spi.TrinoException: Cannot make progress retrieving partitions. Unable to retrieve partitions: [{Values: [2019-02-21, 19]}.......
Reported again at https://trinodb.slack.com/archives/CGB0QHWSW/p1686219201862739
On random days we get a HIVE_METASTORE_ERROR on different Delta Lake tables. The same query runs fine on the next run, and the tables' underlying partitions are not being updated while the query is running.
io.trino.spi.TrinoException: Cannot make progress retrieving partitions. Unable to retrieve partitions: [{Values: [2023, 05, 07, 16]}, {Values: [2023, 05, 09, 05]}, {Values: [2023, 05, 10, 10]}, {Values: [2023, 05, 07, 07]}, {Values: [2023, 05, 10, 09]}, {Values: [2023, 05, 07, 13]}, {Values: [2023, 05, 09, 18]}, {Values: [2023, 05, 07, 17]}, {Values: [2023, 05, 07, 10]}, {Values: [2023, 05, 09, 22]}, {Values: [2023, 05, 08, 13]}, {Values: [2023, 05, 07, 15]}, {Values: [2023, 05, 08, 11]}, {Values: [2023, 05, 09, 15]}, {Values: [2023, 05, 10, 23]}, {Values: [2023, 05, 09, 12]}, {Values: [2023, 05, 10, 05]}, {Values: [2023, 05, 08, 15]}, {Values: [2023, 05, 07, 12]}, {Values: [2023, 05, 10, 04]}, {Values: [2023, 05, 09, 17]}, {Values: [2023, 05, 08, 10]}, {Values: [2023, 05, 10, 03]}, {Values: [2023, 05, 10, 21]}, {Values: [2023, 05, 08, 04]}, {Values: [2023, 05, 07, 06]}, {Values: [2023, 05, 08, 05]}, {Values: [2023, 05, 09, 01]}, {Values: [2023, 05, 09, 11]}, {Values: [2023, 05, 11, 06]}, {Values: [2023, 05, 09, 03]}, {Values: [2023, 05, 11, 09]}, {Values: [2023, 05, 07, 09]}, {Values: [2023, 05, 10, 11]}, {Values: [2023, 05, 08, 08]}, {Values: [2023, 05, 10, 13]}, {Values: [2023, 05, 08, 18]}, {Values: [2023, 05, 09, 16]}, {Values: [2023, 05, 10, 16]}, {Values: [2023, 05, 07, 14]}, {Values: [2023, 05, 11, 00]}, {Values: [2023, 05, 09, 06]}, {Values: [2023, 05, 09, 20]}, {Values: [2023, 05, 10, 12]}, {Values: [2023, 05, 08, 12]}, {Values: [2023, 05, 09, 14]}, {Values: [2023, 05, 09, 00]}, {Values: [2023, 05, 08, 07]}, {Values: [2023, 05, 08, 21]}, {Values: [2023, 05, 09, 04]}, {Values: [2023, 05, 08, 02]}, {Values: [2023, 05, 08, 01]}, {Values: [2023, 05, 08, 23]}, {Values: [2023, 05, 09, 23]}, {Values: [2023, 05, 10, 19]}, {Values: [2023, 05, 07, 19]}, {Values: [2023, 05, 08, 20]}, {Values: [2023, 05, 10, 17]}, {Values: [2023, 05, 10, 20]}, {Values: [2023, 05, 10, 06]}, {Values: [2023, 05, 07, 18]}, {Values: [2023, 05, 09, 07]}, {Values: [2023, 05, 10, 07]}, 
{Values: [2023, 05, 07, 21]}, {Values: [2023, 05, 08, 17]}, {Values: [2023, 05, 10, 01]}, {Values: [2023, 05, 10, 15]}, {Values: [2023, 05, 10, 22]}, {Values: [2023, 05, 08, 16]}, {Values: [2023, 05, 09, 09]}, {Values: [2023, 05, 07, 08]}, {Values: [2023, 05, 09, 08]}, {Values: [2023, 05, 09, 21]}, {Values: [2023, 05, 07, 23]}, {Values: [2023, 05, 10, 18]}, {Values: [2023, 05, 11, 05]}, {Values: [2023, 05, 07, 22]}, {Values: [2023, 05, 08, 00]}, {Values: [2023, 05, 08, 06]}, {Values: [2023, 05, 11, 01]}, {Values: [2023, 05, 07, 11]}, {Values: [2023, 05, 08, 03]}, {Values: [2023, 05, 09, 02]}, {Values: [2023, 05, 11, 07]}, {Values: [2023, 05, 07, 20]}, {Values: [2023, 05, 10, 00]}, {Values: [2023, 05, 11, 08]}, {Values: [2023, 05, 08, 14]}, {Values: [2023, 05, 11, 02]}, {Values: [2023, 05, 08, 09]}, {Values: [2023, 05, 10, 02]}, {Values: [2023, 05, 09, 13]}, {Values: [2023, 05, 11, 03]}, {Values: [2023, 05, 08, 22]}, {Values: [2023, 05, 10, 08]}, {Values: [2023, 05, 08, 19]}, {Values: [2023, 05, 09, 10]}, {Values: [2023, 05, 10, 14]}, {Values: [2023, 05, 09, 19]}, {Values: [2023, 05, 11, 04]}]
    at io.trino.plugin.hive.metastore.glue.GlueHiveMetastore.batchGetPartition(GlueHiveMetastore.java:939)
    at io.trino.plugin.hive.metastore.glue.GlueHiveMetastore.getPartitionsByNamesInternal(GlueHiveMetastore.java:893)
    at io.trino.plugin.hive.metastore.glue.GlueHiveMetastore.lambda$getPartitionsByNames$28(GlueHiveMetastore.java:883)
    at io.trino.plugin.hive.aws.AwsApiCallStats.call(AwsApiCallStats.java:37)
    at io.trino.plugin.hive.metastore.glue.GlueHiveMetastore.getPartitionsByNames(GlueHiveMetastore.java:883)
    at io.trino.plugin.hive.metastore.ForwardingHiveMetastore.getPartitionsByNames(ForwardingHiveMetastore.java:247)
    at io.trino.plugin.hive.aws.athena.PartitionProjectionMetastoreDecorator$PartitionProjectionMetastore.getPartitionsByNames(PartitionProjectionMetastoreDecorator.java:89)
    at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.loadPartitionsByNames(CachingHiveMetastore.java:732)
    at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore$1.loadAll(CachingHiveMetastore.java:1076)
    at io.trino.collect.cache.EvictableCache$TokenCacheLoader.loadAll(EvictableCache.java:463)
    at com.google.common.cache.LocalCache.loadAll(LocalCache.java:4073)
    at com.google.common.cache.LocalCache.getAll(LocalCache.java:4036)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getAll(LocalCache.java:4964)
    at io.trino.collect.cache.EvictableCache.getAll(EvictableCache.java:202)
    at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.getAll(CachingHiveMetastore.java:254)
    at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.getPartitionsByNames(CachingHiveMetastore.java:696)
    at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.loadPartitionsByNames(CachingHiveMetastore.java:732)
    at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore$1.loadAll(CachingHiveMetastore.java:1076)
    at io.trino.collect.cache.EvictableCache$TokenCacheLoader.loadAll(EvictableCache.java:463)
    at com.google.common.cache.LocalCache.loadAll(LocalCache.java:4073)
    at com.google.common.cache.LocalCache.getAll(LocalCache.java:4036)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getAll(LocalCache.java:4964)
    at io.trino.collect.cache.EvictableCache.getAll(EvictableCache.java:202)
    at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.getAll(CachingHiveMetastore.java:254)
    at io.trino.plugin.hive.metastore.cache.CachingHiveMetastore.getPartitionsByNames(CachingHiveMetastore.java:696)
    at io.trino.plugin.hive.HiveMetastoreClosure.lambda$getPartitionsByNames$4(HiveMetastoreClosure.java:238)
    at java.base/java.util.Optional.map(Optional.java:260)
    at io.trino.plugin.hive.HiveMetastoreClosure.getPartitionsByNames(HiveMetastoreClosure.java:238)
    at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.getPartitionsByNames(SemiTransactionalHiveMetastore.java:1002)
    at io.trino.plugin.hive.HiveSplitManager.lambda$getPartitionMetadata$6(HiveSplitManager.java:533)
    at com.google.common.collect.Iterators$6.transform(Iterators.java:829)
    at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:52)
    at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1856)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:292)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
    at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:169)
    at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:298)
    at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
    at io.trino.plugin.hive.ConcurrentLazyQueue.isEmpty(ConcurrentLazyQueue.java:34)
    at io.trino.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:380)
    at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:297)
    at io.trino.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
    at io.trino.$gen.Trino_403_amzn_0____20230606_135704_2.run(Unknown Source)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Observed on CI on version 380
The responsible line is here: https://github.com/trinodb/trino/blob/96a8f775f8763941deb9f3d2fb999d3a88015113/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/glue/GlueHiveMetastore.java#L887-L890
Follows #10696