trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Azure throttling issues on version 452 using native fs support #22915

Open · mgorbatenko opened 1 month ago

mgorbatenko commented 1 month ago

Earlier this week, we updated a production Trino cluster from version 451 to 453. When the cluster came under heavy load, we noticed a significant number of query failures (thousands). Queries failed intermittently with errors along the lines of HIVE_CURSOR_ERROR: Failed to read Parquet file ... or HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split .... The failures were random, and a given query would sometimes fail again and sometimes succeed on retry.

This cluster uses Azure native filesystem support, and we noticed many throttling and network errors on our underlying storage accounts. Rolling back to version 451 resolved the issue. We have reason to believe this PR is the culprit: on our nodes, it increased the number of concurrent connections per host from 5 to 61. Multiplied across a large number of nodes, that is a huge jump.

Increasing the number of concurrent connections to Azure is likely to improve performance, but it risks triggering throttling when many concurrent tasks and queries are running.

Is this something we could make configurable by users? Is it worth considering how errors from Azure are retried?
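For illustration, here is a minimal sketch of what such a user-facing knob could look like, in the style of Trino's airlift-based configuration classes. The property name azure.max-connections, the class shape, and the default of 5 (the pre-452 behavior described above) are hypothetical, not the actual implementation:

```java
import io.airlift.configuration.Config;
import io.airlift.configuration.ConfigDescription;
import jakarta.validation.constraints.Min;

public class AzureFileSystemConfig
{
    // Hypothetical default mirroring the pre-452 per-host connection limit
    private int maxConnections = 5;

    @Min(1)
    public int getMaxConnections()
    {
        return maxConnections;
    }

    @Config("azure.max-connections")
    @ConfigDescription("Maximum number of concurrent connections per host to Azure storage")
    public AzureFileSystemConfig setMaxConnections(int maxConnections)
    {
        this.maxConnections = maxConnections;
        return this;
    }
}
```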

nineinchnick commented 1 month ago

We can definitely make this configurable if there's no good default value. Can you post full stack traces from the failed queries?

mgorbatenko commented 1 month ago

> We can definitely make this configurable if there's no good default value. Can you post full stack traces from the failed queries?

@nineinchnick sure! Here is a stack trace for HIVE_CANNOT_OPEN_SPLIT, which was much more common:

io.trino.spi.TrinoException: Error opening Hive split abfs://data@amperitytenantsctxk0.dfs.core.windows.net/tables/3jbpJKqf1URkptz/27C/part-00353-15528228-6432-4a1f-9652-6ca67ab524ec-c000.snappy.parquet (offset=0, length=15253765): Error reading file file: abfs://data@amperitytenantsctxk0.dfs.core.windows.net/tables/3jbpJKqf1URkptz/27C/part-00353-15528228-6432-4a1f-9652-6ca67ab524ec-c000.snappy.parquet
    at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:306)
    at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:180)
    at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:200)
    at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:136)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:48)
    at io.trino.split.PageSourceManager$PageSourceProviderInstance.createPageSource(PageSourceManager.java:79)
    at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:265)
    at io.trino.operator.Driver.processInternal(Driver.java:403)
    at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
    at io.trino.operator.Driver.tryWithLock(Driver.java:709)
    at io.trino.operator.Driver.process(Driver.java:298)
    at io.trino.operator.Driver.processForDuration(Driver.java:269)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
    at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
    at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:565)
    at io.trino.$gen.Trino_451____20240801_143631_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.io.IOException: Error reading file file: abfs://data@amperitytenantsctxk0.dfs.core.windows.net/tables/3jbpJKqf1URkptz/27C/part-00353-15528228-6432-4a1f-9652-6ca67ab524ec-c000.snappy.parquet
    at io.trino.filesystem.azure.AzureUtils.handleAzureException(AzureUtils.java:37)
    at io.trino.filesystem.azure.AzureInput.readTail(AzureInput.java:100)
    at io.trino.filesystem.TrinoInput.readTail(TrinoInput.java:43)
    at io.trino.filesystem.tracing.TracingInput.lambda$readTail$3(TracingInput.java:81)
    at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
    at io.trino.filesystem.tracing.TracingInput.readTail(TracingInput.java:81)
    at io.trino.plugin.hive.parquet.TrinoParquetDataSource.readTailInternal(TrinoParquetDataSource.java:54)
    at io.trino.parquet.AbstractParquetDataSource.readTail(AbstractParquetDataSource.java:100)
    at io.trino.parquet.reader.MetadataReader.readFooter(MetadataReader.java:100)
    at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:226)
    ... 18 more
Caused by: java.lang.IllegalStateException: Unbalanced enter/exit
    at okio.AsyncTimeout.enter(AsyncTimeout.kt:58)
    at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:384)
    at okio.RealBufferedSource.indexOf(RealBufferedSource.kt:434)
    at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:327)
    at okhttp3.internal.http1.HeadersReader.readLine(HeadersReader.kt:29)
    at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:180)
    at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:110)
    at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
    at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
    at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
    ... 3 more
    Suppressed: java.lang.Exception: #block terminated with an error
        at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:104)
        at reactor.core.publisher.Mono.block(Mono.java:1779)
        at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:393)
        at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:323)
        at io.trino.filesystem.azure.AzureInput.readTail(AzureInput.java:95)
        at io.trino.filesystem.TrinoInput.readTail(TrinoInput.java:43)
        at io.trino.filesystem.tracing.TracingInput.lambda$readTail$3(TracingInput.java:81)
        at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
        at io.trino.filesystem.tracing.TracingInput.readTail(TracingInput.java:81)
        at io.trino.plugin.hive.parquet.TrinoParquetDataSource.readTailInternal(TrinoParquetDataSource.java:54)
        at io.trino.parquet.AbstractParquetDataSource.readTail(AbstractParquetDataSource.java:100)
        at io.trino.parquet.reader.MetadataReader.readFooter(MetadataReader.java:100)
        at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:226)
        at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:180)
        at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:200)
        at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:136)
        at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:48)
        at io.trino.split.PageSourceManager$PageSourceProviderInstance.createPageSource(PageSourceManager.java:79)
        at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:265)
        at io.trino.operator.Driver.processInternal(Driver.java:403)
        at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
        at io.trino.operator.Driver.tryWithLock(Driver.java:709)
        at io.trino.operator.Driver.process(Driver.java:298)
        at io.trino.operator.Driver.processForDuration(Driver.java:269)
        at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
        at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
        at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:565)
        at io.trino.$gen.Trino_451____20240801_143631_2.run(Unknown Source)
        ... 3 more

And here is a stack trace from HIVE_CURSOR_ERROR:

io.trino.spi.TrinoException: Failed to read Parquet file: abfs://data@amperitytenantsctxk0.dfs.core.windows.net/tables/3JfbVaQ7DU4R4ev/zCj/part-02838-e3c55bb7-72a3-4353-83cc-eb81c9d89e0d-c000.snappy.parquet
    at io.trino.plugin.hive.parquet.ParquetPageSource.handleException(ParquetPageSource.java:215)
    at io.trino.plugin.hive.parquet.ParquetPageSourceFactory.lambda$createPageSource$1(ParquetPageSourceFactory.java:283)
    at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:75)
    at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:312)
    at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:291)
    at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:186)
    at io.trino.spi.Page.getLoadedPage(Page.java:238)
    at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:271)
    at io.trino.operator.Driver.processInternal(Driver.java:403)
    at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
    at io.trino.operator.Driver.tryWithLock(Driver.java:709)
    at io.trino.operator.Driver.process(Driver.java:298)
    at io.trino.operator.Driver.processForDuration(Driver.java:269)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
    at io.trino.execution.executor.timesharing.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
    at io.trino.execution.executor.timesharing.TimeSharingTaskExecutor$TaskRunner.run(TimeSharingTaskExecutor.java:565)
    at io.trino.$gen.Trino_451____20240801_212619_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.io.UncheckedIOException: java.io.IOException: Error reading file file: abfs://data@amperitytenantsctxk0.dfs.core.windows.net/tables/3JfbVaQ7DU4R4ev/zCj/part-02838-e3c55bb7-72a3-4353-83cc-eb81c9d89e0d-c000.snappy.parquet
    at io.trino.parquet.ChunkReader.readUnchecked(ChunkReader.java:34)
    at io.trino.parquet.reader.ChunkedInputStream.readNextChunk(ChunkedInputStream.java:149)
    at io.trino.parquet.reader.ChunkedInputStream.read(ChunkedInputStream.java:93)
    at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:170)
    at shaded.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:100)
    at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:632)
    at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:532)
    at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:188)
    at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1003)
    at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:995)
    at org.apache.parquet.format.PageHeader.read(PageHeader.java:870)
    at org.apache.parquet.format.Util.read(Util.java:390)
    at org.apache.parquet.format.Util.readPageHeader(Util.java:133)
    at org.apache.parquet.format.Util.readPageHeader(Util.java:128)
    at io.trino.parquet.reader.ParquetColumnChunkIterator.readPageHeader(ParquetColumnChunkIterator.java:116)
    at io.trino.parquet.reader.ParquetColumnChunkIterator.next(ParquetColumnChunkIterator.java:83)
    at io.trino.parquet.reader.ParquetColumnChunkIterator.next(ParquetColumnChunkIterator.java:42)
    at com.google.common.collect.Iterators$PeekingImpl.peek(Iterators.java:1244)
    at io.trino.parquet.reader.PageReader.readDictionaryPage(PageReader.java:160)
    at io.trino.parquet.reader.AbstractColumnReader.setPageReader(AbstractColumnReader.java:79)
    at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:459)
    at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:554)
    at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:537)
    at io.trino.parquet.reader.ParquetReader.lambda$nextPage$3(ParquetReader.java:251)
    at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:72)
    ... 17 more
Caused by: java.io.IOException: Error reading file file: abfs://data@amperitytenantsctxk0.dfs.core.windows.net/tables/3JfbVaQ7DU4R4ev/zCj/part-02838-e3c55bb7-72a3-4353-83cc-eb81c9d89e0d-c000.snappy.parquet
    at io.trino.filesystem.azure.AzureUtils.handleAzureException(AzureUtils.java:37)
    at io.trino.filesystem.azure.AzureInput.readFully(AzureInput.java:77)
    at io.trino.filesystem.tracing.TracingInput.lambda$readFully$0(TracingInput.java:53)
    at io.trino.filesystem.tracing.Tracing.lambda$withTracing$1(Tracing.java:38)
    at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
    at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:37)
    at io.trino.filesystem.tracing.TracingInput.readFully(TracingInput.java:53)
    at io.trino.plugin.hive.parquet.TrinoParquetDataSource.readInternal(TrinoParquetDataSource.java:64)
    at io.trino.parquet.AbstractParquetDataSource.readFully(AbstractParquetDataSource.java:122)
    at io.trino.parquet.AbstractParquetDataSource$ReferenceCountedReader.read(AbstractParquetDataSource.java:332)
    at io.trino.parquet.ChunkReader.readUnchecked(ChunkReader.java:31)
    ... 41 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: arraycopy: length -8098 is negative
    at java.base/java.lang.System.arraycopy(Native Method)
    at kotlin.collections.ArraysKt___ArraysJvmKt.copyInto(_ArraysJvm.kt:892)
    at okio.Buffer.read(Buffer.kt:1105)
    at okio.Buffer.readFully(Buffer.kt:1096)
    at okio.Buffer.readByteArray(Buffer.kt:1091)
    at okio.Buffer.readString(Buffer.kt:315)
    at okio.Buffer.readUtf8(Buffer.kt:299)
    at okhttp3.HttpUrl$Companion.percentDecode$okhttp(HttpUrl.kt:1707)
    at okhttp3.HttpUrl$Companion.percentDecode$okhttp$default(HttpUrl.kt:1695)
    at okhttp3.HttpUrl$Builder.build(HttpUrl.kt:1180)
    at okhttp3.HttpUrl$Companion.get(HttpUrl.kt:1634)
    at okhttp3.Request$Builder.url(Request.kt:192)
    at com.azure.core.http.okhttp.OkHttpAsyncHttpClient.toOkHttpRequest(OkHttpAsyncHttpClient.java:148)
    at com.azure.core.http.okhttp.OkHttpAsyncHttpClient.lambda$send$0(OkHttpAsyncHttpClient.java:109)
    at reactor.core.publisher.MonoCallable$MonoCallableSubscription.request(MonoCallable.java:137)
    at reactor.core.publisher.LambdaMonoSubscriber.onSubscribe(LambdaMonoSubscriber.java:121)
    at reactor.core.publisher.MonoCallable.subscribe(MonoCallable.java:48)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4568)
    at reactor.core.publisher.Mono.subscribeWith(Mono.java:4634)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4534)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4470)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4442)
    at com.azure.core.http.okhttp.OkHttpAsyncHttpClient.lambda$send$2(OkHttpAsyncHttpClient.java:109)
    at reactor.core.publisher.MonoCreate$DefaultMonoSink.onRequest(MonoCreate.java:225)
    at com.azure.core.http.okhttp.OkHttpAsyncHttpClient.lambda$send$3(OkHttpAsyncHttpClient.java:96)
    at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:61)
    at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
    at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:241)
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:204)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:155)
    at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
    at reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
    at reactor.core.publisher.FluxRepeatWhen$RepeatWhenMainSubscriber.onNext(FluxRepeatWhen.java:143)
    at reactor.core.publisher.MonoUsing$MonoUsingSubscriber.onNext(MonoUsing.java:231)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onNext(MonoPeekTerminal.java:180)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
    at reactor.core.publisher.MonoMaterialize$MaterializeSubscriber.drain(MonoMaterialize.java:133)
    at reactor.core.publisher.MonoMaterialize$MaterializeSubscriber.onComplete(MonoMaterialize.java:127)
    at reactor.core.publisher.Operators.complete(Operators.java:137)
    at reactor.core.publisher.MonoEmpty.subscribe(MonoEmpty.java:46)
    at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
    at reactor.core.publisher.MonoUsing.subscribe(MonoUsing.java:102)
    at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
    at reactor.core.publisher.MonoFromFluxOperator.subscribe(MonoFromFluxOperator.java:83)
    at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4568)
    at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:265)
    at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51)
    at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
    at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4568)
    at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:205)
    at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
    at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:63)
    at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4568)
    at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:205)
    at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4552)
    at reactor.core.publisher.Mono.block(Mono.java:1778)
    at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:393)
    at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:323)
    at io.trino.filesystem.azure.AzureInput.readFully(AzureInput.java:65)
    ... 50 more
    Suppressed: java.lang.Exception: #block terminated with an error
        at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:104)
        at reactor.core.publisher.Mono.block(Mono.java:1779)
        ... 53 more
raunaqmorarka commented 1 month ago

Ideally we should have automatic backoff and retry (did the legacy FS already do this?) rather than having to tune config. But it's strange that the error stack traces don't show anything related to throttling.
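For concreteness, a minimal sketch of automatic backoff with full jitter, the kind of retry behavior meant above. The attempt count, delays, and the choice to retry only on IOException are illustrative assumptions, not values or behavior taken from Trino or the legacy FS:

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

final class Backoff
{
    // Retry an I/O action, sleeping a random ("full jitter") duration that
    // grows exponentially with each failed attempt, up to a fixed cap
    static <T> T retryWithJitter(Callable<T> action)
            throws Exception
    {
        int maxAttempts = 4;          // illustrative
        long baseDelayMillis = 100;   // illustrative
        long maxDelayMillis = 8_000;  // illustrative
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            }
            catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                long cap = Math.min(maxDelayMillis, baseDelayMillis << attempt);
                Thread.sleep(ThreadLocalRandom.current().nextLong(cap));
            }
        }
    }
}
```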

nineinchnick commented 1 month ago

I don't see anything relevant to throttling or retries in those stack traces.

I recently gained a deep understanding of how the AWS SDK does retries. Now I'm trying to find good references for the Azure SDK, but so far I'm not seeing anything useful.

The Other details section in the overview page mentions this:

> We're currently updating the Azure SDK for Java libraries to share common cloud patterns such as authentication protocols, logging, tracing, transport protocols, buffered responses, and retries.

Retries are only briefly mentioned on the HTTP clients and pipelines page, but again, without much detail.

The GitHub Wiki is not useful either, nor are the READMEs.

Adding more config options should be a last resort, so I'm trying to do due diligence first. Maybe someone else has some leads.

nineinchnick commented 1 month ago

I'll go and read about retry policies and throttling here: https://hadoop.apache.org/docs/stable/hadoop-azure/abfs.html#IO_Options

nineinchnick commented 1 month ago

We don't configure any retry policy, but according to the SDK sources, the defaults are:
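As I read the azure-storage-common sources, the no-arg RequestRetryOptions constructor is equivalent to the explicit form sketched below. Treat these values as my reading of the SDK, to be verified against the version Trino actually ships:

```java
import com.azure.storage.common.policy.RequestRetryOptions;
import com.azure.storage.common.policy.RetryPolicyType;

public class DefaultRetryOptions
{
    // new RequestRetryOptions() with no arguments resolves to these values
    public static RequestRetryOptions defaults()
    {
        return new RequestRetryOptions(
                RetryPolicyType.EXPONENTIAL, // exponential backoff between tries
                4,                           // maxTries
                Integer.MAX_VALUE,           // tryTimeoutInSeconds (effectively unbounded per try)
                4_000L,                      // retryDelayInMs
                120_000L,                    // maxRetryDelayInMs
                null);                       // secondaryHost (none)
    }
}
```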

The legacy FS seems to have implemented some custom client-side throttling. Before we attempt to replicate this in the native FS, I'd need more details on how the issue manifests.

Until then, maybe we should add a config option for the max number of connections. @electrum WDYT?
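For context on where such a limit would take effect: the stack traces above show the Azure SDK running on its OkHttp transport, where per-host concurrency is capped by OkHttp's Dispatcher. A hypothetical sketch of threading a configurable limit through to the SDK's HTTP client follows; this is not necessarily how Trino actually wires it:

```java
import com.azure.core.http.HttpClient;
import com.azure.core.http.okhttp.OkHttpAsyncHttpClientBuilder;
import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;

public class AzureHttpClientFactory
{
    // Build an Azure SDK HttpClient whose underlying OkHttp dispatcher
    // allows at most maxRequestsPerHost concurrent requests per host
    public static HttpClient create(int maxRequestsPerHost)
    {
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.setMaxRequestsPerHost(maxRequestsPerHost);

        OkHttpClient okHttpClient = new OkHttpClient.Builder()
                .dispatcher(dispatcher)
                .build();

        return new OkHttpAsyncHttpClientBuilder(okHttpClient).build();
    }
}
```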

electrum commented 1 month ago

Adding a config for max connections sounds good. I think the Hadoop implementation is a completely custom client, originally contributed by Microsoft, so the throttling isn't exactly "client-side" in the sense of a layer over the Azure SDK; it doesn't use the Azure SDK at all. Overall it is very different from the Azure SDK, so we might need different settings to match its behavior.

steveloughran commented 1 month ago

> Ideally we should have automatic backoff and retry (did the legacy FS already do this?) rather than having to tune config.

If by "legacy connector" you mean the "hadoop abfs connector" then the answer is: of course we did, including

  • backoff with jitter
  • statistics collection
  • rate limiting certain operations to avoid creating the problem
  • pre-emptive load reduction based on recent operations

you should look at our code for handling 100-continue responses in particular
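A minimal illustration of the rate-limiting item from the list above, using Guava's RateLimiter as a stand-in for the ABFS connector's custom throttling logic. The permit rate and the operations it would guard are arbitrary assumptions, not values from hadoop-azure:

```java
import com.google.common.util.concurrent.RateLimiter;

public class ThrottledOperations
{
    // Illustrative cap: at most 100 guarded operations per second per client
    private final RateLimiter limiter = RateLimiter.create(100.0);

    public void beforeOperation()
    {
        // Blocks just long enough to keep the observed rate under the cap,
        // smoothing bursts before they can trigger server-side throttling
        limiter.acquire();
    }
}
```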

mgorbatenko commented 1 month ago

> Ideally we should have automatic backoff and retry (did the legacy FS already do this?) rather than having to tune config.
>
> If by "legacy connector" you mean the "hadoop abfs connector" then the answer is: of course we did, including
>
> • backoff with jitter
> • statistics collection
> • rate limiting certain operations to avoid creating the problem
> • pre-emptive load reduction based on recent operations
>
> you should look at our code for handling 100-continue responses in particular

This would all be really helpful, and likely necessary, to include in the native fs, especially if the plan is to EOL the legacy fs support by end of year (I think I read that somewhere on Slack).

Appreciate the support here.

findinpath commented 1 month ago

> I think that the Hadoop implementation is a completely custom client, originally contributed by Microsoft

cc @Riddle4045 @georgewfisher: do you have any extra insights regarding the Azure Hadoop implementation?