trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

SortingFileWriter Upload failing in writeTempFile on GCS #21068

Open BalaMahesh opened 6 months ago

BalaMahesh commented 6 months ago

We are seeing this issue while running an INSERT query into a bucketed table with the following session properties:

- scale_writers=true
- task_writer_count=2
- hive.temporary_staging_directory_enabled=false
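For reference, these properties can be set per-session before running the query. This is a sketch assuming the Hive catalog is named `hive`; adjust the catalog prefix to match your deployment.

```sql
-- System session properties
SET SESSION scale_writers = true;
SET SESSION task_writer_count = 2;

-- Catalog session property (catalog name "hive" is an assumption)
SET SESSION hive.temporary_staging_directory_enabled = false;
```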

With

hive.temporary_staging_directory_enabled=true

we instead hit the issue described in https://github.com/trinodb/trino/issues/10450.

We have raised this as a GCS issue with GCP, but they are asking us to reproduce it with a code snippet. The issue is intermittent: the query eventually succeeds after many retries, which is impacting our SLA. Hence I am reaching out to the community for help.

Trino version: 402
GCS connector jar version: gcs-connector-hadoop2-2.2.8-shaded.jar

This is the stack trace of the error.

2024-03-14T02:54:31.348Z    ERROR   stage-scheduler io.trino.execution.StageStateMachine    Stage 20240314_024953_00183_xx8k5.1 failed
io.trino.spi.TrinoException: Failed to write temporary file: gs://gcs-005/user/hive/warehouse/cube_tmp.db/day_bucketed_orc__dbt_tmp/process_date=2024-03-13/.tmp-sort.000033_0_31aa1715-b993-46c2-9230-58e6527c6191_20240314_024953_00183_xx8k5.70
    at io.trino.plugin.hive.SortingFileWriter.writeTempFile(SortingFileWriter.java:267)
    at io.trino.plugin.hive.SortingFileWriter.flushToTempFile(SortingFileWriter.java:202)
    at io.trino.plugin.hive.SortingFileWriter.appendRows(SortingFileWriter.java:126)
    at io.trino.plugin.hive.HiveWriter.append(HiveWriter.java:85)
    at io.trino.plugin.hive.HivePageSink.writePage(HivePageSink.java:346)
    at io.trino.plugin.hive.HivePageSink.doAppend(HivePageSink.java:298)
    at io.trino.plugin.hive.HivePageSink.lambda$appendPage$3(HivePageSink.java:284)
    at io.trino.hdfs.authentication.HdfsAuthentication.lambda$doAs$0(HdfsAuthentication.java:26)
    at io.trino.hdfs.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
    at io.trino.hdfs.authentication.HdfsAuthentication.doAs(HdfsAuthentication.java:25)
    at io.trino.hdfs.HdfsEnvironment.doAs(HdfsEnvironment.java:98)
    at io.trino.plugin.hive.HivePageSink.appendPage(HivePageSink.java:284)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSink.appendPage(ClassLoaderSafeConnectorPageSink.java:69)
    at io.trino.operator.TableWriterOperator.addInput(TableWriterOperator.java:258)
    at io.trino.operator.Driver.processInternal(Driver.java:416)
    at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
    at io.trino.operator.Driver.tryWithLock(Driver.java:706)
    at io.trino.operator.Driver.process(Driver.java:306)
    at io.trino.operator.Driver.processForDuration(Driver.java:277)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:739)
    at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
    at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
    at io.trino.$gen.Trino_402____20240314_000358_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: Upload failed for 'gs://gcs-005/user/hive/warehouse/cube_tmp.db/day_bucketed_orc__dbt_tmp/process_date=2024-03-13/.tmp-sort.000033_0_31aa1715-b993-46c2-9230-58e6527c6191_20240314_024953_00183_xx8k5.70'
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.BaseAbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(BaseAbstractGoogleAsyncWriteChannel.java:260)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.BaseAbstractGoogleAsyncWriteChannel.close(BaseAbstractGoogleAsyncWriteChannel.java:168)
    at java.base/java.nio.channels.Channels$1.close(Channels.java:176)
    at java.base/java.io.FilterOutputStream.close(FilterOutputStream.java:188)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.close(GoogleHadoopOutputStream.java:119)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
    at io.airlift.slice.OutputStreamSliceOutput.close(OutputStreamSliceOutput.java:86)
    at io.trino.orc.OutputStreamOrcDataSink.close(OutputStreamOrcDataSink.java:61)
    at io.trino.orc.OrcWriter.close(OrcWriter.java:470)
    at io.trino.plugin.hive.util.TempFileWriter.close(TempFileWriter.java:61)
    at io.trino.plugin.hive.SortingFileWriter.writeTempFile(SortingFileWriter.java:262)
    ... 25 more
Caused by: java.net.SocketException: Connection reset
    at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323)
    at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
    at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
    at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
    at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478)
    at java.base/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:472)
    at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70)
    at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1460)
    at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1064)
    at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
    at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:343)
    at java.base/sun.net.www.MeteredStream.read(MeteredStream.java:141)
    at java.base/java.io.FilterInputStream.read(FilterInputStream.java:132)
    at java.base/sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3678)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:164)
    at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
    at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:343)
    at com.google.cloud.hadoop.repackaged.gcs.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._loadMore(UTF8StreamJsonParser.java:257)
    at com.google.cloud.hadoop.repackaged.gcs.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._loadMoreGuaranteed(UTF8StreamJsonParser.java:2444)
    at com.google.cloud.hadoop.repackaged.gcs.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2527)
    at com.google.cloud.hadoop.repackaged.gcs.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2507)
    at com.google.cloud.hadoop.repackaged.gcs.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:334)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.json.jackson2.JacksonParser.getText(JacksonParser.java:74)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.json.JsonParser.parseValue(JsonParser.java:856)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.json.JsonParser.parse(JsonParser.java:451)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.json.JsonParser.parseValue(JsonParser.java:787)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.json.JsonParser.parse(JsonParser.java:360)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.json.JsonParser.parse(JsonParser.java:335)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:79)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:73)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpResponse.parseAs(HttpResponse.java:461)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:565)
    at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:85)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    ... 3 more
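Since GCP asked for a standalone repro snippet, something along these lines could be adapted. This is only a sketch: the bucket name, object prefix, and payload are placeholders, the naming helper merely mimics the `.tmp-sort.*` object names visible in the stack trace, and the loop approximates the many small temp-file uploads SortingFileWriter performs (it is not the actual Trino write path).

```python
import io
import uuid


def tmp_sort_name(prefix: str, writer: str, query_id: str, seq: int) -> str:
    """Build an object name shaped like the .tmp-sort files in the stack trace (illustrative only)."""
    return f"{prefix}/.tmp-sort.{writer}_{query_id}.{seq}"


def upload_repeatedly(bucket_name: str, prefix: str, payload: bytes, attempts: int = 100) -> None:
    """Upload many small objects back-to-back to try to trigger the intermittent 'Connection reset'."""
    # Lazy import so the naming helper above works without the library installed.
    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client()  # uses application-default credentials
    bucket = client.bucket(bucket_name)
    for i in range(attempts):
        name = tmp_sort_name(prefix, "000033_0", uuid.uuid4().hex, i)
        bucket.blob(name).upload_from_file(io.BytesIO(payload), size=len(payload))


# Example invocation (placeholder bucket/prefix):
# upload_repeatedly("my-test-bucket", "repro/tmp", b"x" * 256 * 1024)
```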
findinpath commented 6 months ago

Try out the Trino native file system client for GCS communication on the most recent Trino release - we've been constantly improving it.

fs.hadoop.enabled=false
fs.native-gcs.enabled=true
gcs.json-key-file-path=/tmp/gcs.credentials.json
findinpath commented 6 months ago

cc @elonazoulay

BalaMahesh commented 5 months ago

> Try out the Trino native file system client for GCS communication on the most recent Trino release - we've been constantly improving it.
>
> fs.hadoop.enabled=false
> fs.native-gcs.enabled=true
> gcs.json-key-file-path=/tmp/gcs.credentials.json

We recently tried updating our Trino version to 433, but had to roll back because of a few issues with struct-type columns. We will try the native client once we are stable on the latest version. In the meantime, is there any config tuning we can do to mitigate this issue?