trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

SortingFileWriter failing to commit write to Hive with fewer workers #10450

Open BalaMahesh opened 2 years ago

BalaMahesh commented 2 years ago

Trino version: 366

Issue: We are running an INSERT query that writes data into a bucketed Hive table in ORC format. In the Hive connector, we have set the following config:

    hive.temporary-staging-directory-enabled = true
    hive.temporary-staging-directory-path = /data0/logs/trino

We changed the path from the default /tmp/presto-${USER} because we were facing this issue there and wanted to try a different path.
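Since the failure is a missing file under this directory, it may be worth first confirming on each worker that the configured path exists, is a directory the worker's OS user can write to, and has free space. Below is a minimal Java sketch under those assumptions; the class name `StagingDirCheck`, the probe-file approach, and the 1 GiB threshold are hypothetical choices for illustration, not anything Trino provides:

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check for the Hive temporary staging directory on one worker.
class StagingDirCheck {
    // Assumed threshold for illustration only: report when less than 1 GiB is usable.
    static final long MIN_FREE_BYTES = 1L << 30;

    // Returns human-readable problems; an empty list means the directory looks usable.
    static List<String> problems(Path dir, long minFreeBytes) {
        List<String> issues = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            issues.add("missing or not a directory: " + dir);
            return issues;
        }
        try {
            // Create and delete a probe file to confirm this user can actually write here.
            Path probe = Files.createTempFile(dir, ".staging-probe", ".tmp");
            Files.delete(probe);
        } catch (IOException e) {
            issues.add("cannot create files in " + dir + ": " + e);
        }
        try {
            FileStore store = Files.getFileStore(dir);
            long usable = store.getUsableSpace();
            if (usable < minFreeBytes) {
                issues.add("only " + usable + " bytes usable on " + store);
            }
        } catch (IOException e) {
            issues.add("cannot read file store for " + dir + ": " + e);
        }
        return issues;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/data0/logs/trino");
        List<String> found = problems(dir, MIN_FREE_BYTES);
        if (found.isEmpty()) {
            System.out.println("OK: " + dir);
        } else {
            found.forEach(p -> System.out.println("PROBLEM: " + p));
        }
    }
}
```

It would need to run as the same OS user as the Trino worker process, e.g. `java StagingDirCheck.java /data0/logs/trino` (single-file source launch needs Java 11+). A clean result only says the directory was usable at that moment; it does not rule out files being removed while a query runs.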

Case 1: With 70 workers in the cluster, this query finishes successfully and writes data into the Hive table in ORC format. In this case, we could see the temp files created on the worker nodes, and once the query finishes, the temp files are deleted. Attached is a screenshot of the temp file names from one of the worker nodes.

[Screenshot 2022-01-04 at 8:56:40 AM: temp file names on one of the worker nodes]

Case 2: With 60 workers in the cluster and the same Hive connector configuration, the query fails with the error below on one of the worker nodes. I could see the temp files created for the query on other worker nodes (SSHing into all the worker nodes at the same time is not easy), and they are deleted after the query fails.

io.trino.spi.TrinoException: Error committing write to Hive
    at io.trino.plugin.hive.SortingFileWriter.commit(SortingFileWriter.java:147)
    at io.trino.plugin.hive.HiveWriter.commit(HiveWriter.java:86)
    at io.trino.plugin.hive.HivePageSink.closeWriter(HivePageSink.java:331)
    at io.trino.plugin.hive.HivePageSink.doFinish(HivePageSink.java:197)
    at io.trino.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
    at io.trino.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:97)
    at io.trino.plugin.hive.HivePageSink.finish(HivePageSink.java:190)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSink.finish(ClassLoaderSafeConnectorPageSink.java:77)
    at io.trino.operator.TableWriterOperator.finish(TableWriterOperator.java:221)
    at io.trino.operator.Driver.processInternal(Driver.java:406)
    at io.trino.operator.Driver.lambda$processFor$9(Driver.java:292)
    at io.trino.operator.Driver.tryWithLock(Driver.java:685)
    at io.trino.operator.Driver.processFor(Driver.java:285)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
    at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:488)
    at io.trino.$gen.Trino_366____20220103_191231_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.UncheckedIOException: java.io.FileNotFoundException: File /data0/logs/trino/.tmp-sort.hour=22.000049_0_20220104_022754_00139_2f9b9.1 does not exist
    at io.trino.plugin.hive.SortingFileWriter.mergeFiles(SortingFileWriter.java:236)
    at io.trino.plugin.hive.SortingFileWriter.writeSorted(SortingFileWriter.java:192)
    at io.trino.plugin.hive.SortingFileWriter.commit(SortingFileWriter.java:143)
    ... 19 more
Caused by: java.io.FileNotFoundException: File /data0/logs/trino/.tmp-sort.hour=22.000049_0_20220104_022754_00139_2f9b9.1 does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:666)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:987)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:656)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
    at io.trino.plugin.hive.SortingFileWriter.mergeFiles(SortingFileWriter.java:217)
    ... 21 more
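To find out when the `.tmp-sort.*` files appear and disappear on a worker, without watching every node by hand over SSH, one option is to log create and delete events in the staging directory. Below is a minimal sketch using Java's `java.nio.file.WatchService`; the class name `SortTempWatcher` and the output format are hypothetical, the `.tmp-sort.` prefix is taken only from the file name in the error above, and it has to be started on each worker separately. It records timing only, not which process removed a file:

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.time.Instant;

// Hypothetical diagnostic: logs creation/deletion of sort temp files in one directory.
class SortTempWatcher {
    // Prefix observed in the FileNotFoundException message above.
    static final String SORT_TEMP_PREFIX = ".tmp-sort.";

    static boolean isSortTempFile(Path path) {
        return path.getFileName().toString().startsWith(SORT_TEMP_PREFIX);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/data0/logs/trino");
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_DELETE);
            while (true) {
                WatchKey key = watcher.take(); // blocks until events arrive
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        System.out.println(Instant.now() + " OVERFLOW: some events were lost");
                        continue;
                    }
                    // For ENTRY_* events the context is the file name relative to dir.
                    Path name = (Path) event.context();
                    if (isSortTempFile(name)) {
                        System.out.println(Instant.now() + " " + event.kind().name() + " " + name);
                    }
                }
                if (!key.reset()) {
                    System.out.println("Directory is no longer accessible: " + dir);
                    break;
                }
            }
        }
    }
}
```

Comparing these timestamps with the query failure time could show whether a file such as `.tmp-sort.hour=22.000049_0_20220104_022754_00139_2f9b9.1` was deleted before the merge step or was never created on that worker. On Linux the JDK's watch service is backed by inotify; an OVERFLOW line means some events were dropped and the log is incomplete.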

Can someone suggest a possible cause and fix for this issue? Thanks in advance.

BalaMahesh commented 2 years ago

@findepi, any idea why this is happening? It's strange that no one else has faced this issue before.

findepi commented 2 years ago

I do not, but I'd ask @electrum for advice.

BalaMahesh commented 2 years ago

@electrum, can you please take a look and advise on this?

BalaMahesh commented 2 years ago

@findepi, @electrum seems busy. Can you please add anyone else who can help with this?

BalaMahesh commented 2 years ago

@electrum, can you please advise on this?

sambhav8695 commented 2 years ago

@findepi @electrum I am also facing the same issue. Can you please help here?

sopel39 commented 2 years ago

Additional relevant Slack thread: https://trinodb.slack.com/archives/CGB0QHWSW/p1653301299029359