trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Presto/Trino concurrent read failure on HIVE-ACID tables #8865

Open vinay-kl opened 2 years ago

vinay-kl commented 2 years ago

Overview: we are running into read errors when accessing Hive ACID tables from Presto/Trino. This is the error stack trace:

io.prestosql.spi.PrestoException: GET https://gen2hivebifros.dfs.core.windows.net/bifrostx-hive-data?resource=filesystem&maxResults=5000&directory=prod-data/myntra_wms.db/myntra_wms_item/part_created_on%3D202004/delta_0054078_0054078_0001&timeout=90&recursive=false
StatusCode=404
StatusDescription=The specified path does not exist.
ErrorCode=PathNotFound
ErrorMessage=The specified path does not exist.
RequestId:172573ee-a01f-002d-73b7-8eb139000000
Time:2021-08-11T13:47:43.1784236Z
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:258)
    at io.prestosql.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
    at io.prestosql.$gen.Presto_347____20210811_110824_2.run(Unknown Source)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.FileNotFoundException: GET https://gen2hivebifros.dfs.core.windows.net/bifrostx-hive-data?resource=filesystem&maxResults=5000&directory=prod-data/myntra_wms.db/myntra_wms_item/part_created_on%3D202004/delta_0054078_0054078_0001&timeout=90&recursive=false
StatusCode=404
StatusDescription=The specified path does not exist.
ErrorCode=PathNotFound
ErrorMessage=The specified path does not exist.
RequestId:172573ee-a01f-002d-73b7-8eb139000000
Time:2021-08-11T13:47:43.1784236Z
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:926)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:347)
    at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:270)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
    at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.chooseFile(AcidUtils.java:1711)
    at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.isRawFormat(AcidUtils.java:1721)
    at org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:904)
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:892)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getChildState(AcidUtils.java:1185)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1019)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:978)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.lambda$loadPartition$5(BackgroundHiveSplitLoader.java:468)
    at io.prestosql.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
    at io.prestosql.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:96)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:468)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:325)
    at io.prestosql.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:254)
    ... 6 more
Caused by: org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException: GET https://gen2hivebifros.dfs.core.windows.net/bifrostx-hive-data?resource=filesystem&maxResults=5000&directory=prod-data/myntra_wms.db/myntra_wms_item/part_created_on%3D202004/delta_0054078_0054078_0001&timeout=90&recursive=false
StatusCode=404
StatusDescription=The specified path does not exist.
ErrorCode=PathNotFound
ErrorMessage=The specified path does not exist.
RequestId:172573ee-a01f-002d-73b7-8eb139000000
Time:2021-08-11T13:47:43.1784236Z
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:134)
    at org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:180)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:526)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:344)
    ... 22 more

We run compactions in parallel while concurrent reads and writes are happening. Occasionally we encounter the error above on Presto; when we dug deeper, we found that the compaction had written a new base folder and cleaned up the older deltas after the query had started running.

The compaction started at 2021-08-11 19:13:03 and finished by 2021-08-11 19:17:43. The query started at 2021-08-11 19:17:38 and failed at 2021-08-11 19:17:43.

+---------+-------------+-----------------+------------------------+----------+---------+------------------+--------------+---------------+---------------+-----------+---------------------+--------------+-------------------------+
| CC_ID   | CC_DATABASE | CC_TABLE        | CC_PARTITION           | CC_STATE | CC_TYPE | CC_TBLPROPERTIES | CC_WORKER_ID | CC_START      | CC_END        | CC_RUN_AS | CC_HIGHEST_WRITE_ID | CC_META_INFO | CC_HADOOP_JOB_ID        |
+---------+-------------+-----------------+------------------------+----------+---------+------------------+--------------+---------------+---------------+-----------+---------------------+--------------+-------------------------+
| 1885280 | myntra_wms  | myntra_wms_item | part_created_on=202004 | s        | a       | 0:               | NULL         | 1628709183000 | 1628709463000 | hive      |               54129 | NULL         | job_1628652797770_10982 |
+---------+-------------+-----------------+------------------------+----------+---------+------------------+--------------+---------------+---------------+-----------+---------------------+--------------+-------------------------+
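For reference, the row above looks like it comes from the COMPLETED_COMPACTIONS table in the Hive metastore backing database. Assuming direct access to that database, a query along these lines (a sketch, using the column names shown in the row) should return it:

    -- Sketch: look up the compaction record in the metastore backing database.
    SELECT CC_ID, CC_STATE, CC_TYPE, CC_START, CC_END, CC_HIGHEST_WRITE_ID
    FROM COMPLETED_COMPACTIONS
    WHERE CC_DATABASE  = 'myntra_wms'
      AND CC_TABLE     = 'myntra_wms_item'
      AND CC_PARTITION = 'part_created_on=202004'
    ORDER BY CC_END DESC;
    -- CC_STATE = 's' appears to indicate the compaction succeeded,
    -- and CC_TYPE = 'a' that it was a major compaction.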

My understanding is that the HMS Cleaner thread should not remove the compacted or older deltas until the lock on the partition/table is released.
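One way to check that would be to look at the metastore's lock table while a Trino query is scanning the partition. A sketch, assuming direct access to the metastore backing database and the standard HIVE_LOCKS schema (exact columns can vary by metastore schema version):

    -- Sketch: inspect live locks while the Trino query is running.
    SELECT HL_LOCK_EXT_ID, HL_DB, HL_TABLE, HL_PARTITION,
           HL_LOCK_STATE, HL_LOCK_TYPE, HL_AGENT_INFO
    FROM HIVE_LOCKS
    WHERE HL_DB = 'myntra_wms'
      AND HL_TABLE = 'myntra_wms_item';
    -- A shared read lock on part_created_on=202004 here should keep the
    -- Cleaner from deleting the old delta directories for that partition.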

losipiuk commented 2 years ago

I tried to reproduce the issue, but the locks added by Trino during the query seem to work fine: they prevent Hive (HDP3) from running the final step of compaction, which deletes the old data files. Could you share more about your environment? Also, if you run SHOW LOCKS in Hive while the query from Trino is running, is the output as expected?
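For example, something along these lines from Beeline/the Hive CLI while the Trino query is still executing (a sketch; the partition value is taken from the failing path and may need different quoting depending on the partition column type):

    -- Run while the Trino query is in flight.
    USE myntra_wms;
    SHOW LOCKS myntra_wms_item PARTITION (part_created_on='202004') EXTENDED;
    -- A SHARED_READ lock held for the Trino query would be the expected output;
    -- SHOW COMPACTIONS can be run alongside to see whether the Cleaner has
    -- already picked up this partition.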