vinay-kl opened 4 months ago
This issue seems to be related to previous work done on:
We previously identified a situation where the file was opened to be rewritten and, if writing the new content failed, the file was left empty.
We may be dealing with the same thing on Azure. I see, though, that you're not using the native file systems but rather HDFS. I highly recommend switching to Trino's native Azure client instead of HDFS.
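The failure mode described above (a file opened for rewriting and left empty when the write fails) is commonly avoided with a write-to-temporary-then-rename pattern. The sketch below illustrates that pattern in Python on a local filesystem. `atomic_write_text` is a hypothetical helper, not Trino code, and it assumes `os.replace` is atomic, which holds on POSIX filesystems but not necessarily on object stores such as Azure Blob Storage.

```python
import os
import tempfile


def atomic_write_text(path: str, content: str) -> None:
    """Hypothetical helper: replace the file at `path` with `content` without
    ever exposing an empty or half-written file (local filesystem only)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the new file in atomically on POSIX; if anything
        # above failed, the original file is left untouched.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file on failure; the target is unchanged.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the write raises, the previous content of the target file survives and the temporary file is cleaned up, so readers never observe an empty file.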
@findinpath we use both the wasb and abfs protocols, which cover Gen2 storage accounts with and without HNS enabled. We are about to standardize on abfs/abfss so that we can move to OAuth, which is only supported with the abfs[s] protocols.
Trino is unable to query Delta tables whose _last_checkpoint file is empty or missing and whose older log entries have been cleaned up
We use Trino itself (v448) to write data to this Delta table. It seems that Trino was able to write the JSON commit file and the checkpoint file for table version 251, but the _last_checkpoint file wasn't updated. At the time of the write, in January 2024, we were using Trino v434.
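Because `_last_checkpoint` only records which checkpoint a reader can start from, a reader could treat an empty or missing file as "no hint" rather than as an error. A minimal sketch, assuming a local copy of the `_delta_log` directory and that the file holds a JSON object with a `version` field (as described in the Delta Lake protocol); `read_last_checkpoint_version` is a hypothetical helper, not Trino's implementation:

```python
import json
import os
from typing import Optional


def read_last_checkpoint_version(delta_log_dir: str) -> Optional[int]:
    """Hypothetical helper: return the version recorded in _last_checkpoint,
    or None when the file is missing, empty, or unparsable."""
    path = os.path.join(delta_log_dir, "_last_checkpoint")
    try:
        with open(path, encoding="utf-8") as f:
            raw = f.read().strip()
    except FileNotFoundError:
        return None
    if not raw:
        # An empty file carries no information about the last checkpoint.
        return None
    try:
        return int(json.loads(raw)["version"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing "version" key, or a non-object payload.
        return None
```

Returning `None` signals that the caller should locate the newest checkpoint by listing `_delta_log` instead of failing the query.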
Query and failure stack-trace
On further inspection of the telemetry data, we found that Trino is trying to read 00000000000000000000.json, which no longer exists; it was deleted long ago as part of log-entry cleanup.
File system listing of the _delta_log folder
Steps to reproduce (for re-creation purposes only; the actual issue could have happened for other reasons)
After the above, delete the 00000000000000000000.json and _last_checkpoint files.
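After these steps, `_delta_log` has neither `_last_checkpoint` nor `00000000000000000000.json`, but it still contains a checkpoint. The sketch below shows how a reader could plan a snapshot from a directory listing alone: start at the newest checkpoint and replay only the commits after it, needing version 0 only when no checkpoint exists. It assumes classic single-part checkpoint names (`<version>.checkpoint.parquet`) and ignores multi-part and V2 checkpoints; `plan_snapshot` and the example file names are hypothetical, since the actual listing is not reproduced here.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical helper for illustration; not Trino's implementation.
# Only classic single-part checkpoints are recognized, e.g.
# 00000000000000000251.checkpoint.parquet; commits look like 00000000000000000252.json.
CHECKPOINT_RE = re.compile(r"(\d{20})\.checkpoint\.parquet")
COMMIT_RE = re.compile(r"(\d{20})\.json")


def plan_snapshot(files: Iterable[str]) -> Tuple[Optional[int], List[int]]:
    """Return (newest checkpoint version or None, commit versions to replay)."""
    names = list(files)
    checkpoints = [int(m.group(1)) for n in names if (m := CHECKPOINT_RE.fullmatch(n))]
    commits = sorted(int(m.group(1)) for n in names if (m := COMMIT_RE.fullmatch(n)))

    if checkpoints:
        start = max(checkpoints)
        needed = [v for v in commits if v > start]
        first = start + 1
    else:
        # Without any checkpoint, the full history starting at version 0 is required.
        if not commits or commits[0] != 0:
            raise FileNotFoundError("no checkpoint and 00000000000000000000.json is missing")
        start, needed, first = None, commits, 0

    # The commits to replay must be contiguous, with no missing versions.
    if needed != list(range(first, first + len(needed))):
        raise ValueError("gap in commit files after the starting version")
    return start, needed
```

With a checkpoint at version 251 in the listing, `00000000000000000000.json` is never consulted, which is why, in principle, a missing first commit need not block reads of a checkpointed table.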
Telemetry stack-trace
FYI, DBR (Databricks Runtime) and OSS Delta Lake are able to read and write to the same table without any issues.