findinpath closed this issue 6 months ago.
It's labeled as "correctness", which is not entirely accurate. It's more "potential table corruption" than "incorrect query results", but we don't have a label for table corruption.
I labeled this as a "release-blocker" even though the problematic code has existed since Trino 429, so feel free to remove the label if needed.
cc @alexjo2144 @electrum
Removed the release-blocker label from the issue because it currently affects only the native GCS file system (GcsOutputFile), which is disabled by default (fs.native-gcs.enabled is false by default).
In the following concurrency scenario on the Delta Lake connector backed by GCS object storage:
the following exception may occur:
This happens because, with the native GCS file system, the transaction log file is first created without any content:
https://github.com/trinodb/trino/blob/33dd20a8c104d358f5b6d38e8a0405f8b0ced944/lib/trino-filesystem-gcs/src/main/java/io/trino/filesystem/gcs/GcsOutputFile.java#L84-L86
The content is written in a separate step right after the file is created, but this second step may fail, which would leave the Delta Lake table corrupted.
Even setting aside the more serious problem of permanent table corruption when an I/O exception occurs while writing the content, a concurrent SELECT query may read the still-empty transaction log file in the window between its creation and the write of its content, and therefore see a seemingly corrupt Delta Lake table. This is a temporary corruption of the table.
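To make the two failure modes concrete, here is a minimal, self-contained Java sketch. It uses neither Trino nor the Google Cloud Storage client: ObjectStore, createEmpty and upload are hypothetical names for an in-memory stand-in that, like the code linked above, creates the object in one request and uploads its content in a second one. A read between the two requests returns an empty commit file (the temporary corruption), and a failed upload leaves the empty file behind for good (the permanent corruption).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class TwoStepCreateDemo {
    // Hypothetical in-memory object store; NOT the GCS or Trino API.
    static class ObjectStore {
        private final Map<String, byte[]> objects = new HashMap<>();

        byte[] read(String name) {
            return objects.get(name);
        }

        // Request 1: create an empty object, failing if the name is already taken.
        void createEmpty(String name) {
            if (objects.putIfAbsent(name, new byte[0]) != null) {
                throw new IllegalStateException("already exists: " + name);
            }
        }

        // Request 2: upload the content; can fail independently of request 1.
        void upload(String name, byte[] content, boolean simulateFailure) {
            if (simulateFailure) {
                throw new UncheckedIOException(new IOException("simulated upload failure"));
            }
            objects.put(name, content.clone());
        }
    }

    // Illustrative Delta Lake commit file name (20-digit zero-padded version).
    static final String COMMIT = "_delta_log/00000000000000000001.json";

    public static void main(String[] args) {
        byte[] entry = "{\"commitInfo\":{}}\n".getBytes(StandardCharsets.UTF_8);

        // Scenario 1: a reader runs between the two requests (simulated sequentially).
        ObjectStore store = new ObjectStore();
        store.createEmpty(COMMIT);
        System.out.println("reader sees " + store.read(COMMIT).length + " bytes"); // temporary corruption
        store.upload(COMMIT, entry, false);
        System.out.println("after upload: " + store.read(COMMIT).length + " bytes");

        // Scenario 2: the upload fails after the empty object was already created.
        ObjectStore failing = new ObjectStore();
        failing.createEmpty(COMMIT);
        try {
            failing.upload(COMMIT, entry, true);
        } catch (UncheckedIOException e) {
            System.out.println("upload failed: " + e.getCause().getMessage());
        }
        System.out.println("left behind: " + failing.read(COMMIT).length + " bytes"); // permanent corruption
    }
}
```

Because the name is claimed before the content exists, any retry of the commit also fails with "already exists", so the empty file cannot be repaired by simply writing it again.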
For comparison, the HDFS-based GCS implementation, which works correctly: https://github.com/trinodb/trino/blob/33dd20a8c104d358f5b6d38e8a0405f8b0ced944/lib/trino-hdfs/src/main/java/io/trino/hdfs/gcs/GcsExclusiveOutputStream.java
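A sketch of the general approach that avoids both problems, assuming GcsExclusiveOutputStream works roughly along these lines: buffer all bytes locally, then publish the name and the full content together in one conditional "create only if absent" request (on GCS the corresponding precondition is ifGenerationMatch=0). All class and method names below are hypothetical, and the store is an in-memory stand-in rather than the real client.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ExclusiveCreateDemo {
    // Hypothetical store whose only create operation writes name + content atomically.
    static class ObjectStore {
        private final ConcurrentMap<String, byte[]> objects = new ConcurrentHashMap<>();

        void createIfAbsent(String name, byte[] content) throws FileAlreadyExistsException {
            if (objects.putIfAbsent(name, content.clone()) != null) {
                throw new FileAlreadyExistsException(name);
            }
        }

        byte[] read(String name) {
            return objects.get(name);
        }
    }

    // Buffers everything in memory; nothing is visible in the store until close().
    static class ExclusiveOutputStream extends ByteArrayOutputStream {
        private final ObjectStore store;
        private final String name;
        private boolean closed;

        ExclusiveOutputStream(ObjectStore store, String name) {
            this.store = store;
            this.name = name;
        }

        @Override
        public void close() throws IOException {
            if (closed) {
                return;
            }
            closed = true;
            // Single request: the object appears with its full content or not at all.
            store.createIfAbsent(name, toByteArray());
        }
    }

    public static void main(String[] args) throws IOException {
        ObjectStore store = new ObjectStore();
        String commit = "_delta_log/00000000000000000001.json";

        ExclusiveOutputStream first = new ExclusiveOutputStream(store, commit);
        first.write("{\"commitInfo\":{}}\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("visible before close: " + (store.read(commit) != null));
        first.close();
        System.out.println("bytes after close: " + store.read(commit).length);

        // A concurrent writer racing for the same version loses cleanly.
        try (ExclusiveOutputStream second = new ExclusiveOutputStream(store, commit)) {
            second.write("{\"other\":{}}\n".getBytes(StandardCharsets.UTF_8));
        } catch (FileAlreadyExistsException e) {
            System.out.println("second writer rejected: " + e.getFile());
        }
    }
}
```

Since the object only becomes visible on close(), a concurrent SELECT never observes an empty commit file, and a failed request leaves nothing behind. The trade-off is holding the whole file in memory, which is usually acceptable because transaction log entries are typically small.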