trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Delta Lake table metadata corruption during writes on GCS when using native fs #20168

Closed. findinpath closed this issue 6 months ago.

findinpath commented 6 months ago

In certain concurrent scenarios on the Delta Lake connector backed by GCS object storage, the following exception may occur:

Caused by: java.io.IOException: Cannot read at 0. File size is 0: gs://teset-bucket/test-delta-lake-integration-smoke-test-qs6d79mlca/smoke_test/test_create_or_replacetjz59y4whk/_delta_log/00000000000000000002.json

This happens because, when the native GCS file system is used, the transaction log file is first created without any content:

https://github.com/trinodb/trino/blob/33dd20a8c104d358f5b6d38e8a0405f8b0ced944/lib/trino-filesystem-gcs/src/main/java/io/trino/filesystem/gcs/GcsOutputFile.java#L84-L86

The content is written to the file in a separate operation right after the file is created. That operation may fail, leaving an empty transaction log file behind and corrupting the Delta Lake table.

Even setting aside the more serious risk of permanent table corruption when an IOException occurs while the content is being written, a concurrent SELECT may read the transaction log file between its creation and the write of its content. The query then sees a seemingly corrupt Delta Lake table, which amounts to a temporary corruption of the table.
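Both failure modes can be modelled with a small Java sketch. `InMemoryObjectStore` and its methods are hypothetical: they do not correspond to the GCS client API or to Trino's file system classes. `createThenWrite` mimics the two-step pattern (create an empty object, then upload the content), while `createWithContent` makes the object visible only together with its full content.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory object store used only to illustrate the failure mode;
// it is neither the GCS client API nor Trino's file system API.
final class InMemoryObjectStore {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    // Two-step write: create an empty object first, then upload the content
    // in a separate operation (optionally simulating a failure of that step).
    void createThenWrite(String path, byte[] content, boolean failSecondStep) throws IOException {
        if (objects.putIfAbsent(path, new byte[0]) != null) {
            throw new IOException("File already exists: " + path);
        }
        // A concurrent reader can observe the zero-byte object at this point.
        if (failSecondStep) {
            throw new IOException("Simulated failure while writing content: " + path);
        }
        objects.put(path, content.clone());
    }

    // Single-step write: the object becomes visible only with its full content,
    // and creation fails if an object with that name already exists.
    void createWithContent(String path, byte[] content) throws IOException {
        if (objects.putIfAbsent(path, content.clone()) != null) {
            throw new IOException("File already exists: " + path);
        }
    }

    // Reader that rejects empty files, similar to the error reported in this issue.
    byte[] read(String path) throws IOException {
        byte[] data = objects.get(path);
        if (data == null) {
            throw new IOException("File not found: " + path);
        }
        if (data.length == 0) {
            throw new IOException("Cannot read at 0. File size is 0: " + path);
        }
        return data.clone();
    }
}
```

In this model, a failed second step leaves a zero-byte `00000000000000000002.json` behind: every later read fails with the empty-file error, and a retried commit cannot recreate that version because the name is already taken (the permanent variant). A reader that runs between the two steps of a successful `createThenWrite` hits the same error only briefly (the temporary variant).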

For comparison, this implementation works correctly: https://github.com/trinodb/trino/blob/33dd20a8c104d358f5b6d38e8a0405f8b0ced944/lib/trino-hdfs/src/main/java/io/trino/hdfs/gcs/GcsExclusiveOutputStream.java
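One way to avoid ever exposing an empty object is to buffer the content locally and publish it in a single conditional create when the stream is closed. The sketch below assumes a storage call that creates an object with its full content and fails if the object already exists (on GCS, a create with a "generation must be 0" precondition can play that role). `CreateIfAbsent` and `BufferedExclusiveOutputStream` are hypothetical names, and this is not presented as the code of the linked class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical callback standing in for a storage call that creates an object
// with its full content in one request and fails if the object already exists.
interface CreateIfAbsent {
    void create(String path, byte[] content) throws IOException;
}

// Sketch of an exclusive output stream: bytes are buffered locally and nothing
// is visible in storage until close() publishes them in a single request.
final class BufferedExclusiveOutputStream extends OutputStream {
    private final String path;
    private final CreateIfAbsent storage;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed;

    BufferedExclusiveOutputStream(String path, CreateIfAbsent storage) {
        this.path = path;
        this.storage = storage;
    }

    @Override
    public void write(int b) throws IOException {
        ensureOpen();
        buffer.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureOpen();
        buffer.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        // Single conditional create: either the complete file appears or nothing does.
        storage.create(path, buffer.toByteArray());
    }

    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("Stream closed: " + path);
        }
    }
}
```

Buffering trades memory for atomicity, which suits small files such as Delta Lake transaction log entries. If the publishing request fails, no object is created, so a retry can still claim the same transaction log version.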

findepi commented 6 months ago

The "correctness" label is not entirely accurate here. This is more "potential table corruption" than "incorrect query results". We don't have a label for table corruption, though.

I labeled this as a "release-blocker", although the problematic code has existed since Trino 429, so feel free to remove the label if needed.

cc @alexjo2144 @electrum

findinpath commented 6 months ago

Removed the release-blocker label from the issue because it only affects the native GCS file system (GcsOutputFile), which is disabled by default (fs.native-gcs.enabled defaults to false).