Open ebyhr opened 2 weeks ago
Reading the table in delta_dv_incorrect_result.zip gives different results in Trino and Spark. From transaction version 6 onward, Spark returns incorrect results while Trino returns correct results.
Spark writes only commitInfo and remove entries if we redo the same DELETE FROM table WHERE a = 2 at version 6.
{"commitInfo":{"timestamp":1725257090082,"operation":"DELETE","operationParameters":{"predicate":"[\"(a#4634 = 1)\"]"},"readVersion":5,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRemovedFiles":"1","numRemovedBytes":"0","numCopiedRows":"0","numDeletionVectorsAdded":"0","numDeletionVectorsRemoved":"1","numAddedChangeFiles":"0","executionTimeMs":"829","numDeletionVectorsUpdated":"0","numDeletedRows":"1","scanTimeMs":"0","numAddedFiles":"0","numAddedBytes":"0","rewriteTimeMs":"0"},"engineInfo":"Apache-Spark/3.5.0 Delta-Lake/3.2.0","txnId":"99f9db95-be29-46af-9c52-8589e736336c"}}
{"remove":{"path":"Me/part-00000-ef631572-1456-4aea-b6dd-5a810330a4ed-c000.snappy.parquet","deletionTimestamp":1725257090052,"dataChange":true,"extendedFileMetadata":true,"partitionValues":{},"size":1284,"tags":{"INSERTION_TIME":"1725256234000000","MIN_INSERTION_TIME":"1725256234000000","MAX_INSERTION_TIME":"1725256234000000","OPTIMIZE_TARGET_SIZE":"268435456"},"deletionVector":{"storageType":"u","pathOrInlineDv":"gP!4Wwd7*eKUO5n%6{)^","offset":1,"sizeInBytes":34,"cardinality":1},"stats":"{\"numRecords\":2}"}}
Strangely, the pathOrInlineDv is equivalent to the value in version 2.
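One way to check the reuse of pathOrInlineDv across versions is to scan the commit files and list every version in which each deletion vector id appears. The following is only a rough sketch: the helper names (`load_commits`, `collect_dv_ids`, `reused_dv_ids`) are made up for illustration, and it assumes the extracted table has a `_delta_log` directory with zero-padded `<version>.json` commit files. Checkpoint files are ignored.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_commits(log_dir):
    """Read JSON commit files from a _delta_log directory (assumed layout).

    Returns {version: [json_line, ...]}. Files whose stem is not a plain
    number are skipped.
    """
    commits = {}
    for path in Path(log_dir).glob("*.json"):
        if path.stem.isdigit():
            commits[int(path.stem)] = path.read_text().splitlines()
    return commits


def collect_dv_ids(commits):
    """Map each pathOrInlineDv to the (version, action) pairs that carry it."""
    seen = defaultdict(list)
    for version in sorted(commits):
        for line in commits[version]:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            # Only add and remove actions can carry a deletionVector.
            for action in ("add", "remove"):
                dv = (entry.get(action) or {}).get("deletionVector")
                if dv:
                    seen[dv["pathOrInlineDv"]].append((version, action))
    return dict(seen)


def reused_dv_ids(commits):
    """Keep only the deletion vector ids that show up in more than one version."""
    return {
        dv_id: uses
        for dv_id, uses in collect_dv_ids(commits).items()
        if len({version for version, _ in uses}) > 1
    }
```

Calling `reused_dv_ids(load_commits("path/to/table/_delta_log"))` (the path is a placeholder) would list the ids shared between versions, which makes it easier to see whether the remove entry at version 6 points at the deletion vector from version 2 rather than a later one.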
https://github.com/trinodb/trino/actions/runs/10654242163/job/29530504757