trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Add support for reading tables with temporal versions in Delta Lake connector #21024

Open ebyhr opened 8 months ago

ebyhr commented 8 months ago

Follow-up of https://github.com/trinodb/trino/issues/15894

We are going to support only version-id in the initial implementation.

findinpath commented 8 months ago

Take into account the delta.io RFC https://github.com/delta-io/delta/blob/master/protocol_rfcs/in-commit-timestamps.md

ebyhr commented 1 month ago

The in-commit-timestamps.md file has been moved to https://github.com/delta-io/delta/blob/master/protocol_rfcs/accepted/in-commit-timestamps.md

findinpath commented 1 month ago

This appears to be the delta.io algorithm for translating a given timestamp into the corresponding transaction (commit version): https://github.com/delta-io/delta/blob/7b9c4ed34dcf102ff3dca0abed3f668ebc9f3061/spark/src/main/scala/org/apache/spark/sql/delta/DeltaHistoryManager.scala#L126-L136
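For discussion, here is a rough sketch of what a timestamp-to-version lookup could look like on the Trino side. It is not taken from the linked Scala code. It assumes the rule is "pick the latest commit whose timestamp is at or before the requested timestamp" and that commit timestamps are non-decreasing in version order (which in-commit timestamps are meant to guarantee). `CommitLookup`, `Commit`, and `versionAsOf` are hypothetical names, not existing Trino or Delta APIs.

```java
import java.util.List;
import java.util.OptionalLong;

public class CommitLookup {
    // Hypothetical holder: a commit version and its timestamp in epoch milliseconds.
    record Commit(long version, long timestampMillis) {}

    // Returns the version of the latest commit with timestamp <= targetMillis,
    // or empty when the target is earlier than the first commit.
    // Assumption: commits are sorted by version and timestamps are non-decreasing.
    static OptionalLong versionAsOf(List<Commit> commits, long targetMillis) {
        int low = 0;
        int high = commits.size() - 1;
        int found = -1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (commits.get(mid).timestampMillis() <= targetMillis) {
                found = mid;      // candidate; keep looking for a later one
                low = mid + 1;
            } else {
                high = mid - 1;   // commit is after the target; look earlier
            }
        }
        return found < 0
                ? OptionalLong.empty()
                : OptionalLong.of(commits.get(found).version());
    }

    public static void main(String[] args) {
        List<Commit> commits = List.of(
                new Commit(0, 1_000),
                new Commit(1, 2_000),
                new Commit(2, 3_000));
        System.out.println(versionAsOf(commits, 2_500));
        System.out.println(versionAsOf(commits, 500));
    }
}
```

A binary search only works if the timestamps are monotonic. If they come from file modification times rather than in-commit timestamps, they would probably need to be adjusted first, and how a timestamp before the first commit or after the last one should be handled is left open here.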