trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Support Iceberg's All_Entries Metadata table #10882

Open osscm opened 2 years ago

osscm commented 2 years ago

Spark supports Iceberg's All_* metadata tables. This issue is to add the All_Entries metadata table. We can create separate issues for the other tables.

reference: https://github.com/apache/iceberg/blob/e146d812f251f1ee5b54edd7dc696034c5ff75f4/core/src/main/java/org/apache/iceberg/MetadataTableUtils.java#L71

Also wondering whether we can reuse the metadata table classes that Iceberg already has, such as AllEntries, instead of implementing this in the Trino APIs.

osscm commented 2 years ago

cc @findepi @RussellSpitzer

findepi commented 2 years ago

This probably would map to $all_manifest_entries.

@osscm what would be the use-case?

osscm commented 2 years ago

@findepi All_Entries exposes all the entries of the manifest files, including both the valid and the deleted data files in the table manifests. All_Datafiles exposes only the valid data files in the table manifests.

So I feel these tables are very handy. They can typically be used to understand how the data is laid out in the Iceberg table, including file sizes and locations. They are also handy when a user is trying to understand split planning and parallelism. And when a query does not return data that it should, the user can find the individual data files and even validate them manually.
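
To make the difference between the two tables concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the Iceberg or Trino API: the `ManifestEntry` class, the helper functions, and the sample paths are all hypothetical. The only detail taken from Iceberg itself is the manifest entry status encoding (0 = existing, 1 = added, 2 = deleted). The sketch assumes an "all entries" view returns every manifest entry, while an "all data files" view keeps only entries that are not marked deleted.

```python
from dataclasses import dataclass

# Manifest entry status codes used by Iceberg manifests.
EXISTING, ADDED, DELETED = 0, 1, 2


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical, simplified stand-in for one manifest entry."""
    status: int
    snapshot_id: int
    file_path: str
    file_size_in_bytes: int


def all_entries(manifests):
    """Every entry from every manifest, including deleted ones."""
    return [entry for manifest in manifests for entry in manifest]


def all_data_files(manifests):
    """Only entries that are not marked deleted (assumed semantics)."""
    return [e for e in all_entries(manifests) if e.status != DELETED]


def total_size_by_path(entries):
    """Map each distinct file path to its size, collapsing repeated entries."""
    return {e.file_path: e.file_size_in_bytes for e in entries}


# Made-up sample: snapshot 1 adds two files, snapshot 2 deletes one of them.
manifests = [
    [
        ManifestEntry(ADDED, 1, "s3://bucket/t/data/a.parquet", 1024),
        ManifestEntry(ADDED, 1, "s3://bucket/t/data/b.parquet", 2048),
    ],
    [
        ManifestEntry(EXISTING, 2, "s3://bucket/t/data/a.parquet", 1024),
        ManifestEntry(DELETED, 2, "s3://bucket/t/data/b.parquet", 2048),
    ],
]

entries = all_entries(manifests)
files = all_data_files(manifests)
sizes = total_size_by_path(files)
```

Because the model spans all snapshots, the same path can appear more than once, which is why `total_size_by_path` collapses entries by path before any size accounting.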