Open tdcmeehan opened 1 year ago
@tdcmeehan Wanted to confirm whether this is about supporting Glue as the metastore for Iceberg tables. If yes, that's already supported - https://prestodb.io/docs/current/connector/iceberg.html#glue-catalog
@agrawalreetika As far as I can tell, that uses the HiveTableOperations backed by Glue instead of Iceberg's GlueTableOperations.
@tdcmeehan I think the priority of this might be higher, since I have tables that take very long to get past the planning stage (~10 minutes), while in Trino, using the native GlueCatalog, I don't have this issue at all. I think this is related to using the HiveTableOperations.
Actually, the slowness seems to come from not caching the tables used during the query. Trino caches the TableMetadata object and reuses it throughout the query. For tables with a large TableMetadata this cache is really important, changing the planning phase from minutes to seconds.
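To illustrate the idea, here is a minimal sketch of a per-query metadata cache. It is not Presto's or Trino's actual implementation: the class name PerQueryTableMetadataCache, the loader function, and the use of a plain table-name string as the key are all assumptions made for illustration. The point is only that the expensive load runs once per table per query, and later lookups reuse the result.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not a Presto or Trino class.
// One instance is assumed to live for the duration of a single query.
final class PerQueryTableMetadataCache<M> {
    // Loaded metadata keyed by table name (an assumed key type for this sketch).
    private final Map<String, M> cache = new ConcurrentHashMap<>();

    // The expensive call, e.g. reading and parsing the table's metadata file.
    private final Function<String, M> loader;

    PerQueryTableMetadataCache(Function<String, M> loader) {
        this.loader = loader;
    }

    // Returns the cached metadata, invoking the loader only on the first request.
    M get(String tableName) {
        return cache.computeIfAbsent(tableName, loader);
    }
}
```

With a cache like this scoped to the query, every planner step that asks for the same table's metadata after the first one gets the already-loaded object instead of triggering another remote read.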
CC: @agrawalreetika who has been looking into table metadata caching and I believe has a prototype and observed a similar speedup
@tdcmeehan @jasonf20 Yes, I have had similar findings, where repetitive metadata calls are causing higher planning time and eventually high query execution time. This especially causes slower query execution on Iceberg native catalogs. I have a prototype that could help reduce these metadata calls per query, which would drastically reduce query execution time for both Hive and native Iceberg catalogs. I will create a PR for this and list all the details soon.
Iceberg deployments may use multiple catalogs. We should add support for additional catalogs so users don't need to migrate their catalog to begin using Presto on Iceberg.
This issue tracks the implementation of the currently missing Glue catalog.