This could be related to https://github.com/trinodb/trino/issues/23384
We will release a new Trino version this week. I'll postpone further investigation so we can check whether the fix in this new version helps.
@wendigo We tested Trino version 458, and while the planning time has improved, it remains unusually high, ranging between 30 and 50 seconds.
@sriharshaj do you know what accounts for this number? Metadata retrieval from storage? Metastore calls?
Here are the optimizer summaries.
"optimizerRulesSummaries": [
{
"rule": "io.trino.sql.planner.optimizations.AddExchanges",
"invocations": 1,
"applied": 1,
"totalTime": 5120641837,
"failures": 0
},
{
"rule": "io.trino.sql.planner.iterative.rule.DetermineTableScanNodePartitioning",
"invocations": 3,
"applied": 1,
"totalTime": 4029729665,
"failures": 0
},
{
"rule": "io.trino.sql.planner.iterative.rule.PushPredicateIntoTableScan",
"invocations": 1,
"applied": 1,
"totalTime": 3586457350,
"failures": 0
},
{
"rule": "io.trino.sql.planner.iterative.rule.ExpressionRewriteRuleSet.FilterExpressionRewrite",
"invocations": 58,
"applied": 1,
"totalTime": 2148574,
"failures": 0
},
{
"rule": "io.trino.sql.planner.iterative.rule.ExpressionRewriteRuleSet.ProjectExpressionRewrite",
"invocations": 92,
"applied": 0,
"totalTime": 921214,
"failures": 0
},
{
"rule": "io.trino.sql.planner.optimizations.PredicatePushDown",
"invocations": 7,
"applied": 7,
"totalTime": 442238,
"failures": 0
},
{
"rule": "io.trino.sql.planner.iterative.rule.PushLimitIntoTableScan",
"invocations": 4,
"applied": 1,
"totalTime": 293083,
"failures": 0
},
{
"rule": "io.trino.sql.planner.iterative.rule.PruneTableScanColumns",
"invocations": 7,
"applied": 1,
"totalTime": 273248,
"failures": 0
},
{
"rule": "io.trino.sql.planner.iterative.rule.PruneOutputSourceColumns",
"invocations": 16,
"applied": 1,
"totalTime": 268922,
"failures": 0
},
{
"rule": "io.trino.sql.planner.iterative.rule.PruneProjectColumns",
"invocations": 7,
"applied": 4,
"totalTime": 224337,
"failures": 0
}
],
Is there a way to analyze why the optimizers are taking so long? Additionally, where can I find details on metadata retrieval and Metastore calls?
@sriharshaj You can enable tracing (OpenTelemetry) and capture what the cluster is doing.
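For example, something along these lines in the coordinator's config.properties should do it (a minimal sketch; check the OpenTelemetry page of the docs for the exact property names in your version, and the endpoint below is just an example OTLP collector address):

tracing.enabled=true
tracing.exporter.endpoint=http://otel-collector.example.com:4317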
@wendigo Can I capture those metrics with JMX? We don't have OpenTelemetry set up.
@sriharshaj JMX keeps aggregates, not individual events.
@wendigo I installed Trino locally and ran the same query. The planning phase took approximately 15 seconds.
The ConnectorMetadata.getTableProperties method is taking around 1.5 to 2.0 seconds to retrieve the table metadata.
During query optimization in Trino, metadata is being fetched five times, and during the fragment generation phase, it’s being retrieved three additional times.
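(If anyone wants to reproduce this measurement, a crude way while debugging the coordinator in an IDE is a tiny timing helper around the call; this is only an illustrative sketch, and the names in the usage comment are simply whatever is in scope where getTableProperties is invoked.)

import java.util.function.Supplier;

// Illustrative ad-hoc timing helper for stepping through the planner in an IDE.
final class Timed
{
    static <T> T time(String label, Supplier<T> call)
    {
        long start = System.nanoTime();
        T result = call.get();
        System.out.printf("%s took %d ms%n", label, (System.nanoTime() - start) / 1_000_000);
        return result;
    }
}

// Usage (hypothetical names, whatever is in scope at the call site):
// ConnectorTableProperties properties =
//         Timed.time("getTableProperties", () -> metadata.getTableProperties(session, tableHandle));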
This issue occurs exclusively with Iceberg queries. For Hive, everything works as expected.
@sriharshaj I believe that @raunaqmorarka recently added some caching for metadata files.
What's the version you are using? @sriharshaj
@wendigo We have been facing this issue since 451.
I traced down the issue to this change: https://github.com/trinodb/trino/pull/15712/files#diff-e1cb17efec6787989f9df9ee40c4f2809ff3fe946cd2ec721ff8932b131997b8R618.
Our schema contains a large number of nested fields, which results in all columns being mapped to IcebergColumnHandle. When I debugged a specific table, it was mapping approximately 2,260 nested columns to IcebergColumnHandle, which could be significantly impacting planning time.
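(For anyone who wants to check whether their tables are similarly affected, here is a rough sketch that counts how many field IDs an Iceberg schema exposes, including nested ones; it assumes you can load the Table through your catalog, and loadTable() below is only a placeholder.)

import java.util.Map;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.types.TypeUtil;
import org.apache.iceberg.types.Types;

// Illustrative sketch: count all field IDs (top-level and nested) in an Iceberg schema.
// Tables with thousands of nested fields are the ones where building a handle per field
// can show up in planning time.
public class CountNestedFields
{
    public static void main(String[] args)
    {
        Table table = loadTable(); // placeholder: load via HiveCatalog/RESTCatalog/etc.
        Schema schema = table.schema();
        Map<Integer, Types.NestedField> fieldsById = TypeUtil.indexById(schema.asStruct());
        System.out.printf("Schema exposes %d fields (top-level and nested)%n", fieldsById.size());
    }

    private static Table loadTable()
    {
        // Hypothetical placeholder: replace with catalog-specific loading code for your environment.
        throw new UnsupportedOperationException("load the Iceberg table via your catalog");
    }
}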
@krvikash @raunaqmorarka can you take a look?
Thank you, @wendigo, for your guidance in helping me identify the issue.
@krvikash @raunaqmorarka Any updates?
Hi @sriharshaj, I have not had a chance to look into it yet. I will take a look this week.
Hi @sriharshaj, I opened https://github.com/trinodb/trino/pull/23586 to load only the required columns in the map. If you have a way to try this change before it is merged, please check whether it reduces the planning time for your query.
@krvikash Sure, I will try this fix today.
@krvikash I tried the fix and it worked. Planning time is now in milliseconds. Thank you for the fix.
Great!! Thanks @sriharshaj for reporting this and verifying the fix.
Our Iceberg queries are getting stuck in the planning phase for 2 to 3 minutes, although they eventually run successfully. We are currently upgrading Trino from version 444 to 454. This issue has been occurring since Trino version 451 (we went back and tried different versions).
Our Iceberg stack:
Storage: S3
File format: Parquet
Catalog: Hive
Query explain:
Query info: https://gist.github.com/sriharshaj/f26e655f233b84754da8216be2ae0172