prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16k stars 5.36k forks source link

Add Number of Stages to Query Statistics #12136

Closed cnuss closed 5 years ago

cnuss commented 5 years ago

Greetings!

In Presto 0.208, query_max_stage_count and query.max-stage-count was introduced.

Now queries might fail with an error message of:

com.facebook.presto.spi.PrestoException: Number of stages in the query (110) exceeds the allowed maximum (100). If the query contains multiple DISTINCTs, please set the use_mark_distinct session property to false. If the query contains multiple CTEs that are referenced more than once, please create temporary table(s) for one or more of the CTEs.
    at com.facebook.presto.sql.planner.PlanFragmenter.sanityCheckFragmentedPlan(PlanFragmenter.java:111)
...

I use an event listener to capture Query Metadata using the Presto SPI to logs, however the number of stages (aka Fragments) don't appear to be set in the Query Statistics.

Could you add the number of stages/fragments as a metric in the Query Statistics of the Presto SPI?

Thanks!

nezihyigitbasi commented 5 years ago

You can extract the number of stages from the plan in the QueryMetadata. Also, just curious, how are you going to use that information?

cnuss commented 5 years ago

it looks like the plan is a generic String. What would I extract out of that for the number of stages?

I plan to use the number so I can properly tune max stages to a reasonable minimum above the default.

nezihyigitbasi commented 5 years ago

The plan should include information about the stages/fragments. Look for "Fragment N" in the plan. Another way to get the number of stages is the cpuTimeDistribution field in QueryStatistics, which include cpu time distribution per stage. The number of stages should be cpuTimeDistribution.size().

cnuss commented 5 years ago

Wow awesome, Thanks! I'll use that. And I'll close this issue.