prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Add metrics to measure IO efficiency #16585

Open aweisberg opened 3 years ago

aweisberg commented 3 years ago

We want to be able to loosely evaluate the IO efficiency of any query and need to come up with some metrics that could help with that.

There are two metrics I have come up with:

wasted_projection_bytes: Bytes read from a data source during query execution that are then discarded when projecting out just the columns we want. This would be an additional statistic in the query completed event.

This is intended to determine if a query could have been more efficient if it was able to leverage a materialized view, better column ordering, or some other layout change to scan less data.

wasted_filter_bytes: Bytes read from a data source (including columns that are projected out) that are then filtered out by a predicate. For simplicity, this applies only to predicates that are pushed all the way down to leaf scan nodes.

This is intended to determine if a query could have been more efficient if it was able to leverage a materialized view, secondary indexes, z-order partitioning, or some other scheme to implement the predicate in a more IO efficient way.
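
As a rough illustration of how the two proposed metrics could be computed, here is a minimal Python sketch. It is not Presto code: `RowRead`, `IoWaste`, and `measure_io_waste` are hypothetical names and the byte sizes are made up. It also assumes one particular reading of the definitions above, namely that wasted_filter_bytes counts every byte read for a row the predicate rejects, including bytes in columns that are also projected out.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Set


@dataclass(frozen=True)
class RowRead:
    """Hypothetical record of what one scanned row cost in IO."""
    column_bytes: Mapping[str, int]  # bytes read from the data source, per column
    passes_predicate: bool           # outcome of the predicate pushed down to the scan


@dataclass(frozen=True)
class IoWaste:
    total_bytes_read: int
    wasted_projection_bytes: int
    wasted_filter_bytes: int


def measure_io_waste(rows: Iterable[RowRead], needed_columns: Set[str]) -> IoWaste:
    """Accumulate both proposed metrics over the rows read by one scan."""
    total = wasted_projection = wasted_filter = 0
    for row in rows:
        row_bytes = sum(row.column_bytes.values())
        total += row_bytes
        # Bytes in columns the query never asked for.
        wasted_projection += sum(
            size for column, size in row.column_bytes.items()
            if column not in needed_columns
        )
        # Assumption: a rejected row wastes all of its bytes, in every column read.
        if not row.passes_predicate:
            wasted_filter += row_bytes
    return IoWaste(total, wasted_projection, wasted_filter)


# Made-up example: columns a (8 B), b (100 B), c (4 B); the query needs a and c,
# and only the first of three rows passes the predicate.
sizes = {"a": 8, "b": 100, "c": 4}
rows = [RowRead(sizes, True), RowRead(sizes, False), RowRead(sizes, False)]
waste = measure_io_waste(rows, {"a", "c"})
print(waste)
```

Under this reading the two metrics overlap: a byte in a projected-out column of a rejected row is counted by both. Their sum can therefore exceed the total bytes read, so they are probably best reported separately rather than added together.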

Operator-level statistic: We want to know more precisely which scanned table caused the wasted IO, so these metrics should be collected at the operator level and reported in the operator statistics, as well as summarized at the top level for the entire query.
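
A sketch of that roll-up, in Python with hypothetical names (`ScanOperatorStats`, `summarize_io_waste`) and invented numbers; it is not how Presto's operator statistics are actually structured. Each scan operator reports its own counters, which are then grouped per table and summed for the whole query.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

METRICS = ("wasted_projection_bytes", "wasted_filter_bytes")


@dataclass(frozen=True)
class ScanOperatorStats:
    """Hypothetical per-operator counters for one leaf table scan."""
    operator_id: str
    table: str
    wasted_projection_bytes: int
    wasted_filter_bytes: int


def summarize_io_waste(
    operators: Iterable[ScanOperatorStats],
) -> Tuple[Dict[str, Dict[str, int]], Dict[str, int]]:
    """Return (per-table totals, query-level totals) for both metrics."""
    per_table: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {name: 0 for name in METRICS}
    )
    query_total = {name: 0 for name in METRICS}
    for op in operators:
        for name in METRICS:
            value = getattr(op, name)
            per_table[op.table][name] += value  # attributes waste to a table
            query_total[name] += value          # top-level summary for the query
    return dict(per_table), query_total


# Invented numbers: two scans of "orders" and one of "lineitem".
operators = [
    ScanOperatorStats("scan-0", "orders", 300, 224),
    ScanOperatorStats("scan-1", "orders", 100, 50),
    ScanOperatorStats("scan-2", "lineitem", 40, 0),
]
per_table, query_total = summarize_io_waste(operators)
print(per_table)
print(query_total)
```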

yuanzhanhku commented 3 years ago

FYI, we now have a new metric tracking framework in which adding a metric is a one-line change. The framework automatically aggregates each metric at the stage level and the query level, displays it in the Coordinator UI and Lookup UI, and stores it in Scuba. Here are the instructions on how to add new metrics using the new tracking framework: https://docs.google.com/document/d/1M1LVUMoaKAwt22bLTP6o-SAhX8rahhTwP0SzC-L29VU/edit#

nmahadevuni commented 3 years ago

@yuanzhanhku Is this not a public document?

yuanzhanhku commented 3 years ago

Created a public doc and updated the link in the above comments.