trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Improve stats reporting for group by operator #21925

Open sopel39 opened 3 months ago

sopel39 commented 3 months ago

Similar to https://github.com/trinodb/trino/commit/f46fd9c1dccb975bf480f8c7029daf7e45541b54, we could report hash lookups/updates and aggregation accumulator updates separately to get more insight into query bottlenecks.

cc @dain @raunaqmorarka

sug-ghosh commented 3 months ago

I'd like to take this up. @sopel39, can you give more insight into this?

sopel39 commented 3 months ago

Sure, go ahead. Take a look at commit https://github.com/trinodb/trino/commit/f46fd9c1dccb975bf480f8c7029daf7e45541b54. You can see that ScanFilterAndProjectOperator keeps projection and filtering stats in PageProcessorMetrics, which are then returned as operator metrics. A similar approach could be applied to the group by operator: we can measure the performance of GroupByHash and Accumulator separately.
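
A rough sketch of the idea, in plain Java rather than Trino's actual classes. Everything here (`GroupByMetricsSketch`, `addBatch`, the metric names) is hypothetical and only illustrates timing the hash phase and the accumulator phase separately. A `HashMap` stands in for GroupByHash and a `long[]` of sums stands in for an Accumulator. The real change would presumably hook into the operator's existing metrics reporting instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names are Trino classes.
final class GroupByMetricsSketch
{
    // Stand-in for GroupByHash: maps a grouping key to a dense group id.
    private final Map<String, Integer> groupIds = new HashMap<>();
    // Stand-in for an Accumulator: one running sum per group id.
    private long[] sums = new long[16];

    private long hashNanos;
    private long accumulatorNanos;
    private long hashUpdates;
    private long accumulatorUpdates;

    // Processes one batch of (key, value) rows and times the two phases separately.
    void addBatch(String[] keys, long[] values)
    {
        int[] ids = new int[keys.length];

        // Phase 1: hash lookups/updates (assign a group id to every row).
        long start = System.nanoTime();
        for (int i = 0; i < keys.length; i++) {
            Integer id = groupIds.get(keys[i]);
            if (id == null) {
                id = groupIds.size();
                groupIds.put(keys[i], id);
            }
            ids[i] = id;
        }
        long afterHash = System.nanoTime();

        // Phase 2: accumulator updates (fold every row into its group's state).
        if (groupIds.size() > sums.length) {
            sums = Arrays.copyOf(sums, Math.max(groupIds.size(), sums.length * 2));
        }
        for (int i = 0; i < keys.length; i++) {
            sums[ids[i]] += values[i];
        }
        long afterAccumulate = System.nanoTime();

        hashNanos += afterHash - start;
        accumulatorNanos += afterAccumulate - afterHash;
        hashUpdates += keys.length;
        accumulatorUpdates += keys.length;
    }

    // Returns the current sum for a key that has already been seen.
    long sumFor(String key)
    {
        return sums[groupIds.get(key)];
    }

    // Metric names are made up for this sketch.
    Map<String, Long> metrics()
    {
        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("hashNanos", hashNanos);
        metrics.put("hashUpdates", hashUpdates);
        metrics.put("accumulatorNanos", accumulatorNanos);
        metrics.put("accumulatorUpdates", accumulatorUpdates);
        return metrics;
    }
}
```

Timing per batch rather than per row keeps the `System.nanoTime()` overhead small relative to the work being measured.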

sug-ghosh commented 3 months ago

okay.