prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Runtime Metrics for skewed key of a join #21469

Open jaystarshot opened 10 months ago

jaystarshot commented 10 months ago

It would be great if we could add a runtime metric that reports the skewed keys of a join. Maybe we could use a count-min sketch or a related data structure in the lookup join operator itself to detect skew. This skew detection could then be integrated with HBO.
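As a rough illustration of the count-min sketch idea (this is not Presto code; `CountMinSketch`, `find_skewed_keys`, and the `width`, `depth`, and `threshold` parameters are all hypothetical names chosen for this sketch), skew detection over a stream of join keys might look something like this:

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counter using fixed memory (depth x width ints)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        self.total = 0  # number of rows seen so far

    def _buckets(self, key):
        # One hash function per row, derived by salting with the row index.
        data = repr(key).encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(2, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        self.total += count
        for row, col in self._buckets(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Minimum over rows: never underestimates, but hash collisions
        # can make it overestimate.
        return min(self.table[row][col] for row, col in self._buckets(key))


def find_skewed_keys(keys, threshold=0.1, width=1024, depth=4):
    """Return {key: estimated_count} for keys that appear to account for
    at least `threshold` of all rows (assumed definition of "skewed")."""
    sketch = CountMinSketch(width, depth)
    candidates = set()
    for key in keys:
        sketch.add(key)
        # Remember a key whenever it looks heavy relative to rows seen so far.
        if sketch.estimate(key) >= threshold * sketch.total:
            candidates.add(key)
    # Re-check the candidates against the final row count.
    limit = threshold * sketch.total
    return {k: sketch.estimate(k) for k in candidates if sketch.estimate(k) >= limit}


if __name__ == "__main__":
    # Hypothetical join-key stream: one hot key plus many distinct keys.
    join_keys = ["hot"] * 500 + [f"k{i}" for i in range(500)]
    print(find_skewed_keys(join_keys, threshold=0.1))
```

Because the sketch only overestimates, a key that is truly above the threshold is always reported, while a non-skewed key can occasionally be reported by mistake. Memory stays fixed at `width * depth` counters regardless of the number of distinct keys, which is the property that would make something like this plausible inside an operator.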

Expected Behavior or Use Case

Presto Component, Service, or Connector

Possible Implementation

Example Screenshots (if appropriate):

Context

Akanksha-kedia commented 10 months ago

@jaystarshot, for the metric that will output the skewed keys of joins, are we going to use Presto-generated variables? And how would this metric be useful in terms of optimization?

jaystarshot commented 4 months ago

I'm not sure how to do an impact analysis in our production environment to see how big a problem this is. Maybe HBO can help with the analysis, but I don't see an easy way.