trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Make TaskCountEstimator dynamic #847

Closed by rohangarg 5 years ago

rohangarg commented 5 years ago

Currently, TaskCountEstimator is initialized with the number of active nodes present in the cluster at that point in time. After that, the number of active nodes is never updated during the cluster's lifetime.

In cloud-based deployments with worker autoscaling, the number of nodes in a cluster can grow and shrink over time. In such setups, we could update the task count dynamically according to the current cluster size, giving better cost estimates for plans (ClusterSizeMonitor could be used to get the current size).

findepi commented 5 years ago

I think TaskCountEstimator does not remember the number of active nodes. It only remembers how to get the current value. What am I missing?
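
To illustrate the distinction, here is a minimal sketch of the two behaviours under discussion: capturing the node count once versus holding on to a supplier and asking it on every call. The class and method names (`SnapshotEstimator`, `LiveEstimator`, `estimateTaskCount`) are hypothetical and are not taken from the Trino code base; the `AtomicInteger` merely stands in for whatever reports the cluster size.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class TaskCountSketch {
    // Hypothetical: reads the node count once, at construction time.
    // Later changes in cluster size are never seen.
    static final class SnapshotEstimator {
        private final int nodeCount;

        SnapshotEstimator(IntSupplier activeNodes) {
            this.nodeCount = activeNodes.getAsInt();
        }

        int estimateTaskCount() {
            return nodeCount;
        }
    }

    // Hypothetical: remembers *how* to get the node count and
    // re-reads it on every call, so it follows autoscaling.
    static final class LiveEstimator {
        private final IntSupplier activeNodes;

        LiveEstimator(IntSupplier activeNodes) {
            this.activeNodes = activeNodes;
        }

        int estimateTaskCount() {
            return activeNodes.getAsInt();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the cluster's current number of active workers.
        AtomicInteger clusterSize = new AtomicInteger(3);

        SnapshotEstimator snapshot = new SnapshotEstimator(clusterSize::get);
        LiveEstimator live = new LiveEstimator(clusterSize::get);

        // Simulate the cluster scaling up after both estimators exist.
        clusterSize.set(8);

        System.out.println("snapshot: " + snapshot.estimateTaskCount()); // prints "snapshot: 3"
        System.out.println("live: " + live.estimateTaskCount());         // prints "live: 8"
    }
}
```

If the estimator is of the second kind, no change is needed for autoscaled clusters, since every estimate already reflects the current size.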

rohangarg commented 5 years ago

Yes, you are right - it does return the current number of active nodes every time. There was a problem in my testing setup which led me to believe that the number of active nodes was computed only once.