stackabletech / issues

This repository is only for issues that concern multiple repositories or don't fit into any specific repository

Implement Shared AutoScaling Hook Functionality #667

Open soenkeliebau opened 1 week ago

soenkeliebau commented 1 week ago

Currently we do not really support autoscaling clusters that are managed by Stackable beyond fairly trivial scenarios.

We do support deploying HPAs for scaling StatefulSets, but as soon as actions need to be run before or after scaling these, we start running into issues. This is because there are no pre- or post-scale hooks that operators could tie into.

Actual example: When scaling down a NiFi cluster, a node has to be decommissioned and its flowfiles offloaded before the node is torn down, otherwise any data on that node will be lost forever.

During the recent on-site in Munich I presented the current state and an idea of how this could be implemented, slides used: Autoscaling @ Stackable.pdf

The scope of this ticket is to get the fundamental work in operator-rs done, so that functionality exists that operators can call to determine whether scaling is necessary, and that hands appropriate control to the operator to run workloads before or after the actual scaling step. Ideally we'll also implement helper functions for the actual scaling, i.e. propagating the replica count to the StatefulSets when appropriate.
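To make the scope a bit more concrete, here is a minimal Rust sketch of what such shared functionality could look like. Everything in it is an assumption for illustration only: `ScalingHooks`, `scaling_needed`, `run_scale`, `HookOutcome`, `ScaleStatus` and `DrainingHooks` are hypothetical names that do not exist in operator-rs, and the real design would have to be async and talk to the Kubernetes API instead of taking a closure. The idea it shows is just the control flow: decide whether scaling is needed, let the operator run a pre-scale hook that may take several reconciles, only then propagate the replica count, and finally run a post-scale hook.

```rust
use std::cmp::Ordering;
use std::convert::Infallible;

// Hypothetical sketch: none of these types or functions exist in operator-rs.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScaleDirection {
    Up,
    Down,
}

/// A pending change in replica count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ScaleRequest {
    pub current: u32,
    pub desired: u32,
    pub direction: ScaleDirection,
}

/// What the pre-scale hook reports back to the shared code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookOutcome {
    /// Preparation is finished, the replica count may be changed.
    Done,
    /// Preparation is still running, check again on a later reconcile.
    InProgress,
}

/// Result of a single reconcile pass.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScaleStatus {
    NotNeeded,
    Waiting,
    Scaled(u32),
}

/// Hooks a product operator would implement (assumed interface).
pub trait ScalingHooks {
    type Error;
    fn pre_scale(&mut self, request: &ScaleRequest) -> Result<HookOutcome, Self::Error>;
    fn post_scale(&mut self, request: &ScaleRequest) -> Result<(), Self::Error>;
}

/// Returns a request if the current and desired replica counts differ.
pub fn scaling_needed(current: u32, desired: u32) -> Option<ScaleRequest> {
    let direction = match desired.cmp(&current) {
        Ordering::Equal => return None,
        Ordering::Greater => ScaleDirection::Up,
        Ordering::Less => ScaleDirection::Down,
    };
    Some(ScaleRequest { current, desired, direction })
}

/// One reconcile pass: pre-hook, then apply the replica count, then post-hook.
/// `apply_replicas` stands in for patching the StatefulSet.
pub fn run_scale<H: ScalingHooks>(
    hooks: &mut H,
    current: u32,
    desired: u32,
    apply_replicas: impl FnOnce(u32),
) -> Result<ScaleStatus, H::Error> {
    let request = match scaling_needed(current, desired) {
        Some(request) => request,
        None => return Ok(ScaleStatus::NotNeeded),
    };
    if hooks.pre_scale(&request)? == HookOutcome::InProgress {
        return Ok(ScaleStatus::Waiting);
    }
    apply_replicas(request.desired);
    hooks.post_scale(&request)?;
    Ok(ScaleStatus::Scaled(request.desired))
}

/// Toy hook that simulates a node needing a few polls to finish offloading
/// before a scale-down is allowed. Scale-ups pass through immediately.
pub struct DrainingHooks {
    pub remaining_polls: u32,
}

impl ScalingHooks for DrainingHooks {
    type Error = Infallible;

    fn pre_scale(&mut self, request: &ScaleRequest) -> Result<HookOutcome, Self::Error> {
        if request.direction == ScaleDirection::Down && self.remaining_polls > 0 {
            self.remaining_polls -= 1;
            return Ok(HookOutcome::InProgress);
        }
        Ok(HookOutcome::Done)
    }

    fn post_scale(&mut self, _request: &ScaleRequest) -> Result<(), Self::Error> {
        Ok(())
    }
}
```

In this sketch the replica count is only propagated once `pre_scale` reports `Done`, which is the property the NiFi case needs. An operator would call `run_scale` on every reconcile and requeue while it returns `ScaleStatus::Waiting`.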

Requirements