Open pbelgundi opened 3 years ago
@pbelgundi while I understand the need to scale both Pravega and Bookkeeper based on the workload, I'm not sure that coupling the Pravega Operator to Bookkeeper again is the way to go. As you know, the "ancient" Pravega Operator already managed both the Bookkeeper and Pravega services, but the eventual decision was to split them into two separate operators, which makes sense for many reasons. Based on the description posted, we could at least distinguish between two types of scaling:
pravega-autoscaler) could achieve this objective without adding more complexity to the existing Pravega Operator. Other benefits of this approach are the following: i) The Pravega Operator would be the same for scenarios where auto-scaling is needed and scenarios where it is not; ii) We would keep the Pravega Operator and the Bookkeeper Operator strictly focused on their targeted services; iii) The "auto-scaling" function will likely require building a "feedback loop": consuming metrics on workload/resource utilization, implementing some "cluster scaling policies", evaluating them continuously, and then reacting by scaling the cluster up or down if necessary. As you can see, all this functionality may get complex enough to deserve its own software component running as a microservice; iv) If in the future there are alternatives to Bookkeeper as Tier 1, the Pravega Operator would be agnostic to them, as pravega-autoscaler would be the place where other Tier 1 options get plugged in. Perhaps tools like the Kubernetes Horizontal Pod Autoscaler could go a long way toward achieving this objective in Kubernetes-based scenarios.
Another approach could be to implement the "autoscaling functionality" in each operator separately; that is, the Bookkeeper Operator could autoscale Bookkeeper, and the Pravega Operator could autoscale Pravega. On the one hand, this could make sense, given that these systems may need to be scaled for very different reasons (Bookkeeper is often IO bound, whereas Pravega is quite often CPU bound in our performance experiments). On the other hand, this would mean "repeating" the same functionality in both operators (which could be mitigated by sharing common autoscaling logic across them).
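As a rough illustration of what "sharing common autoscaling logic" could look like, the sketch below keeps the threshold evaluation service-agnostic and lets each operator supply its own signal. Everything here is an assumption made for the example: the `Rule` and `Decision` types, the metric labels `"cpu"` and `"disk-io"`, and the threshold values are hypothetical and do not come from either operator.

```go
package main

import "fmt"

// Decision is the outcome of evaluating a scaling rule.
type Decision int

const (
	Hold Decision = iota
	ScaleUp
	ScaleDown
)

// String makes decisions readable when printed.
func (d Decision) String() string {
	switch d {
	case ScaleUp:
		return "scale-up"
	case ScaleDown:
		return "scale-down"
	default:
		return "hold"
	}
}

// Rule is the shared, service-agnostic part: two thresholds on a
// utilization percentage. Metric is only a label for the signal used.
type Rule struct {
	Metric         string
	ScaleUpAbove   int
	ScaleDownBelow int
}

// Evaluate compares one observed utilization value against the thresholds.
func (r Rule) Evaluate(observedPercent int) Decision {
	switch {
	case observedPercent > r.ScaleUpAbove:
		return ScaleUp
	case observedPercent < r.ScaleDownBelow:
		return ScaleDown
	default:
		return Hold
	}
}

func main() {
	// Each operator would configure the shared rule with its own signal
	// (hypothetical values): CPU for Pravega, disk IO for Bookkeeper.
	segmentStore := Rule{Metric: "cpu", ScaleUpAbove: 80, ScaleDownBelow: 30}
	bookkeeper := Rule{Metric: "disk-io", ScaleUpAbove: 70, ScaleDownBelow: 20}

	fmt.Println(segmentStore.Metric, segmentStore.Evaluate(90))
	fmt.Println(bookkeeper.Metric, bookkeeper.Evaluate(50))
}
```

With this split, only the metric collection differs per operator, while the decision logic lives in one shared package.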
Anyway, irrespective of the approach chosen to achieve dynamic scaling, my main concern is coupling the Pravega Operator again with functionality related to Bookkeeper, which I think needs to be treated separately.
Description
To be able to dynamically scale Pravega, it is crucial that pravega-operator manages both the Segment Store and Bookkeeper clusters, as we would typically want to scale both up and down together.
Importance
must-have
Location
pravega-operator
Suggestions for an improvement
Change Pravega CustomResourceDefinition to include Bookkeeper. If upgrades need to be supported, then this would involve CRD conversion from v1beta1 to the next version.
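To illustrate what such a CRD conversion might involve, here is a minimal sketch in Go. The struct shapes are simplified and hypothetical: `PravegaSpecV1Beta1`, `PravegaSpecNext`, `BookkeeperSpec`, and their fields are assumptions made for this example and are not copied from the actual PravegaCluster schema. The sketch assumes that existing objects reference an externally managed Bookkeeper by URI, so the forward conversion leaves the embedded Bookkeeper section empty rather than inventing one.

```go
package main

import (
	"errors"
	"fmt"
)

// PravegaSpecV1Beta1 is a simplified, hypothetical stand-in for the
// current spec, where Bookkeeper is managed elsewhere and referenced by URI.
type PravegaSpecV1Beta1 struct {
	BookkeeperURI        string
	SegmentStoreReplicas int32
}

// BookkeeperSpec is a hypothetical embedded Bookkeeper section.
type BookkeeperSpec struct {
	Replicas int32
}

// PravegaSpecNext is a hypothetical next-version spec that can either
// reference an external Bookkeeper or embed one for the operator to manage.
type PravegaSpecNext struct {
	SegmentStoreReplicas  int32
	ExternalBookkeeperURI string
	Bookkeeper            *BookkeeperSpec // nil means externally managed
}

// ConvertToNext upgrades an old object without changing its behavior:
// the external URI is preserved and no embedded Bookkeeper is created.
func ConvertToNext(old PravegaSpecV1Beta1) PravegaSpecNext {
	return PravegaSpecNext{
		SegmentStoreReplicas:  old.SegmentStoreReplicas,
		ExternalBookkeeperURI: old.BookkeeperURI,
	}
}

// ConvertToV1Beta1 downgrades an object. An embedded Bookkeeper has no
// representation in the old schema, so this sketch reports an error.
func ConvertToV1Beta1(next PravegaSpecNext) (PravegaSpecV1Beta1, error) {
	if next.Bookkeeper != nil {
		return PravegaSpecV1Beta1{}, errors.New("embedded Bookkeeper cannot be expressed in v1beta1")
	}
	return PravegaSpecV1Beta1{
		BookkeeperURI:        next.ExternalBookkeeperURI,
		SegmentStoreReplicas: next.SegmentStoreReplicas,
	}, nil
}

func main() {
	old := PravegaSpecV1Beta1{BookkeeperURI: "bookkeeper-bookie-headless:3181", SegmentStoreReplicas: 3}
	next := ConvertToNext(old)
	back, err := ConvertToV1Beta1(next)
	fmt.Println(back == old, err)
}
```

Note that Kubernetes expects conversions between served versions to round-trip without losing data, so a real implementation would need a way to preserve the embedded section on downgrade (for example in an annotation) instead of returning an error.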