ryanemerson opened this issue 3 years ago
Can we use HPA for DataGrid case? https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
It appears so :slightly_smiling_face:
Last time I tried to use the default HPA memory metrics I didn't get good results, mainly because of Java memory management. Maybe we can tune the GC, or investigate whether HPA can be integrated with custom memory metrics.
Yeah, HPA should support custom metrics, but this needs to be validated: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/custom-metrics-api.md
A good starting point to design HPA/operator integration could be CPU autoscaling (#274): standard CPU load metric should work quite well to control pods scaling.
There's also Kubernetes Event-driven Autoscaling that has integrations with PostgreSQL and Redis.
A problem with a generic container-wide approach is that it does not take into account the requirements of different cache types. Replicated and Distributed caches have very different requirements when it comes to autoscaling, so any autoscaling must be configured based upon the use-cases of the Infinispan cluster.
Here we define how different cache types affect scaling.
**Replicated caches**

| | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| CPU | Allows increased read and write performance | Allows increased read performance, but results in slower writes as each additional pod needs to be included in every write operation |
| Memory | Increases capacity for all pods | Doesn't make sense, as all pods store all entries; increasing the number of pods does not increase the total memory available |

**Distributed caches**

| | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| CPU | Allows increased read and write performance | Does not improve CPU performance, as entries are always read from an entry's primary or backup owners |
| Memory | Increases memory capacity of the cluster | Increases memory capacity of the cluster |
Implement automatic horizontal scaling and require users/admins to manually perform vertical scaling by updating the Infinispan `spec.container` fields.
Automatically scaling an existing cluster vertically is tricky, as it can lead to the cluster becoming unavailable due to a lack of resources. Furthermore, Kubernetes does not provide an out-of-the-box mechanism for vertical scaling.
Correct autoscaling behaviour is tightly coupled to an application's Infinispan requirements and cannot be implemented in a way that is applicable to all users. This proposal is concerned with how we can expose autoscaling configuration to the user so that they can define behaviour suitable for their use-case. A big part of this effort will be creating documentation that details what type of scaling is appropriate for different workloads.
Based upon the HorizontalPodAutoscaler. We extend the Infinispan CRD to define the scale subresource. The HorizontalPodAutoscaler controller will then increase/decrease the Infinispan CR `spec.replicas` field based upon the behaviour defined in the HorizontalPodAutoscaler CR.
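For reference, enabling the scale subresource is a small change to the CRD itself. A minimal sketch, assuming hypothetical field paths (the actual Infinispan CRD paths, in particular the status label-selector field, may differ):

```yaml
# Sketch: enabling the scale subresource on the Infinispan CRD so that the
# HPA controller (and `kubectl scale`) can drive spec.replicas.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: infinispans.infinispan.org
spec:
  # group, names, and schema omitted for brevity
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      scale:
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas
        labelSelectorPath: .status.podSelector   # hypothetical status field
```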
Utilising the `autoscaling/v2beta2` API allows fine-grained control of the scale up/down behaviour. For example, a `stabilizationWindowSeconds` can prevent excessive scaling, where repeated rebalancing would adversely affect performance.
Below is an example `HorizontalPodAutoscaler` definition with a custom `scaleUp` policy.
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-infinispan
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: infinispan.org/v1
    kind: Infinispan
    name: example-infinispan
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 180
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120
      selectPolicy: Max
```
The user can define scaling in one of three ways:

1. Via the Infinispan CR, with the operator creating the `HorizontalPodAutoscaler`:

```yaml
spec:
  autoscale:
    minReplicas: 1
    maxReplicas: 10
    resource:
    - name: cpu
      type: AverageValue | Utilization | Value
      # One of the below fields must be defined depending on the configured type
      averageValue: 500m
      value: 500m
      averageUtilization: 50%
    - name: memory
      type: AverageValue | Utilization | Value
      # One of the below fields must be defined depending on the configured type
      averageValue: 500Mi
      value: 500Mi
      averageUtilization: 50%
```

2. Via `kubectl`:

```
kubectl autoscale infinispan example-infinispan --cpu-percent=50 --min=1 --max=10
```

3. Manually create a `HorizontalPodAutoscaler`
   * Allows for more advanced configurations where the operator defaults are not appropriate
@ryanemerson, overall I like the approach. I still have some concerns about the metrics though: while I consider the default CPU metric good enough, for memory I would suggest as a first step verifying whether my previous comment is still true. I mean, without a good metric it's hard to control a system. In that case we could try to tune the GC, as described here, or a more complex solution could be to instrument Infinispan with an ad-hoc metric.
> for memory I would suggest as a first step to verify if my previous comment is still true. I mean, without a good metric it's hard to control a system.
Can you elaborate on the issues you encountered?
I'm guessing it was the JVM not releasing committed memory once it's unused?
> In that case we could try to tune the GC, as described here, or a more complex solution could be to instrument Infinispan with an ad-hoc metric.
I think this is an area where we would benefit from using Shenandoah
https://stackoverflow.com/questions/61506136/kubernetes-pod-memory-java-gc-logs/61512521#61512521
> Can you elaborate on the issues you encountered?
> I'm guessing it was the JVM not releasing committed memory once it's unused?
2. iirc, another problematic aspect is how the pods are scaled up: Kubernetes doesn't start new pods one by one; instead it applies a multiplier factor (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details). This doesn't fit very well with Java applications, which have a consistent initial memory footprint. (btw, I consider this "multiplicative algorithm" very aggressive; maybe I'm missing something)
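The multiplier behaviour referenced above comes from the algorithm on the linked page, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. A minimal Go sketch of just that formula (ignoring the HPA's tolerance band and stabilization window) shows why it can add several pods in one step:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas implements the core HPA scaling formula from the
// Kubernetes algorithm-details docs:
//   ceil(currentReplicas * currentMetricValue / desiredMetricValue)
// Because the ratio multiplies the *current* replica count, a cluster far
// over its target can jump by several pods at once, which is the
// "multiplicative" behaviour discussed above.
func desiredReplicas(currentReplicas int, currentMetric, desiredMetric float64) int {
	return int(math.Ceil(float64(currentReplicas) * currentMetric / desiredMetric))
}

func main() {
	// 4 pods averaging 900Mi against a 500Mi target: the formula asks for
	// 8 pods in one step, not 5.
	fmt.Println(desiredReplicas(4, 900, 500)) // prints 8
	// Slightly over target: only one extra pod is requested.
	fmt.Println(desiredReplicas(4, 520, 500)) // prints 5
}
```

In practice the HPA also applies a default ~10% tolerance around the target before acting, so small deviations like the second case trigger no change at all.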
We can control this with the `autoscaling/v2beta2` API, as it lets us control the scale up/down behaviour.
3. (minor) there's a "minimum number of nodes" in the Infinispan metrics below which the cluster starts to lose data. I'm not sure this can be handled via the standard autoscaler.
We could make the scale up/down behaviour be dictated by Infinispan itself, using a custom metric that indicates when more/less memory is required, with the metric taking into account a lower bound to ensure that the cluster maintains at least the minimum number of pods.
Exposing a custom metric is more involved than using the basic memory usage and would require an enhancement on the server side. We could start with a basic memory based approach and then enhance the auto scale feature in the future as required.
Here is a quick guide on how to use custom metrics with HPA.
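As a concrete shape for such a custom metric, the HPA `metrics` stanza could target a per-pod value exposed through the `custom.metrics.k8s.io` API by a metrics adapter (e.g. prometheus-adapter). The metric name below is hypothetical, not an existing Infinispan metric:

```yaml
# Hypothetical sketch: HPA driven by a custom per-pod Infinispan metric.
metrics:
- type: Pods
  pods:
    metric:
      name: infinispan_memory_pressure   # hypothetical server-side metric
    target:
      type: AverageValue
      averageValue: "0.8"                # scale out above 80% "pressure"
```

A server-side metric like this could also encode the "minimum number of nodes" constraint mentioned above, by never reporting a value that would drive replicas below the data-safety threshold.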
> We could start with a basic memory based approach and then enhance the auto scale feature in the future as required.
Sounds good, though I would suggest verifying early how far we can go with basic metrics; imo the choice between basic vs ad-hoc metrics could have a broad impact (possibly even on feature design?)
> 2. `kubectl`
>    * `kubectl autoscale infinispan example-infinispan --cpu-percent=50 --min=1 --max=10`
> 3. Manually create `HorizontalPodAutoscaler`
>    * Allows for more advanced configurations where the operator defaults are not appropriate
Just realized that these 2 options could require some attention; I mean, both the operator and the scaler would act on the `statefulSet.replicas` field.
> Just realized that these 2 options could require some attention, I mean both the operator and the scaler would act on the `statefulSet.replicas` field.
My understanding is that implementing the scale subresource is all that's required: the HPA then modifies the Infinispan CR's `spec.replicas` field rather than the StatefulSet directly, so the operator remains the only component writing `statefulSet.replicas`.
https://github.com/infinispan/infinispan-operator/issues/691 deprecates the Cache Service in favour of providing the DataGrid service as the default and removing this configuration option.
Currently only the Cache Service provides memory-based autoscaling; however, it relies on assumptions about the cache's storage and replication type to determine when pods should be scaled. This approach is not possible with the DataGrid service, as users are able to use arbitrary cache configurations. Instead, we should introduce "container level" autoscaling, where the number of pods increases or decreases when the memory usage of the entire container exceeds the configured upper or lower bound of memory usage percentage, respectively.
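Such container-level bounds map onto a standard resource metric. A hypothetical fragment (the 70% threshold is a placeholder; `Utilization` targets require the container to declare memory requests):

```yaml
# Hypothetical sketch of "container level" memory autoscaling: scale out
# when average container memory utilisation exceeds the target, scale back
# in once utilisation falls sufficiently below it.
metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70
```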