infinispan / infinispan-operator

Infinispan Operator
https://infinispan.org/docs/infinispan-operator/main/operator.html
Apache License 2.0

Autoscaling #749

Open ryanemerson opened 3 years ago

ryanemerson commented 3 years ago

https://github.com/infinispan/infinispan-operator/issues/691 deprecates the Cache Service in favour of providing the DataGrid service as the default and removing this configuration option.

Currently only the Cache Service provides memory-based autoscaling; however, it relies on assumptions about the cache's storage and replication type to determine when pods should be scaled. This approach is not possible with the DataGrid service, as users are able to use arbitrary cache configurations. Instead, we should introduce "container level" autoscaling, where the number of pods increases or decreases when the memory usage of the entire container exceeds the configured upper bound or falls below the configured lower bound of the memory usage percentage.

dmvolod commented 3 years ago

Can we use HPA for DataGrid case? https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

ryanemerson commented 3 years ago

It appears so :slightly_smiling_face:

https://docs.openshift.com/dedicated/4/nodes/pods/nodes-pods-autoscaling.html#nodes-pods-autoscaling-creating-memory_nodes-pods-autoscaling
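
For instance (purely illustrative; the name and threshold below are placeholders, and targeting the Infinispan CR assumes the scale subresource discussed later in this issue), a memory-utilisation based HPA might look roughly like:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa-infinispan        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: infinispan.org/v1
    kind: Infinispan
    name: example-infinispan         # placeholder cluster name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80       # scale out when average usage exceeds 80% of the memory request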

rigazilla commented 3 years ago

Last time I tried to use the default HPA memory metrics I didn't get good results, mainly because of Java memory management. Maybe we can tune the GC or investigate whether HPA can be integrated with custom memory metrics.

dmvolod commented 3 years ago

Yeah, HPA should support custom metrics, but this needs to be validated: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/custom-metrics-api.md

rigazilla commented 3 years ago

A good starting point for designing the HPA/operator integration could be CPU autoscaling (#274): the standard CPU load metric should work quite well for controlling pod scaling.

ryanemerson commented 2 years ago

There's also Kubernetes Event-driven Autoscaling (KEDA), which has integrations with PostgreSQL and Redis.
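
For illustration only (KEDA drives a regular HPA under the covers; the Prometheus address, query and threshold below are placeholders, and targeting the Infinispan CR assumes a scale subresource), a ScaledObject could look roughly like:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: infinispan-scaler              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: infinispan.org/v1
    kind: Infinispan
    name: example-infinispan
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090    # placeholder Prometheus endpoint
      query: 'sum(container_memory_working_set_bytes{pod=~"example-infinispan-.*"})'   # placeholder query
      threshold: "3000000000"                             # placeholder threshold in bytes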

ryanemerson commented 2 years ago

A problem with a generic container-wide approach is that it does not take into account the requirements of different cache types. Replicated and Distributed caches have very different requirements when it comes to autoscaling, so any autoscaling needs to be configured based upon the use-cases of the Infinispan cluster.

Cache Scaling Semantics

Here we define how different cache types affect scaling.

Replicated Cache

|  | Vertical Scaling | Horizontal Scaling |
| --- | --- | --- |
| CPU | Allows increased read and write performance | Allows increased read performance, but results in slower writes, as each additional pod needs to be included in every write operation |
| Memory | Increases capacity for all pods | Doesn't make sense, as all pods store all entries, so increasing the number of pods does not increase the total memory available |

Distributed Cache

|  | Vertical Scaling | Horizontal Scaling |
| --- | --- | --- |
| CPU | Allows increased read and write performance | Does not improve CPU performance, as entries are always read from their primary or backup owners |
| Memory | Increases the memory capacity of the cluster | Increases the memory capacity of the cluster |

Proposal

Implement automatic Horizontal Scaling and require users/admins to manually perform Vertical scaling by updating the Infinispan spec.container fields.

Automatically scaling an existing cluster vertically is tricky as it can lead to the cluster becoming unavailable due to a lack of resources. Furthermore, K8s does not provide a mechanism to vertically scale out of the box.
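
For context, manual vertical scaling today amounts to editing the container resources in the Infinispan CR along these lines (values are illustrative and exact field semantics depend on the operator version):

spec:
  replicas: 3
  container:
    cpu: "2000m"                                 # illustrative CPU allocation per pod
    memory: 2Gi                                  # illustrative memory allocation per pod
    extraJvmOpts: "-XX:MaxRAMPercentage=75.0"    # optional JVM tuning, placeholder value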

Correct autoscaling behaviour is tightly coupled to an application's Infinispan requirements and cannot be implemented in a way that is applicable to all users. This proposal is concerned with how we can expose autoscaling configuration to the user so that they can define behaviour suitable for their use-case. A big part of this effort will be creating documentation that details what type of scaling is appropriate for different workloads.

Implementation

Based upon the HorizontalPodAutoscaler.

We extend the Infinispan CRD to define the scale subresource.
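
Roughly, this means enabling the scale subresource on the CRD version, something like the snippet below (the status paths are assumptions and would need to match whatever the Infinispan CR actually reports):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: infinispans.infinispan.org
spec:
  # existing group/names/schema omitted
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
      scale:
        specReplicasPath: .spec.replicas       # field the HPA controller will modify
        statusReplicasPath: .status.replicas   # assumes the CR status reports the current replica count
        labelSelectorPath: .status.selector    # assumes a pod selector is published in status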

The HorizontalPodAutoscaler controller will then increase/decrease the Infinispan CR spec.replicas field based upon the behaviour defined in the HorizontalPodAutoscaler CR.

Utilising the autoscaling/v2beta2 API allows fine-grained control of the scale up/down behaviour, for example using stabilizationWindowSeconds to prevent excessive scaling, where the resulting rebalancing would adversely affect performance.

Below is an example HorizontalPodAutoscaler definition with a custom scaleUp definition.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-infinispan
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: infinispan.org/v1   # the Infinispan CR's API group/version, not core v1
    kind: Infinispan 
    name: example-infinispan
  minReplicas: 1 
  maxReplicas: 10 
  metrics: 
  - type: Resource
    resource:
      name: memory 
      target:
        type: AverageValue 
        averageValue: 500Mi
  behavior: 
    scaleUp:
      stabilizationWindowSeconds: 180
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120
      selectPolicy: Max

User Configuration

The user can define scaling in one of three ways:

  1. Infinispan CR spec
    • User defines the high-level scaling behaviour required
    • Operator automatically creates the HorizontalPodAutoscaler
    • Requires a new Infinispan CR version
spec:
  autoscale:
    minReplicas: 1
    maxReplicas: 10
    resource:
      - name: cpu
        type: AverageValue | Utilization | Value
        # One of the below fields must be defined, depending on the configured type
        averageValue: 500m
        value: 500m
        averageUtilisation: 50%
      - name: memory
        type: AverageValue | Utilization | Value
        # One of the below fields must be defined, depending on the configured type
        averageValue: 500Mi
        value: 500Mi
        averageUtilisation: 50%
  2. kubectl

    • kubectl autoscale infinispan example-infinispan --cpu-percent=50 --min=1 --max=10
  3. Manually create HorizontalPodAutoscaler

    • Allows for more advanced configurations where the operator defaults are not appropriate

rigazilla commented 2 years ago

@ryanemerson, overall I like the approach. I still have some concerns about the metrics though: while I consider the default CPU metric good enough, for memory I would suggest, as a first step, verifying whether my previous comment is still true. I mean, without a good metric it's hard to control a system. In that case we could try to tune the GC, as described here, or a more complex solution could be to instrument Infinispan with an ad-hoc metric.

ryanemerson commented 2 years ago

for memory I would suggest, as a first step, verifying whether my previous comment is still true. I mean, without a good metric it's hard to control a system.

Can you elaborate on the issues you encountered?

I'm guessing it was the JVM not releasing committed memory once it's unused?

In that case we could try to tune the GC, as described here, or a more complex solution could be to instrument Infinispan with an ad-hoc metric.

I think this is an area where we would benefit from using Shenandoah:

https://stackoverflow.com/questions/61506136/kubernetes-pod-memory-java-gc-logs/61512521#61512521
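
As a rough sketch of that GC angle, something like the following extraJvmOpts could make the heap container-aware and let Shenandoah uncommit idle memory back to the OS (flag values are placeholders and would need benchmarking; Shenandoah availability depends on the JDK build):

spec:
  container:
    memory: 1Gi
    # MaxRAMPercentage sizes the heap relative to the container limit; the Shenandoah
    # flags force periodic GC cycles and uncommit heap that has been idle for the given delay.
    extraJvmOpts: "-XX:MaxRAMPercentage=75.0 -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGuaranteedGCInterval=30000 -XX:ShenandoahUncommitDelay=5000"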

rigazilla commented 2 years ago

Can you elaborate on the issues you encountered?

I'm guessing it was the JVM not releasing committed memory once it's unused?

  1. Yep, that is probably the main one
  2. iirc: another problematic aspect is how the pods are scaled up: Kubernetes doesn't start new pods one by one; instead it applies a multiplier factor (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details). This doesn't fit very well with Java applications with a consistent initial memory footprint. (btw I consider this "multiplicative algorithm" very aggressive, maybe I'm missing something)
  3. (minor) there's a "minimum number of nodes" in the ispn metrics below which the cluster starts to lose data. I'm not sure this can be handled via the standard autoscaler
ryanemerson commented 2 years ago

2. iirc: another problematic aspect is how the pods are scaled up: Kubernetes doesn't start new pods one by one; instead it applies a multiplier factor (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details). This doesn't fit very well with Java applications with a consistent initial memory footprint. (btw I consider this "multiplicative algorithm" very aggressive, maybe I'm missing something)

We can control this with the autoscaling/v2beta2 API, as it lets us control the scale up/down behaviour.

3. (minor) there's a "minimum number of nodes" in the ispn metrics below which the cluster starts to lose data. I'm not sure this can be handled via the standard autoscaler

We could make the scale up/down behaviour be dictated by Infinispan itself, using a custom metric that indicates when more/less memory is required, with the metric taking into account a lower bound to ensure that the cluster maintains at least the minimum number of pods.

Exposing a custom metric is more involved than using basic memory usage and would require an enhancement on the server side. We could start with a basic memory-based approach and then enhance the autoscale feature in the future as required.

Here is a quick guide on how to use custom metrics with HPA.
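
To make that concrete, a sketch of what a custom-metric based HPA could eventually look like (the metric name is hypothetical and would need to be exposed by the server and surfaced through a metrics adapter such as prometheus-adapter):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-infinispan-custom
spec:
  scaleTargetRef:
    apiVersion: infinispan.org/v1
    kind: Infinispan
    name: example-infinispan
  minReplicas: 3                                  # illustrative lower bound to avoid dropping below the minimum number of pods
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: infinispan_data_memory_used_ratio   # hypothetical per-pod metric
      target:
        type: AverageValue
        averageValue: "800m"                      # i.e. scale out when pods average above 0.8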

rigazilla commented 2 years ago

We could start with a basic memory-based approach and then enhance the autoscale feature in the future as required.

Sounds good, though I would suggest verifying early how far we can go with basic metrics; imo the choice between a basic vs an ad-hoc metric could have a broad impact (possibly even on feature design?)

rigazilla commented 2 years ago
2. kubectl

   * `kubectl autoscale infinispan example-infinispan --cpu-percent=50 --min=1 --max=10`

3. Manually create `HorizontalPodAutoscaler`

   * Allows for more advanced configurations where the operator defaults are not appropriate

Just realized that these two options could require some attention: both the operator and the autoscaler would act on the statefulSet.replicas field.

ryanemerson commented 2 years ago

Just realized that these two options could require some attention: both the operator and the autoscaler would act on the statefulSet.replicas field.

My understanding is that implementing the scale subresource is all that's required so that HPA modifies the Infinispan spec.replicas field.