prometheus-community / helm-charts

Prometheus community Helm charts

[kube-prometheus-stack] VPA does not scale prometheus #4711

Open scott-grimes opened 2 months ago

scott-grimes commented 2 months ago

Describe the bug

The VerticalPodAutoscaler included in the kube-prometheus-stack chart only scales the operator.

See also #3095, #3097, https://github.com/prometheus-operator/prometheus-operator/issues/5594

  # Enable vertical pod autoscaler support for prometheus-operator
  verticalPodAutoscaler:
    enabled: false

    # Recommender responsible for generating recommendation for the object.
    # List should be empty (then the default recommender will generate the recommendation)
    # or contain exactly one recommender.
    # recommenders:
    # - name: custom-recommender-performance

    # List of resources that the vertical pod autoscaler can control. Defaults to cpu and memory
    controlledResources: []
    # Specifies which resource values should be controlled: RequestsOnly or RequestsAndLimits.
    # controlledValues: RequestsAndLimits

    # Define the max allowed resources for the pod
    maxAllowed: {}
    # cpu: 200m
    # memory: 100Mi
    # Define the min allowed resources for the pod
    minAllowed: {}
    # cpu: 200m
    # memory: 100Mi

    updatePolicy:
      # Specifies minimal number of replicas which need to be alive for VPA Updater to attempt pod eviction
      # minReplicas: 1
      # Specifies whether recommended updates are applied when a Pod is started and whether recommended updates
      # are applied during the life of a Pod. Possible values are "Off", "Initial", "Recreate", and "Auto".
      updateMode: Auto

Note: It IS possible to scale prometheus by creating a second VPA object. This works for prometheus running in the default "unsharded" mode, and also when sharding is enabled via prometheus.prometheusSpec.shards: 2 (see the sketch below); there are no conflicts between the prometheus-operator and the statefulset controller.
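
For reference, sharding is enabled through the chart values like this (a minimal sketch of a values.yaml fragment):

prometheus:
  prometheusSpec:
    # The operator creates one StatefulSet per shard and splits
    # the scrape targets across them.
    shards: 2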

The VPA can scale not only the built-in resources like Deployment or StatefulSet, but also Custom Resources which manage Pods. Just like the Horizontal Pod Autoscaler, the VPA requires that the Custom Resource implements the /scale subresource with the optional field labelSelector, which corresponds to .scale.status.selector. VPA doesn't use the /scale subresource for the actual scaling, but uses this label selector to identify the Pods managed by a Custom Resource. As VPA relies on Pod eviction to apply new resource recommendations, this ensures that all Pods with a matching VPA object are managed by a controller that will recreate them after eviction.
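
You can confirm that the selector is exposed by querying the /scale subresource directly (a sketch; assumes a prometheus-operator version that enables the subresource on the Prometheus CRD, and the namespace/name used in the example below):

kubectl get --raw "/apis/monitoring.coreos.com/v1/namespaces/kube-prometheus-stack/prometheuses/kube-prometheus-stack-prometheus/scale"

The status.selector field of the returned Scale object is the label selector the VPA uses to find the managed Pods.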

A working example of this:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-outofband-vpa
  namespace: kube-prometheus-stack
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: prometheus
      maxAllowed:
        cpu: 4
        memory: 16Gi
    - containerName: "*" # match all other containers
      mode: "Off"
  targetRef:
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    name: kube-prometheus-stack-prometheus # Name of your Prometheus object
  updatePolicy:
    updateMode: Auto
    minReplicas: 1 # required if you do not run multiple replicas

Tested and working with VPA version 1.1.2. The second container policy with mode: "Off" is required if you wish to exclude non-prometheus containers in matching pods (e.g. sidecar containers like config-reloader).
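
Once the VPA object is applied, you can verify that recommendations are being generated (a sketch, reusing the name and namespace from the example above):

kubectl describe vpa my-outofband-vpa -n kube-prometheus-stack

The Recommendation block under Status should list target CPU and memory for the prometheus container.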

To avoid confusion, this new VPA object could be instantiated elsewhere in the chart, with a comment that makes it clear which section stands up which VPA (see the hypothetical values sketch below). Alternatively, we could ditch the prometheus-operator VPA in favor of one that scales prometheus, since that is what most people are looking for anyway.
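
A hypothetical values layout for such a VPA (illustrative only; this key does not exist in the chart today):

prometheus:
  # hypothetical: a VPA targeting the Prometheus CR rather than the operator Deployment
  verticalPodAutoscaler:
    enabled: true
    maxAllowed:
      cpu: 4
      memory: 16Gi
    updatePolicy:
      updateMode: Auto
      minReplicas: 1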

What's your helm version?

3.15.3

What's your kubectl version?

1.30.0

Which chart?

kube-prometheus-stack

What's the chart version?

61.3.0

What happened?

Enabling the only verticalPodAutoscaler in the chart will NOT scale your prometheus instances; it only scales the operator pod:

prometheusOperator:
  verticalPodAutoscaler:
    enabled: true

What you expected to happen?

I suspect 99%+ of people using this chart are interested in vertical pod autoscaling for prometheus; the operator pod itself is extremely lightweight compared to prometheus. It's a bit misleading for users who are not familiar with how the operator (or VPA) works, who may think this will help them scale their prometheus instance. See also #3095, #3097, https://github.com/prometheus-operator/prometheus-operator/issues/5594

How to reproduce it?

Set the following:

prometheusOperator:
  verticalPodAutoscaler:
    enabled: true

View the rendered manifests
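
One way to render just this template (a sketch; assumes the chart repo is added under the prometheus-community alias):

helm template kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --set prometheusOperator.verticalPodAutoscaler.enabled=true \
  --show-only templates/prometheus-operator/verticalpodautoscaler.yaml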

# templates/prometheus-operator/verticalpodautoscaler.yaml
...
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: {{ template "kube-prometheus-stack.name" . }}
  ...
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "kube-prometheus-stack.operator.fullname" . }}

Observe that the VPA's targetRef only targets the operator Deployment.
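
On a live cluster, the same can be confirmed by listing the VPA objects and their targets (a sketch; the namespace is an assumption):

kubectl get vpa -n kube-prometheus-stack \
  -o custom-columns=NAME:.metadata.name,KIND:.spec.targetRef.kind,TARGET:.spec.targetRef.name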

Enter the changed values of values.yaml?

prometheusOperator:
  verticalPodAutoscaler:
    enabled: true

Enter the command that you execute that is failing/misfunctioning.

helm install kube-prometheus-stack kube-prometheus-stack --values values.yaml

Anything else we need to know?

Happy to write the PR and accompanying documentation to get this done if there's consensus on what the path forward here is.

calebAtIspot commented 1 month ago

Is it possible to scale prometheus with an HPA instead?