understanding Vertical Pod Autoscaler recommendations

aogier commented 2 years ago

Hello, I'm currently evaluating VPA so I installed version 0.10.0 and I'm using recommender and updater components on an EKS 1.21 w/ metrics server 0.6.1 previously installed.

What I've tried to do is to configure a non-updating VPA against a random deployment:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-argocd-repo-server
  labels:
    helm.sh/chart: vpa-0.1.0
    app.kubernetes.io/name: vpa
    app.kubernetes.io/instance: vpa
    app.kubernetes.io/version: "1.16.0"
    app.kubernetes.io/managed-by: Helm
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       argocd-repo-server
  updatePolicy:
    updateMode: "Off"

The recommender seems fine with it; it logs its activity with no complaints:

I0506 09:11:17.999359       1 recommender.go:184] Recommender Run
I0506 09:11:17.999413       1 cluster_feeder.go:349] Start selecting the vpaCRDs.
I0506 09:11:17.999421       1 cluster_feeder.go:374] Fetched 2 VPAs.
I0506 09:11:17.999462       1 cluster_feeder.go:384] Using selector app.kubernetes.io/name=argocd-applicationset-controller for VPA argocd/vpa-argocd-applicationset-controller
I0506 09:11:17.999507       1 cluster_feeder.go:384] Using selector app.kubernetes.io/name=argocd-repo-server for VPA argocd/vpa-argocd-repo-server
I0506 09:11:18.020271       1 metrics_client.go:73] 73 podMetrics retrieved for all namespaces
I0506 09:11:18.020792       1 cluster_feeder.go:460] ClusterSpec fed with #162 ContainerUsageSamples for #81 containers. Dropped #0 samples.
I0506 09:11:18.020898       1 recommender.go:194] ClusterState is tracking 75 PodStates and 2 VPAs
I0506 09:11:18.042550       1 checkpoint_writer.go:114] Saved VPA argocd/vpa-argocd-applicationset-controller checkpoint for argocd-applicationset-controller
I0506 09:11:18.053112       1 checkpoint_writer.go:114] Saved VPA argocd/vpa-argocd-repo-server checkpoint for argocd-repo-server
I0506 09:11:18.053226       1 recommender.go:204] ClusterState is tracking 61 aggregated container states

However, I'm not sure about the resulting memory recommendation:

$ k -n argocd top po -l app.kubernetes.io/name=argocd-repo-server
NAME                                  CPU(cores)   MEMORY(bytes)
argocd-repo-server-6fcc65b49f-6xr7m   2m           81Mi
argocd-repo-server-6fcc65b49f-nm484   2m           122Mi

$ k -n argocd get vpa vpa-argocd-repo-server
NAME                     MODE   CPU   MEM          PROVIDED   AGE
vpa-argocd-repo-server   Off    15m   1644423393   True       22h

$ k -n argocd describe vpa vpa-argocd-repo-server
[...]
  Recommendation:
    Container Recommendations:
      Container Name:  argocd-repo-server
      Lower Bound:
        Cpu:     15m
        Memory:  1467028455
      Target:
        Cpu:     15m
        Memory:  1644423393
      Uncapped Target:
        Cpu:     15m
        Memory:  1644423393
      Upper Bound:
        Cpu:     47m
        Memory:  3397068568

The recommender has been launched with these parameters:

  containers:
  - args:
    - --pod-recommendation-min-cpu-millicores=15
    - --pod-recommendation-min-memory-mb=100
    - --v=4

If I understand the VPA fields correctly, I'd expect to read at least 100M under the lower bound, and something smaller than 1.6G for the target, given that this pod has never reached such high memory usage. What am I missing? Many thanks in advance, regards
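To make the comparison with the `kubectl top` output easier, the raw byte values from the VPA status can be converted to MiB. This is a minimal sketch; the helper name `bytesToMiB` is hypothetical and not part of VPA, and the numbers are copied from the output above:

```go
package main

import "fmt"

// bytesToMiB converts a raw byte count, as printed by `kubectl describe vpa`,
// into mebibytes. Hypothetical helper, not part of VPA.
func bytesToMiB(b int64) float64 {
	return float64(b) / (1024 * 1024)
}

func main() {
	// Memory values copied from the VPA status above.
	recs := []struct {
		name  string
		bytes int64
	}{
		{"lower bound", 1467028455},
		{"target", 1644423393},
		{"upper bound", 3397068568},
	}
	for _, r := range recs {
		fmt.Printf("%-12s %.0f MiB\n", r.name, bytesToMiB(r.bytes))
	}
}
```

The target works out to roughly 1568 MiB, more than ten times the 81Mi and 122Mi reported by `kubectl top`.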

aleksrosz commented 2 years ago

If I understand the source code correctly, the lower bound and target are not simply the values passed in "args".

https://github.com/kubernetes/autoscaler/blob/83ad488398507b3c364a625e37957a07a2e56caf/vertical-pod-autoscaler/pkg/recommender/logic/recommender.go

podMinMemoryMb = flag.Float64("pod-recommendation-min-memory-mb", 250, `Minimum memory recommendation for a pod`) // your value from args replaces the default of 250

func (r *podResourceRecommender) GetRecommendedPodResources(containerNameToAggregateStateMap model.ContainerNameToAggregateStateMap) RecommendedPodResources {
    var recommendation = make(RecommendedPodResources)
    if len(containerNameToAggregateStateMap) == 0 { // with no aggregated container state, an empty recommendation is returned; otherwise continue to the "fraction" variable
        return recommendation
    }

    fraction := 1.0 / float64(len(containerNameToAggregateStateMap))
    minResources := model.Resources{
        model.ResourceCPU:    model.ScaleResource(model.CPUAmountFromCores(*podMinCPUMillicores*0.001), fraction),
        // This line is important: only a fraction of the value you gave in
        // args is used, but I don't understand how it works.
        model.ResourceMemory: model.ScaleResource(model.MemoryAmountFromBytes(*podMinMemoryMb*1024*1024), fraction),
    }

    recommender := &podResourceRecommender{
        WithMinResources(minResources, r.targetEstimator),
        WithMinResources(minResources, r.lowerBoundEstimator),
        WithMinResources(minResources, r.upperBoundEstimator),
    }

    for containerName, aggregatedContainerState := range containerNameToAggregateStateMap {
        recommendation[containerName] = recommender.estimateContainerResources(aggregatedContainerState)
    }
    return recommendation
}
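The arithmetic in the snippet above can be sketched on its own. This is a simplified illustration rather than the VPA implementation: the helper names are hypothetical, the pod is assumed to have a single container, and `WithMinResources` is assumed to act as a floor that only raises an estimate when it falls below the minimum.

```go
package main

import "fmt"

// perContainerMinBytes mirrors the minResources computation quoted above:
// the pod-level minimum (in MB) is split evenly across the containers.
// Hypothetical helper name.
func perContainerMinBytes(podMinMemoryMb float64, containers int) float64 {
	fraction := 1.0 / float64(containers)
	return podMinMemoryMb * 1024 * 1024 * fraction
}

// withMin models the assumed behaviour of WithMinResources: an estimate
// below the floor is raised to it, anything above is left unchanged.
func withMin(estimate, floor float64) float64 {
	if estimate < floor {
		return floor
	}
	return estimate
}

func main() {
	// --pod-recommendation-min-memory-mb=100, one container assumed.
	floor := perContainerMinBytes(100, 1)
	fmt.Printf("per-container minimum: %.0f bytes\n", floor)

	// Lower bound reported by the VPA status in the original question.
	fmt.Printf("lower bound after floor: %.0f bytes\n", withMin(1467028455, floor))
}
```

Under these assumptions the flag only sets a minimum of 104857600 bytes, so a lower bound that is already about 1.4G passes through unchanged. The flag would not pull the recommendation down towards 100M.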

aogier commented 2 years ago

Given that tools like Goldilocks seem to take those values verbatim, what's the correct way to interpret them when the observed pods have never reached such usage?

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

kubernetes / autoscaler

understanding Vertical Pod Autoscaler recommendations #4862