kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

VPA updater constantly fails to match a container that doesn't even exist #6215

Closed rkashasl closed 4 months ago

rkashasl commented 1 year ago

Hello! We are using the latest VPA chart:

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: vertical-pod-autoscaler
  namespace: kube-system
spec:
  interval: 14m
  url: "https://cowboysysop.github.io/charts/"
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: vertical-pod-autoscaler
  namespace: kube-system
spec:
  install:
    createNamespace: true
    crds: CreateReplace
  upgrade:
    crds: CreateReplace
  releaseName: vertical-pod-autoscaler
  interval: 9m
  chart:
    spec:
      # renovate: registryUrl=https://cowboysysop.github.io/charts/
      chart: vertical-pod-autoscaler
      version: 7.2.0
      sourceRef:
        kind: HelmRepository
        name: vertical-pod-autoscaler
        namespace: kube-system
      interval: 14m
  values:
    admissionController:
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      resources:
        limits:
          memory: 50Mi
        requests:
          cpu: 10m
          memory: 40Mi
    recommender:
      extraArgs:
        pod-recommendation-min-memory-mb: 30
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      resources:
        limits:
          memory: 250Mi
        requests:
          cpu: 10m
          memory: 150Mi
    updater:
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      resources:
        limits:
          memory: 50Mi
        requests:
          cpu: 10m
          memory: 50Mi

However, the vpa-updater pod keeps spamming errors about a cert-manager container:

vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:14:53.194314       1 capping.go:79] no matching Container found for recommendation cert-manager
vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:14:53.194621       1 capping.go:79] no matching Container found for recommendation cert-manager
vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:15:53.202479       1 capping.go:79] no matching Container found for recommendation cert-manager
vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:15:53.232326       1 capping.go:79] no matching Container found for recommendation cert-manager
vertical-pod-autoscaler-updater-7747d6547-qbt96 I1020 10:16:53.193146       1 capping.go:79] no matching Container found for recommendation cert-manager

Here is the cert-manager deployment and its VPAs:

---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  interval: 14m
  url: "https://charts.jetstack.io/"
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  install:
    createNamespace: true
    crds: CreateReplace
  upgrade:
    crds: CreateReplace
  interval: 9m
  chart:
    spec:
      # renovate: registryUrl=https://charts.jetstack.io/
      chart: cert-manager
      version: v1.13.1
      sourceRef:
        kind: HelmRepository
        name: cert-manager
        namespace: cert-manager
      interval: 14m
  values:
    installCRDs: true
    serviceAccount:
      create: false
      name: certmanager-oidc
    global:
      priorityClassName: above-average
    prometheus:
      enabled: true
      servicemonitor:
        enabled: true
        prometheusInstance: prometheus-kube-prometheus-prometheus
    ingressShim:
      defaultIssuerName: letsencrypt-prod
      defaultIssuerKind: ClusterIssuer
    webhook:
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      resources:
        limits:
          memory: 64Mi
        requests:
          memory: 32Mi
          cpu: 10m
    cainjector:
      tolerations:
      - key: "arch"
        operator: "Equal"
        value: "arm64"
        effect: "NoSchedule"
      extraArgs:
      - "--leader-elect=false"
      resources:
        limits:
          memory: 512Mi
        requests:
          memory: 128Mi
          cpu: 10m
    resources:
      limits:
        memory: 384Mi
      requests:
        memory: 160Mi
        cpu: 10m
    tolerations:
    - key: "arch"
      operator: "Equal"
      value: "arm64"
      effect: "NoSchedule"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cert-manager
  updatePolicy:
    updateMode: Recreate
    minReplicas: 1
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cert-manager-cainjector
  namespace: cert-manager
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cert-manager-cainjector
  updatePolicy:
    updateMode: Recreate
    minReplicas: 1
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cert-manager-webhook
  namespace: cert-manager
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cert-manager-webhook
  updatePolicy:
    updateMode: Recreate
    minReplicas: 1

And there is no container named cert-manager in the Deployment it refers to:

Name:                   cert-manager
Namespace:              cert-manager
CreationTimestamp:      Wed, 18 Oct 2023 17:14:32 +0300
Labels:                 app=cert-manager
                        app.kubernetes.io/component=controller
                        app.kubernetes.io/instance=cert-manager
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=cert-manager
                        app.kubernetes.io/version=v1.13.1
                        helm.sh/chart=cert-manager-v1.13.1
                        helm.toolkit.fluxcd.io/name=cert-manager
                        helm.toolkit.fluxcd.io/namespace=cert-manager
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: cert-manager
                        meta.helm.sh/release-namespace: cert-manager
Selector:               app.kubernetes.io/component=controller,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=cert-manager
                    app.kubernetes.io/component=controller
                    app.kubernetes.io/instance=cert-manager
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=cert-manager
                    app.kubernetes.io/version=v1.13.1
                    helm.sh/chart=cert-manager-v1.13.1
  Service Account:  certmanager-oidc
  Containers:
   cert-manager-controller:
    Image:       quay.io/jetstack/cert-manager-controller:v1.13.1
    Ports:       9402/TCP, 9403/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      --v=2
      --cluster-resource-namespace=$(POD_NAMESPACE)
      --leader-election-namespace=kube-system
      --acme-http01-solver-image=quay.io/jetstack/cert-manager-acmesolver:v1.13.1
      --default-issuer-name=letsencrypt-prod
      --default-issuer-kind=ClusterIssuer
      --max-concurrent-challenges=60
    Limits:
      memory:  384Mi
    Requests:
      cpu:     10m
      memory:  160Mi
    Environment:
      POD_NAMESPACE:     (v1:metadata.namespace)
    Mounts:             <none>
  Volumes:              <none>
  Priority Class Name:  above-average
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   cert-manager-7d47d666f8 (1/1 replicas created)
Events:          <none>
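
For reference, a minimal way to confirm the mismatch from the VPA side (a sketch, assuming the object names above):

```sh
# Container names the VPA recommendation is keyed by
kubectl -n cert-manager get vpa cert-manager \
  -o jsonpath='{.status.recommendation.containerRecommendations[*].containerName}'

# Container names actually present in the Deployment's pod template
kubectl -n cert-manager get deployment cert-manager \
  -o jsonpath='{.spec.template.spec.containers[*].name}'
```

If the first command still prints cert-manager while the second prints cert-manager-controller, the updater log above would be explained by a leftover recommendation for the old container name.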
universam1 commented 12 months ago

Same issue here. It looks like the VPA recommender is actually broken; no recommendations are applied to new VPAs.
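
A quick way to check whether the recommender has written anything for a given VPA (a sketch; object names assumed from this issue, and in recent VPA versions the PROVIDED column reflects the RecommendationProvided condition):

```sh
kubectl -n cert-manager get vpa
kubectl -n cert-manager describe vpa cert-manager
```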

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

rkashasl commented 7 months ago

/remove-lifecycle rotten

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

voelzmo commented 4 months ago

Without additional information, I assume that this is another instance of https://github.com/kubernetes/autoscaler/issues/6744 which could be fixed with https://github.com/kubernetes/autoscaler/pull/6745

TL;DR: stale recommendations that no longer have a matching container can exist, e.g. when you renamed a Container in a Pod.
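
In this issue, that would match the rename visible above: the Deployment's container is cert-manager-controller, while the recommendation is still keyed by cert-manager. A hedged sketch of how leftover state could be inspected and cleared (object names from this issue; cert-manager-vpa.yaml is a hypothetical file holding the VPA manifests posted above; this is a workaround, not the fix from the linked PR):

```sh
# Checkpoints store the recommender's per-container history; an entry still
# referencing the old container name may keep the stale recommendation around
kubectl -n cert-manager get verticalpodautoscalercheckpoints

# Possible workaround: recreate the VPA object so state is rebuilt for the
# containers that currently exist
kubectl -n cert-manager delete vpa cert-manager
kubectl -n cert-manager apply -f cert-manager-vpa.yaml
```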

I'm closing this in favor of the above mentioned issue. Feel free to re-open with additional information.

/close

k8s-ci-robot commented 4 months ago

@voelzmo: Closing this issue.

In response to [this](https://github.com/kubernetes/autoscaler/issues/6215#issuecomment-2145367081):

> Without additional information, I assume that this is another instance of https://github.com/kubernetes/autoscaler/issues/6744 which could be fixed with https://github.com/kubernetes/autoscaler/pull/6745
>
> TL;DR: stale recommendations that no longer have a matching container can exist, e.g. when you renamed a Container in a Pod.
>
> I'm closing this in favor of the above mentioned issue. Feel free to re-open with additional information.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.