rkashasl opened 1 week ago
/area vertical-pod-autoscaler
vpa-autoscaler-updater pod can't adjust the cert-manager deployment; there is an error:
I1113 12:33:40.225140 1 capping.go:79] no matching Container found for recommendation cert-manager
VPA doesn't adjust Deployments; it adjusts the Pods that are managed by those controllers.
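For reference, a minimal sketch of that wiring (the field values below mirror the VPA object pasted later in this thread): the targetRef names the Deployment, but the updater evicts, and the admission controller resizes, the Pods created by that Deployment's ReplicaSet.

# Sketch only; spec copied from the describe output shown further down.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cert-manager
  updatePolicy:
    updateMode: Recreate
    minReplicas: 1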
Can you print the details of the pod in the cert-manager Deployment?
Here is the pod definition:
Name: cert-manager-d69584748-ws6hx
Namespace: cert-manager
Priority: 65
Priority Class Name: above-average
Service Account: certmanager-oidc
Node: ip-10-9-125-93.eu-central-1.compute.internal/10.9.125.93
Start Time: Thu, 14 Nov 2024 18:34:40 +0200
Labels: app=cert-manager
app.kubernetes.io/component=controller
app.kubernetes.io/instance=cert-manager
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cert-manager
app.kubernetes.io/version=v1.16.1
helm.sh/chart=cert-manager-v1.16.1
pod-template-hash=d69584748
topology.kubernetes.io/region=eu-central-1
topology.kubernetes.io/zone=eu-central-1b
Annotations: ref.kubemod.io/inject-node-ref: true
ref.kubemod.io/nodename: ip-10-9-125-93.eu-central-1.compute.internal
vpaObservedContainers: cert-manager-controller
vpaUpdates: Pod resources updated by cert-manager: container 0: cpu request, memory request, memory limit
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.9.91.47
IPs:
IP: 10.9.91.47
Controlled By: ReplicaSet/cert-manager-d69584748
Containers:
cert-manager-controller:
Container ID: containerd://8ee7e18f817ec0af6211ca3b9c9959a4bbe407334e63811687d59dc71021c89a
Image: quay.io/jetstack/cert-manager-controller:v1.16.1
Image ID: quay.io/jetstack/cert-manager-controller@sha256:ae5e14401cde4dec8bccce7594f829cd491044aa66944272e1d4fccc941ec77c
Ports: 9402/TCP, 9403/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--v=2
--cluster-resource-namespace=$(POD_NAMESPACE)
--leader-election-namespace=kube-system
--acme-http01-solver-image=quay.io/jetstack/cert-manager-acmesolver:v1.16.1
--default-issuer-name=letsencrypt-prod
--default-issuer-kind=ClusterIssuer
--max-concurrent-challenges=60
State: Running
Started: Thu, 14 Nov 2024 18:34:42 +0200
Ready: True
Restart Count: 0
Limits:
memory: 152507419
Requests:
cpu: 25m
memory: 63544758
Liveness: http-get http://:http-healthz/livez delay=10s timeout=15s period=10s #success=1 #failure=8
Environment:
POD_NAMESPACE: cert-manager (v1:metadata.namespace)
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_DEFAULT_REGION: eu-central-1
AWS_REGION: eu-central-1
AWS_ROLE_ARN: arn:aws:iam::879742
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9zj59 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
kube-api-access-9zj59:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: arch=arm64:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
Limits:
memory: 152507419
Requests:
cpu: 25m
memory: 63544758
Looks like the recommendations were applied to the Pod, but the vpa-updater log is still reporting an error.
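One illustrative way to cross-check that (the pod name is taken from the output above) is to compare the container names the VPA holds recommendations for against the containers the Pod actually runs:

# Container names the VPA has recommendations for:
kubectl -n cert-manager get vpa cert-manager \
  -o jsonpath='{.status.recommendation.containerRecommendations[*].containerName}'
# Container names the Pod actually runs:
kubectl -n cert-manager get pod cert-manager-d69584748-ws6hx \
  -o jsonpath='{.spec.containers[*].name}'

If the first command prints a name that the second doesn't, the updater logs the "no matching Container found" error for it.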
Do you have any other VPA objects on this cluster? Or do you have steps to reproduce this locally?
Sure, we have more VPAs in the cluster. Here is another pod from cert-manager:
Name: cert-manager-cainjector-79849c95bf-9kbtz
Namespace: cert-manager
Priority: 65
Priority Class Name: above-average
Service Account: cert-manager-cainjector
Node: ip-10-9-125-93.eu-central-1.compute.internal/10.9.125.93
Start Time: Fri, 15 Nov 2024 09:16:40 +0200
Labels: app=cainjector
app.kubernetes.io/component=cainjector
app.kubernetes.io/instance=cert-manager
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cainjector
app.kubernetes.io/version=v1.16.1
helm.sh/chart=cert-manager-v1.16.1
pod-template-hash=79849c95bf
topology.kubernetes.io/region=eu-central-1
topology.kubernetes.io/zone=eu-central-1b
Annotations: ref.kubemod.io/inject-node-ref: true
ref.kubemod.io/nodename: ip-10-9-125-93.eu-central-1.compute.internal
vpaObservedContainers: cert-manager-cainjector
vpaUpdates: Pod resources updated by cert-manager-cainjector: container 0: cpu request, memory request, memory limit
Status: Running
SeccompProfile: RuntimeDefault
IP: 10.9.76.114
IPs:
IP: 10.9.76.114
Controlled By: ReplicaSet/cert-manager-cainjector-79849c95bf
Containers:
cert-manager-cainjector:
Container ID: containerd://1f04622e1c553cfde728b77f0d2bff0a7d64a6a4fe57c4b9c1f31edd1cbef8ad
Image: quay.io/jetstack/cert-manager-cainjector:v1.16.1
Image ID: quay.io/jetstack/cert-manager-cainjector@sha256:3c49185718cf454bac559f71c4453b33f1086db48084604247d9acb7a4de2973
Port: 9402/TCP
Host Port: 0/TCP
Args:
--v=2
--leader-election-namespace=kube-system
--leader-elect=false
State: Running
Started: Fri, 15 Nov 2024 09:16:43 +0200
Ready: True
Restart Count: 0
Limits:
memory: 507221956
Requests:
cpu: 12m
memory: 126805489
Environment:
POD_NAMESPACE: cert-manager (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8glw4 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-8glw4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: arch=arm64:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
Here are the VPA definitions:
Name: cert-manager
Namespace: cert-manager
Labels: kustomize.toolkit.fluxcd.io/name=cert-manager-release
kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations: <none>
API Version: autoscaling.k8s.io/v1
Kind: VerticalPodAutoscaler
Metadata:
Creation Timestamp: 2023-10-18T14:14:12Z
Generation: 56939
Resource Version: 1855215477
UID: adf8e2f7-13b4-438b-9fd8-d560212f9bfd
Spec:
Target Ref:
API Version: apps/v1
Kind: Deployment
Name: cert-manager
Update Policy:
Min Replicas: 1
Update Mode: Recreate
Status:
Conditions:
Last Transition Time: 2023-10-18T14:14:41Z
Status: True
Type: RecommendationProvided
Recommendation:
Container Recommendations:
Container Name: cert-manager-controller
Lower Bound:
Cpu: 25m
Memory: 49566183
Target:
Cpu: 25m
Memory: 63544758
Uncapped Target:
Cpu: 25m
Memory: 63544758
Upper Bound:
Cpu: 25m
Memory: 63706492
Events: <none>
Name: cert-manager-cainjector
Namespace: cert-manager
Labels: kustomize.toolkit.fluxcd.io/name=cert-manager-release
kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations: <none>
API Version: autoscaling.k8s.io/v1
Kind: VerticalPodAutoscaler
Metadata:
Creation Timestamp: 2023-10-18T14:14:12Z
Generation: 56959
Resource Version: 1855219204
UID: e51cc83c-c36d-4ffe-9ae3-dbf4a7eec46c
Spec:
Target Ref:
API Version: apps/v1
Kind: Deployment
Name: cert-manager-cainjector
Update Policy:
Min Replicas: 1
Update Mode: Recreate
Status:
Conditions:
Last Transition Time: 2023-10-18T14:14:41Z
Status: True
Type: RecommendationProvided
Recommendation:
Container Recommendations:
Container Name: cert-manager
Lower Bound:
Cpu: 12m
Memory: 272057382
Target:
Cpu: 12m
Memory: 297164212
Uncapped Target:
Cpu: 12m
Memory: 297164212
Upper Bound:
Cpu: 12m
Memory: 299223730
Container Name: cert-manager-cainjector
Lower Bound:
Cpu: 12m
Memory: 63544434
Target:
Cpu: 12m
Memory: 126805489
Uncapped Target:
Cpu: 12m
Memory: 126805489
Upper Bound:
Cpu: 12m
Memory: 225958081
Events: <none>
How many containers are defined in the cert-manager-cainjector Deployment?
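An illustrative way to check that from the CLI, printing the container names in the Deployment's pod template:

kubectl -n cert-manager get deployment cert-manager-cainjector \
  -o jsonpath='{.spec.template.spec.containers[*].name}'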
I've looked into this more; the only way I can reproduce it is as follows: create a Deployment with a VPA targeting it, wait for a recommendation to be produced, and then rename the container in the Deployment. The VPA Updater then gives the same error you see, since the container no longer exists for a recommendation that it still has.
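A hypothetical reproduction sketch along those lines (names are illustrative, not from this cluster):

# 1. Create a Deployment whose container is named "app-old", plus a VPA
#    targeting it, and wait until status.recommendation is populated.
# 2. Rename the container in the pod template:
kubectl patch deployment my-app --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/name", "value": "app-new"}]'
# 3. The VPA status still carries a recommendation for "app-old", so the
#    updater logs: no matching Container found for recommendation app-old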
We have one container in each of the cert-manager Deployments (cainjector, webhook, and the default cert-manager). Here is the cainjector Deployment:
Name: cert-manager-cainjector
Namespace: cert-manager
CreationTimestamp: Wed, 18 Oct 2023 17:14:32 +0300
Labels: app=cainjector
app.kubernetes.io/component=cainjector
app.kubernetes.io/instance=cert-manager
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cainjector
app.kubernetes.io/version=v1.16.1
helm.sh/chart=cert-manager-v1.16.1
helm.toolkit.fluxcd.io/name=cert-manager
helm.toolkit.fluxcd.io/namespace=cert-manager
Annotations: deployment.kubernetes.io/revision: 9
meta.helm.sh/release-name: cert-manager
meta.helm.sh/release-namespace: cert-manager
Selector: app.kubernetes.io/component=cainjector,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cainjector
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=cainjector
app.kubernetes.io/component=cainjector
app.kubernetes.io/instance=cert-manager
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cainjector
app.kubernetes.io/version=v1.16.1
helm.sh/chart=cert-manager-v1.16.1
Service Account: cert-manager-cainjector
Containers:
cert-manager-cainjector:
Image: quay.io/jetstack/cert-manager-cainjector:v1.16.1
Port: 9402/TCP
Host Port: 0/TCP
Args:
--v=2
--leader-election-namespace=kube-system
--leader-elect=false
Limits:
memory: 512Mi
Requests:
cpu: 10m
memory: 128Mi
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
Mounts: <none>
Volumes: <none>
Priority Class Name: above-average
Node-Selectors: kubernetes.io/os=linux
Tolerations: arch=arm64:NoSchedule
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: cert-manager-cainjector-7c58fd79b9 (0/0 replicas created), cert-manager-cainjector-7f65854598 (0/0 replicas created), cert-manager-cainjector-7ff4b9f549 (0/0 replicas created), cert-manager-cainjector-7bdb99cb99 (0/0 replicas created), cert-manager-cainjector-85bfd86474 (0/0 replicas created), cert-manager-cainjector-774864555c (0/0 replicas created), cert-manager-cainjector-798ffdbcf6 (0/0 replicas created), cert-manager-cainjector-5fb8dbc786 (0/0 replicas created)
NewReplicaSet: cert-manager-cainjector-79849c95bf (1/1 replicas created)
Events: <none>
The error message you see is because the VPA object has 2 recommendations for that Deployment, one for "cert-manager" and one for "cert-manager-cainjector".
I'm unsure how it got in this state, but the error message isn't harmful.
We should possibly figure out a way to garbage collect this recommendation, or to suppress the error.
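In the meantime, a possible manual cleanup, under the assumption that resetting the VPA's state is acceptable: the stale recommendation lives in the VPA's status, with its history kept in a per-container VerticalPodAutoscalerCheckpoint object, so removing the checkpoint for the old container name and deleting the VPA should let the recommender rebuild status from the containers that actually exist. (Here the VPA objects carry Flux labels, so Flux would re-create them after deletion.)

# Inspect the per-container checkpoints (one per VPA/container pair):
kubectl -n cert-manager get verticalpodautoscalercheckpoints
# Hypothetical cleanup: <stale-checkpoint> stands for the checkpoint of
# the renamed container; then delete the VPA so its status is rebuilt.
kubectl -n cert-manager delete verticalpodautoscalercheckpoint <stale-checkpoint>
kubectl -n cert-manager delete vpa cert-manager-cainjector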
Wait, there are 3 different Deployments for the cert-manager stack, and also 3 VPAs, one for each of them.
vpa-autoscaler-updater pod can't adjust the cert-manager deployment; there is an error:
I1113 12:33:40.225140 1 capping.go:79] no matching Container found for recommendation cert-manager
vpa helm chart: 9.9.0
vpa.cert-manager:
cert-manager deployment:
vpa HR:
Related to https://github.com/kubernetes/autoscaler/issues/6215