kubernetes / autoscaler

Autoscaling components for Kubernetes

VPA Updater 1.1.0 panic: runtime error: invalid memory address or nil pointer dereference #6763

Closed rsavage-nozominetworks closed 6 months ago

rsavage-nozominetworks commented 6 months ago

Which component are you using?: vertical pod autoscaler / updater

What version of the component are you using?: 1.1.0

What k8s version are you using (kubectl version)?: 1.29

kubectl version Output
Client Version: v1.29.1
Server Version: v1.29.1-eks-b9c9ed7

What environment is this in?:

What did you expect to happen?: to not panic

What happened instead?:

How to reproduce it (as minimally and precisely as possible): always occurs in larger clusters

Anything else we need to know?: This only appears to occur in our larger clusters with more than 50 nodes. I've tried moving the service around and the error still persists.

rsavage-nozominetworks commented 6 months ago

Error:

W0424 15:15:13.211219       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0424 15:15:13.211235       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0424 15:15:13.211240       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0424 15:15:13.211243       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0424 15:15:13.211249       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0424 15:15:13.211259       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
W0424 15:15:13.211289       1 shared_informer.go:459] The sharedIndexInformer has started, run more than once is not allowed
I0424 15:15:13.312080       1 updater.go:246] Rate limit disabled
I0424 15:15:13.914908       1 api.go:94] Initial VPA synced successfully
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x10b4618]

goroutine 1 [running]:
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa.GetControllingVPAForPod(0x4001a83b08, {0x4002263f00, 0x6, 0x6372756f7365723a?}, {0x17a9160, 0x40000a8a00})
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/utils/vpa/api.go:164 +0x318
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic.(*updater).RunOnce(0x40000dfa20, {0x17c4950, 0x4002000a10})
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic/updater.go:175 +0x884
main.main()
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/main.go:127 +0x6bc
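
The thread does not state the root cause, so the following is only a generic sketch of the failure class shown in the trace, not the actual VPA code or its fix. A fault at a small address such as `addr=0x18` is consistent with reading a field through a nil pointer. One common way this happens in Go is a lookup that returns `(nil, nil)` while the caller checks only the error. All names below (`controllerRef`, `finder`, `controllerNameUnsafe`, `controllerNameSafe`, `didPanic`) are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// controllerRef is a hypothetical stand-in for a reference to the
// controller that owns a pod. It is not a type from the VPA code base.
type controllerRef struct {
	Kind string
	Name string
}

// finder is a hypothetical lookup that may legitimately return (nil, nil)
// when no owning controller can be determined.
type finder func(podName string) (*controllerRef, error)

// controllerNameUnsafe checks only the error before dereferencing.
// It panics with a nil pointer dereference when find returns (nil, nil).
func controllerNameUnsafe(find finder, pod string) string {
	ref, err := find(pod)
	if err != nil {
		return ""
	}
	return ref.Name
}

// controllerNameSafe checks both the error and the pointer, and reports
// through the boolean whether a controller was found.
func controllerNameSafe(find finder, pod string) (string, bool) {
	ref, err := find(pod)
	if err != nil || ref == nil {
		return "", false
	}
	return ref.Name, true
}

// didPanic runs f and reports whether it panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	orphan := func(string) (*controllerRef, error) { return nil, nil }
	failing := func(string) (*controllerRef, error) {
		return nil, errors.New("lookup failed")
	}
	owned := func(string) (*controllerRef, error) {
		return &controllerRef{Kind: "Deployment", Name: "web"}, nil
	}

	// The unsafe variant crashes on a pod with no resolvable controller.
	fmt.Println("unsafe panics:", didPanic(func() {
		controllerNameUnsafe(orphan, "pod-a")
	}))

	// The safe variant handles all three outcomes without crashing.
	for _, f := range []finder{orphan, failing, owned} {
		name, ok := controllerNameSafe(f, "pod-a")
		fmt.Printf("safe: name=%q ok=%v\n", name, ok)
	}
}
```

A crash of this kind does not depend on load itself. Under that assumption, a single pod whose lookup returns a nil result is enough to trigger it, which would fit the report later in the thread of the same panic on a small cluster.
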

rsavage-nozominetworks commented 6 months ago

In the meantime, I've downgraded back to 1.0.0 and have seen no more issues. There's something going on with 1.1.0 under load.

DeanDonkov commented 6 months ago

Observing the same error on my relatively small cluster as well.

voelzmo commented 6 months ago

Hey, thanks for reporting this! This has already been fixed on master and the vpa-release-1.1 branch. We have released vpa version 1.1.1 containing the fix.

/duplicates #6709

voelzmo commented 6 months ago

/close
/triage duplicate

k8s-ci-robot commented 6 months ago

@voelzmo: Closing this issue.