kubernetes / autoscaler

Autoscaling components for Kubernetes

vpa-recommender panicking in model.(*AggregateContainerState).MergeContainerState #1295

Closed. mattnworb closed this issue 5 years ago

mattnworb commented 6 years ago

At first glance this looks similar to #1258, but my panic and stack trace look different. Also note that the pod is up and running OK for some time before it panics:

I1002 22:24:32.480954       8 request.go:481] Throttling request took 191.220188ms, request: PATCH:https://10.178.96.1:443/apis/poc.autoscaling.k8s.io/v1alpha1/namespaces/creator-authorization/verticalpodautoscalers/authorization3dev
I1002 22:24:32.489580       8 recommender.go:66] VPA to update #{seti-test test-with-gke-repo}: &{ID:{Namespace:seti-test VpaName:test-with-gke-repo} PodSelector:app=test-with-gke-repo Conditions:map[RecommendationProvided:{Type:RecommendationProvided Status:True LastTransitionTime:2018-07-17 20:27:45 +0000 UTC Reason: Message:}] Recommendation:0xc42036e0a0 aggregateContainerStates:map[] ResourcePolicy:<nil> ContainersInitialAggregateState:map[test-with-gke-repo:0xc4200aff80]}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xe05646]

goroutine 1 [running]:
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/model.(*AggregateContainerState).MergeContainerState(0x0, 0xc4200aff80)
        /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/model/aggregate_container_state.go:71 +0x26
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/model.(*Vpa).MergeCheckpointedState(0xc4203fd7a0, 0xc421d0f6e0)
        /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/model/vpa.go:110 +0xd1
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/model.(*Vpa).AggregateStateByContainerName(0xc4203fd7a0, 0x1a35958)
        /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/model/vpa.go:118 +0x47
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/logic.(*podResourceRecommender).GetRecommendedPodResources(0xc420605020, 0xc4203fd7a0, 0x16)
        /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/logic/recommender.go:70 +0x40
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/initialization.(*recommender).updateVPAs(0xc4202807e0)
        /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/initialization/recommender.go:67 +0x630
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/initialization.(*recommender).RunOnce(0xc4202807e0)
        /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/initialization/recommender.go:96 +0xcc
main.main()
        /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/main.go:57 +0x158

This is with the image k8s.gcr.io/vpa-recommender:0.2.0
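
For context, this is the usual Go failure mode where a method with a pointer receiver is invoked on a nil pointer and then dereferences it. In the log above, `aggregateContainerStates` is an empty map while `ContainersInitialAggregateState` has one entry, so the merge path ends up calling `MergeContainerState` on a nil `*AggregateContainerState`. The following is only a minimal standalone sketch of that pattern; the struct, field, and loop are illustrative, not the actual recommender code:

```go
package main

// Illustrative stand-in for the real type; the actual VPA struct has more fields.
type AggregateContainerState struct {
	samples int
}

// MergeContainerState dereferences its receiver, so calling it on a nil
// *AggregateContainerState panics with "invalid memory address or nil pointer dereference".
func (a *AggregateContainerState) MergeContainerState(other *AggregateContainerState) {
	a.samples += other.samples
}

func main() {
	// Mirrors the state in the log: the live aggregate map is empty while the
	// checkpointed/initial map has an entry for the container.
	aggregateContainerStates := map[string]*AggregateContainerState{}
	initial := map[string]*AggregateContainerState{
		"test-with-gke-repo": {samples: 1},
	}
	for name, state := range initial {
		target := aggregateContainerStates[name] // missing key yields a nil pointer
		target.MergeContainerState(state)        // panic: nil receiver is dereferenced
	}
}
```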

bskiba commented 6 years ago

Yep, this is fixed at head by #1134. We're planning to release 0.3.0 sometime in the next two weeks.

wmuizelaar commented 5 years ago

@bskiba any update on the 0.3.0 release?

wmuizelaar commented 5 years ago

We're seeing a very similar issue on the updater. Is that the same issue, or a separate one?

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0xe572b3]

goroutine 1 [running]:
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/priority.(*UpdatePriorityCalculator).getUpdatePriority(0xc420b1fb00, 0xc420b69898, 0xc4220bb0c0, 0xc420b69898, 0xc4220bb0c0, 0x0)
    /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go:121 +0x7b3
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/priority.(*UpdatePriorityCalculator).AddPod(0xc420b1fb00, 0xc420b69898, 0xc42031b3a0, 0xbeeedeeffa5d1c4b, 0xe07eb56b9, 0x17bb6e0)
    /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go:75 +0x1b3
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic.(*updater).getPodsForUpdate(0xc42039fec0, 0xc42030c780, 0x1, 0x1, 0xc42019ec40, 0xc42030c780, 0x1, 0x1)
    /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic/updater.go:123 +0x1d9
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic.(*updater).RunOnce(0xc42039fec0)
    /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic/updater.go:102 +0xa3f
main.main()
    /usr/local/google/home/bskiba/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/main.go:55 +0x16e

bskiba commented 5 years ago

That's a different issue, but it's already fixed on master: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go#L148

Sorry for the delay on 0.3.0, I expect to be able to work on it next week.
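
One common way to avoid this class of panic is to check for a missing recommendation before dereferencing it when computing the update priority. The sketch below is hypothetical (simplified types and a made-up function body, not the linked updater code) and just illustrates that guard pattern:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified types; not the actual VPA updater structures.
type RecommendedContainerResources struct {
	TargetCPUMilli int64
}

type RecommendedPodResources struct {
	ContainerRecommendations []RecommendedContainerResources
}

// getUpdatePriority sketches the defensive pattern: report an error and skip
// the pod when no recommendation exists instead of dereferencing a nil pointer.
func getUpdatePriority(recommendation *RecommendedPodResources) (int64, error) {
	if recommendation == nil || len(recommendation.ContainerRecommendations) == 0 {
		return 0, errors.New("no recommendation available for pod")
	}
	return recommendation.ContainerRecommendations[0].TargetCPUMilli, nil
}

func main() {
	// A pod without a recommendation would previously have hit the SIGSEGV path;
	// with the guard it is simply skipped.
	if _, err := getUpdatePriority(nil); err != nil {
		fmt.Println("skipping pod:", err)
	}
}
```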

bskiba commented 5 years ago

Update: I'm currently testing the new image, should be able to release around Tuesday next week.

fejta-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

bskiba commented 5 years ago

Both 0.3.0 and 0.3.1 are available and should be free of this issue.

/close

k8s-ci-robot commented 5 years ago

@bskiba: Closing this issue.

In response to [this](https://github.com/kubernetes/autoscaler/issues/1295#issuecomment-461450133):

> Both 0.3.0 and 0.3.1 are available and should be free of this issue.
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

mattnworb commented 5 years ago

@bskiba thanks for the update