kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

When testing with examples/hamster.yaml, the updater component panics with runtime error: invalid memory address or nil pointer dereference #6808

Closed: itonyli closed this issue 5 months ago

itonyli commented 5 months ago

Which component are you using?: Vertical Pod Autoscaler

What version of the component are you using?: 1.1.1

What k8s version are you using (kubectl version)?:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.1", GitCommit:"3ddd0f45aa91e2f30c70734b175631bec5b5825a", GitTreeState:"clean", BuildDate:"2022-05-24T12:17:11Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"darwin/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.2", GitCommit:"4b8e819355d791d96b7e9d9efe4cbafae2311c88", GitTreeState:"clean", BuildDate:"2024-02-14T22:24:00Z", GoVersion:"go1.21.7", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?: local (cluster created with kind)

What did you expect to happen?: The updater component scales pods vertically based on the recommendations generated by the recommender.

What happened instead?: The updater panicked:

E0509 03:44:19.458556 1 api.go:153] fail to get pod controller: pod=etcd-ha-control-plane3 err=Unhandled targetRef v1 / Node / ha-control-plane3, last error node is not a valid owner
E0509 03:44:19.458587 1 api.go:153] fail to get pod controller: pod=etcd-ha-control-plane err=Unhandled targetRef v1 / Node / ha-control-plane, last error node is not a valid owner
E0509 03:44:19.458864 1 api.go:153] fail to get pod controller: pod=kube-apiserver-ha-control-plane2 err=Unhandled targetRef v1 / Node / ha-control-plane2, last error node is not a valid owner
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x159129f]

goroutine 1 [running]:
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/priority.(*scalingDirectionPodEvictionAdmission).LoopInit(0xc000566538, {0x1a1dda3?, 0xa?, 0x40b?}, 0xc0002ad5c0)
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/priority/scaling_direction_pod_eviction_admission.go:111 +0x11f
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic.(*updater).RunOnce(0xc0003342c0, {0x1c97290, 0xc0001741c0})
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/logic/updater.go:183 +0xb44
main.main()
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/main.go:127 +0x7ef

How to reproduce it (as minimally and precisely as possible):

./hack/vpa-up.sh
kubectl create -f examples/hamster.yaml

# This config creates a deployment with two pods, each requesting 100 millicores
# and trying to utilize slightly above 500 millicores (repeatedly using CPU for
# 0.5s and sleeping 0.5s).
# It also creates a corresponding Vertical Pod Autoscaler that adjusts the
# requests.
# Note that the update mode is left unset, so it defaults to "Auto" mode.
---
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: hamster-vpa
spec:
  # recommenders field can be unset when using the default recommender.
  # When using an alternative recommender, the alternative recommender's name
  # can be specified as the following in a list.
  # recommenders: 
  #   - name: 'alternative'
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: hamster
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 50Mi
        maxAllowed:
          cpu: 1
          memory: 500Mi
        controlledResources: ["cpu", "memory"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hamster
spec:
  selector:
    matchLabels:
      app: hamster
  replicas: 2
  template:
    metadata:
      labels:
        app: hamster
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534 # nobody
      containers:
        - name: hamster
          image: registry.k8s.io/ubuntu-slim:0.1
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
          command: ["/bin/sh"]
          args:
            - "-c"
            - "while true; do timeout 0.5s yes >/dev/null; sleep 0.5s; done"

Anything else we need to know?:

adrianmoisey commented 5 months ago

/area vertical-pod-autoscaler

adrianmoisey commented 5 months ago

It seems as though the mutatingwebhookconfigurations isn't configured in your setup. It's responsible for filling in spec.UpdatePolicy. See line 93: https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.1.1/vertical-pod-autoscaler/pkg/admission-controller/resource/vpa/handler.go#L74-L102
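
For illustration, the defaulting that the webhook performs boils down to a nil-guard on spec.updatePolicy before the object is stored. A minimal sketch of that pattern, assuming the VPA API types package and using a made-up function name (this is not the actual handler code linked above):

package main

import (
	"fmt"

	vpa_types "k8s.io/autoscaler/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1"
)

// defaultUpdatePolicy fills in spec.updatePolicy with the API default "Auto"
// when the field was left unset, so downstream consumers never see a nil pointer.
// Illustrative sketch only; the real admission controller does this while
// patching the object during admission.
func defaultUpdatePolicy(vpa *vpa_types.VerticalPodAutoscaler) {
	if vpa.Spec.UpdatePolicy == nil {
		vpa.Spec.UpdatePolicy = &vpa_types.PodUpdatePolicy{}
	}
	if vpa.Spec.UpdatePolicy.UpdateMode == nil {
		mode := vpa_types.UpdateModeAuto
		vpa.Spec.UpdatePolicy.UpdateMode = &mode
	}
}

func main() {
	// A VPA created without updatePolicy, as in examples/hamster.yaml.
	vpa := &vpa_types.VerticalPodAutoscaler{}
	defaultUpdatePolicy(vpa)
	fmt.Println(*vpa.Spec.UpdatePolicy.UpdateMode) // prints: Auto
}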

itonyli commented 5 months ago

> It seems as though the mutatingwebhookconfigurations isn't configured in your setup. It's responsible for filling in spec.UpdatePolicy. See line 93: https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.1.1/vertical-pod-autoscaler/pkg/admission-controller/resource/vpa/handler.go#L74-L102

Yes, but in the API declaration spec.updatePolicy is optional and defaults to Auto. If the field is optional, I should be able to leave it unset; with the current logic, the updater cannot run when it is not configured.

adrianmoisey commented 5 months ago

Right, that is fair. The change in https://github.com/kubernetes/autoscaler/pull/6809 also ensures that the VPA continues to use the default update policy, which is in line with the API spec.
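
For reference, the defensive pattern on the reading side looks roughly like this: resolve the effective update mode through a nil-safe helper instead of dereferencing spec.updatePolicy directly. The helper name below is hypothetical and only illustrates the fallback-to-Auto behaviour; it is not the exact code merged in the PR:

package main

import (
	"fmt"

	vpa_types "k8s.io/autoscaler/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1"
)

// effectiveUpdateMode returns the VPA's update mode, treating a nil
// updatePolicy (or a nil updateMode) as the API default "Auto" instead of
// triggering a nil pointer dereference.
func effectiveUpdateMode(vpa *vpa_types.VerticalPodAutoscaler) vpa_types.UpdateMode {
	if vpa.Spec.UpdatePolicy == nil || vpa.Spec.UpdatePolicy.UpdateMode == nil {
		return vpa_types.UpdateModeAuto
	}
	return *vpa.Spec.UpdatePolicy.UpdateMode
}

func main() {
	// A VPA that was never mutated by the admission webhook has no updatePolicy set.
	vpa := &vpa_types.VerticalPodAutoscaler{}
	fmt.Println(effectiveUpdateMode(vpa)) // prints: Auto, no panic
}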

itonyli commented 5 months ago

@adrianmoisey Can you help review the code? I haven't seen any response on the PR yet.

adrianmoisey commented 5 months ago

Unfortunately I'm not a reviewer, so I can't approve it.

voelzmo commented 5 months ago

Version 1.1.2 has been released, including a fix for this issue. Thanks everyone!

lefterisALEX commented 2 months ago

We see the same issue in 1.2.5:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x161e6d5]

goroutine 1 [running]:
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input.(*clusterStateFeeder).setVpaCheckpoint(0xc0069e0fe0?, 0xc005a9cb40)
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go:236 +0x1d5
k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input.(*clusterStateFeeder).InitFromCheckpoints(0xc0002264d0)
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/input/cluster_feeder.go:266 +0x6bc
main.run(0xc00014fce0)
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/main.go:269 +0xf38
main.main()
    /gopath/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/recommender/main.go:136 +0x46c