kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

[FG:InPlacePodVerticalScaling] Infeasible resize is actuated if requested before container is started #126527

Open hshiina opened 1 month ago

hshiina commented 1 month ago

What happened?

If a pod resize that cannot be accepted (one that would be marked Deferred or Infeasible) is requested before the container has started, for example while an init container is still running, the container is started with the unaccepted spec.

```console
$ kubectl create -f pod.yaml; sleep 5; kubectl patch pod resize-pod --patch '{"spec": {"containers": [{"name": "resize-container", "resources":{"requests": {"cpu": "100"}, "limits": {"cpu": "100"}}}]}}'
pod/resize-pod created
pod/resize-pod patched
$ kubectl get pod resize-pod -o jsonpath='spec: {.spec.containers[0].resources}{"\nallocatedResources: "}{.status.containerStatuses[0].allocatedResources}{"\nstatus: "}{.status.containerStatuses[0].resources}{"\nresize: "}{.status.resize}{"\n"}'
spec: {"limits":{"cpu":"100","memory":"200Mi"},"requests":{"cpu":"100","memory":"200Mi"}}
allocatedResources: {"cpu":"200m","memory":"200Mi"}
status: {"limits":{"cpu":"100","memory":"200Mi"},"requests":{"cpu":"100","memory":"200Mi"}}
resize: Infeasible
```

The pod is admitted with its initial spec when it is created. The resized spec is then never checked for admission, because the pod is not running yet (see https://github.com/kubernetes/kubernetes/blob/dbc2b0a5c7acc349ea71a14e49913661eaf708d2/pkg/kubelet/kubelet.go#L2811-L2814). As a result, the container is started with the unacceptable spec. Once the pod has started, its resize status eventually becomes Infeasible, because the allocated resources, which were never updated, differ from the resized pod spec.
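As a rough illustration only, the ordering problem can be modelled in a few lines of Go. Everything in this sketch is hypothetical: `Pod`, `resizeStatus`, `containerStartCPU` and the assumed 4000m node capacity are not kubelet code, and real resize handling (Deferred, memory, per-container state) is omitted. It contrasts the reported behaviour (a resize on a not-yet-running pod is never checked, so the patched spec is used at start) with the behaviour expected in this issue (the container starts with the admitted allocation and the resize is reported as Infeasible).

```go
package main

import "fmt"

// Pod is a hypothetical, simplified view of one container's CPU in millicores.
// It is NOT a kubelet type; it only illustrates the ordering problem.
type Pod struct {
	SpecCPU      int64 // CPU in the (possibly patched) pod spec
	AllocatedCPU int64 // CPU recorded when the pod was admitted
}

// nodeAllocatableMilli is an assumed node capacity, for illustration only.
const nodeAllocatableMilli int64 = 4000

// resizeStatus is a simplified classification of a pending resize.
func resizeStatus(p Pod) string {
	switch {
	case p.SpecCPU == p.AllocatedCPU:
		return "" // no resize requested
	case p.SpecCPU > nodeAllocatableMilli:
		return "Infeasible" // can never fit on this node
	default:
		return "InProgress" // simplified: treat any fitting resize as accepted
	}
}

// containerStartCPU returns the CPU the main container starts with when a
// resize was requested before it started. verifyPendingResize=false models
// the reported behaviour; true models the expected behaviour.
func containerStartCPU(p Pod, verifyPendingResize bool) int64 {
	if !verifyPendingResize {
		// Reported: the pod is not running yet, so the resize is never
		// checked and the patched spec is used as-is.
		return p.SpecCPU
	}
	if resizeStatus(p) == "Infeasible" {
		// Expected: keep the resources the pod was admitted with.
		return p.AllocatedCPU
	}
	// Assumption: a feasible resize would simply be accepted.
	return p.SpecCPU
}

func main() {
	// Values from the reproduction: admitted with 200m, patched to 100 CPUs.
	p := Pod{SpecCPU: 100_000, AllocatedCPU: 200}
	fmt.Println("resize status:", resizeStatus(p))
	fmt.Println("reported start CPU (m):", containerStartCPU(p, false))
	fmt.Println("expected start CPU (m):", containerStartCPU(p, true))
}
```

With the reproduction's values, the "reported" path starts the container with 100000m while the "expected" path keeps the admitted 200m and leaves the resize marked Infeasible.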

What did you expect to happen?

Either the pod starts with its initial spec and the resize is reported as Infeasible, or the pod fails to start.

How can we reproduce it (as minimally and precisely as possible)?

  1. Enable InPlacePodVerticalScaling.
  2. Save the following pod manifest, which has an init container that takes a few seconds to complete, as `pod.yaml`:

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      creationTimestamp: null
      labels:
        run: resize-pod
      name: resize-pod
    spec:
      initContainers:
      - image: busybox
        name: init-container
        command:
        - sleep
        - "10"
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 100m
            memory: 100Mi
      containers:
      - image: busybox
        name: resize-container
        command:
        - sh
        - -c
        - trap "exit 0" SIGTERM; while true; do sleep 1; done
        resources:
          requests:
            cpu: 200m
            memory: 200Mi
          limits:
            cpu: 200m
            memory: 200Mi
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
      restartPolicy: Always
    ```
  3. Create the pod and, while the init container is still running, patch it with an infeasible resize request:

    ```console
    $ kubectl create -f pod.yaml; sleep 5; kubectl patch pod resize-pod --patch '{"spec": {"containers": [{"name": "resize-container", "resources":{"requests": {"cpu": "100"}, "limits": {"cpu": "100"}}}]}}'
    ```
  4. Watch the pod:

    ```console
    $ kubectl get pod resize-pod -o jsonpath='spec: {.spec.containers[0].resources}{"\nallocatedResources: "}{.status.containerStatuses[0].allocatedResources}{"\nstatus: "}{.status.containerStatuses[0].resources}{"\nresize: "}{.status.resize}{"\n"}' -w
    ```

Anything else we need to know?

No response

Kubernetes version

```console
$ kubectl version
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.2
```

Cloud provider

N/A

OS version


Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

hshiina commented 1 month ago

/sig node

hshiina commented 1 month ago

/assign

kannon92 commented 1 month ago

@tallclair could you help triage this?

kannon92 commented 3 weeks ago

cc @esotsal

Please take a look and triage if you can.

esotsal commented 2 weeks ago

/triage accepted