kubernetes / website

Kubernetes website and documentation repo:
https://kubernetes.io
Creative Commons Attribution 4.0 International

[FG:InPlacePodVerticalScaling] Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers” #41365

Open THMAIL opened 1 year ago

THMAIL commented 1 year ago

My k8s version: 1.27.2

kubectl get nodes
NAME                      STATUS   ROLES           AGE   VERSION
172.30.94.14              Ready                    7d    v1.27.2
172.30.94.201             Ready                    7d    v1.27.2
ecs6w3fxmxy5c.novalocal   Ready    control-plane   7d    v1.27.2

Problem

I want to try the in-place update feature and followed the steps in the document, but when I executed the command below, it threw an error:

kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"cpu":"800m"}, "limits":{"cpu":"800m"}}}]}}'

The Pod "qos-demo-5" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
  core.PodSpec{
        Volumes:        {{Name: "kube-api-access-p29n4", VolumeSource: {Projected: &{Sources: {{ServiceAccountToken: &{ExpirationSeconds: 3607, Path: "token"}}, {ConfigMap: &{LocalObjectReference: {Name: "kube-root-ca.crt"}, Items: {{Key: "ca.crt", Path: "ca.crt"}}}}, {DownwardAPI: &{Items: {{Path: "namespace", FieldRef: &{APIVersion: "v1", FieldPath: "metadata.namespace"}}}}}}, DefaultMode: &420}}}},
        InitContainers: nil,
        Containers: []core.Container{
                {
                        ... // 6 identical fields
                        EnvFrom: nil,
                        Env:     nil,
                        Resources: core.ResourceRequirements{
                                Limits: core.ResourceList{
-                                       s"cpu":    {i: resource.int64Amount{value: 700, scale: -3}, s: "700m", Format: "DecimalSI"},
+                                       s"cpu":    {i: resource.int64Amount{value: 800, scale: -3}, s: "800m", Format: "DecimalSI"},
                                        s"memory": {i: {...}, Format: "BinarySI"},
                                },
                                Requests: core.ResourceList{
-                                       s"cpu":    {i: resource.int64Amount{value: 700, scale: -3}, s: "700m", Format: "DecimalSI"},
+                                       s"cpu":    {i: resource.int64Amount{value: 800, scale: -3}, s: "800m", Format: "DecimalSI"},
                                        s"memory": {i: {...}, Format: "BinarySI"},
                                },
                                Claims: nil,
                        },
                        ResizePolicy: nil,
                        VolumeMounts: {{Name: "kube-api-access-p29n4", ReadOnly: true, MountPath: "/var/run/secrets/kubernetes.io/serviceaccount"}},
                        ... // 12 identical fields
                },
        },
        EphemeralContainers: nil,
        RestartPolicy:       "Always",
        ... // 28 identical fields
  }

niranjandarshann commented 1 year ago

/language en
/sig docs

ref link: https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/

niranjandarshann commented 1 year ago

/kind support

sftim commented 1 year ago

/retitle Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers”

/remove-kind support
/kind bug

The prerequisites section of https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/ should state that your cluster must have InPlacePodVerticalScaling enabled on the control plane and on nodes; however, it does not.

/triage accepted
/priority backlog
/sig node

Thank you for reporting this @THMAIL

THMAIL commented 1 year ago

Thank you for your reply. I have modified the file /etc/kubernetes/manifests/kube-apiserver.yaml to add InPlacePodVerticalScaling=true (a sketch of that edit is shown below).
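
A minimal sketch of the edit (the --feature-gates flag name is real; the surrounding manifest lines are illustrative and will differ per cluster):

spec:
  containers:
  - command:
    - kube-apiserver
    # ... existing flags left unchanged ...
    - --feature-gates=InPlacePodVerticalScaling=true   # added line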

But there's another problem:

  1. I executed the command kubectl -n qos-example patch pod qos-demo-5 --patch '{"spec":{"containers":[{"name":"qos-demo-ctr-5", "resources":{"requests":{"cpu":"800m"}, "limits":{"cpu":"800m"}}}]}}'.
  2. The pod can't start. Running kubectl get pod qos-demo-5 --namespace=qos-example -o wide shows:
    NAME         READY   STATUS             RESTARTS        AGE   IP                NODE            NOMINATED NODE   READINESS GATES
    qos-demo-5   0/1     CrashLoopBackOff   8 (5m11s ago)   17m   192.168.239.146   172.30.94.201   <none>           <none>
  3. Running kubectl describe pod qos-demo-5 --namespace=qos-example shows this event log (see the note on the 80000 value after the list):
    Events:
    Type     Reason     Age                    From               Message
    ----     ------     ----                   ----               -------
    Normal   Scheduled  4m32s                  default-scheduler  Successfully assigned qos-example/qos-demo-5 to 172.30.94.201
    Normal   Pulled     4m28s                  kubelet            Successfully pulled image "nginx" in 2.299559292s (2.299579947s including waiting)
    Normal   Started    4m28s                  kubelet            Started container qos-demo-ctr-5
    Normal   Killing    3m13s                  kubelet            Container qos-demo-ctr-5 definition changed, will be restarted
    Normal   Pulled     3m10s                  kubelet            Successfully pulled image "nginx" in 2.311044787s (2.311062277s including waiting)
    Normal   Pulled     3m7s                   kubelet            Successfully pulled image "nginx" in 2.167481718s (2.167497407s including waiting)
    Normal   Pulled     2m50s                  kubelet            Successfully pulled image "nginx" in 2.217118706s (2.217147034s including waiting)
    Normal   Pulling    2m21s (x5 over 4m30s)  kubelet            Pulling image "nginx"
    Normal   Created    2m19s (x5 over 4m28s)  kubelet            Created container qos-demo-ctr-5
    Warning  Failed     2m19s (x4 over 3m10s)  kubelet            Error: failed to start container "qos-demo-ctr-5": Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: failed to write "80000": write /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-podd6170de3_c124_47d6_a641_6b10f5b690cb.slice/qos-demo-ctr-5/cpu.cfs_quota_us: invalid argument: unknown
    Normal   Pulled     2m19s                  kubelet            Successfully pulled image "nginx" in 2.09857171s (2.098667953s including waiting)
    Warning  BackOff    110s (x4 over 2m34s)   kubelet            Back-off restarting failed container qos-demo-ctr-5 in pod qos-demo-5_qos-example(d6170de3-c124-47d6-a641-6b10f5b690cb)
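
For context on the Failed event above: 80000 is the CFS quota that corresponds to the new CPU limit, using the standard cgroup v1 arithmetic (not specific to this cluster):

# quota_us = cpu_limit_in_cores * cfs_period_us
#          = 0.8 (i.e. 800m)    * 100000 (default period)
#          = 80000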

My Docker version is the latest:

docker version
Client: Docker Engine - Community
 Version:           24.0.1
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        6802122
 Built:             Fri May 19 18:06:42 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.1
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       463850e
  Built:            Fri May 19 18:05:43 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

THMAIL commented 1 year ago

Linux 172.30.94.201 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

THMAIL commented 1 year ago

The path /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-podd6170de3_c124_47d6_a641_6b10f5b690cb.slice/qos-demo-ctr-5/cpu.cfs_quota_us doesn't exist!

Running docker ps -a | grep qos:

0e73e98eb193   nginx                       "/docker-entrypoint.…"   2 minutes ago    Up 2 minutes                          k8s_qos-demo-ctr-5_qos-demo-5_qos-example_a7eca3d5-bd01-4d8f-ab96-9196a79c1629_0
c122a7cd4d1b   registry.k8s.io/pause:3.6   "/pause"                 2 minutes ago    Up 2 minutes                          k8s_POD_qos-demo-5_qos-example_a7eca3d5-bd01-4d8f-ab96-9196a79c1629_0

Running ls /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-podd6170de3_c124_47d6_a641_6b10f5b690cb.slice/:

0e73e98eb193503a997d1e3c7073713f514c81a35b8a88fa49f5492c7860eb0d  cgroup.event_control  cpuacct.usage         cpu.cfs_quota_us   cpu.shares         tasks
c122a7cd4d1b1125fccebd6e6b24c886943213d772285f1b32a065f0a924b48d  cgroup.procs          cpuacct.usage_percpu  cpu.rt_period_us   cpu.stat
cgroup.clone_children                                             cpuacct.stat          cpu.cfs_period_us     cpu.rt_runtime_us  notify_on_release

So is the path in the error message wrong? Is this a problem with my boot parameters, or a bug?
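
As a cross-check (assuming Docker with the cgroupfs cgroup driver, and reusing the container name from the docker ps output above), the directory that does exist is named after the full container ID rather than the container name:

CID=$(docker ps --no-trunc -q --filter "name=k8s_qos-demo-ctr-5")
ls /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-podd6170de3_c124_47d6_a641_6b10f5b690cb.slice/${CID}/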

sftim commented 1 year ago

If you do want help with Kubernetes @THMAIL, please ask elsewhere. This issue tracker is the right place to tell us about shortcomings in the docs, and the wrong place to get advice on using features (alpha or otherwise).

If / when you can point out a new problem, you are welcome to file an issue so that we can cover that. SIG Node can then look at improving the docs for the beta.

dshebib commented 1 year ago

/assign

@sftim Quick question about feature gates: is there a way to specify, within the feature-state tag, the specific feature gate that must be enabled for alpha/beta features, so that we don't have to manually edit the docs every time a feature graduates or the Kubernetes version updates?

criscola commented 1 year ago

Can someone please write precisely how to enable this feature? I tried passing the flag --feature-gates=InPlacePodVerticalScaling=true to kube-scheduler, but Kubernetes still forbids patches to pod resources.

sftim commented 1 year ago

Hi @criscola

This issue is still waiting for a volunteer / contributor to pick it up and work on a fix.

wenzhaojie commented 1 year ago

This is my kubeadm config for enabling InPlacePodVerticalScaling; can anyone help me check it?

cat > config.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.122.41
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: kubernetes-01
  taints: null
  kubeletExtraArgs:
    feature-gates: InPlacePodVerticalScaling=true
---
apiServer:
  timeoutForControlPlane: 4m0s
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.27.2
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: "192.168.0.0/16"
scheduler:
  extraArgs:
    feature-gates: InPlacePodVerticalScaling=true
EOF
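
For completeness, a sketch of how the file above would then be consumed, assuming a fresh cluster is being created:

kubeadm init --config config.yaml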

criscola commented 1 year ago

I confirm @wenzhaojie's config is correct. To summarize: the feature needs the corresponding feature gate InPlacePodVerticalScaling=true passed to the following components (matching the config above):

- kube-apiserver
- kube-controller-manager
- kube-scheduler
- kubelet

That should do the trick. It would be great to spend a paragraph somewhere mentioning this; maybe we can edit this blog post with a short note? https://kubernetes.io/blog/2023/05/12/in-place-pod-resize-alpha/
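
On nodes that are already running, a sketch of the equivalent kubelet setting via its config file (assuming the default kubeadm path /var/lib/kubelet/config.yaml; restart the kubelet afterwards):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  InPlacePodVerticalScaling: true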

tengqm commented 1 year ago

Regarding this issue, it might be obvious that in-place vertical scaling has to involve the kubelet. There are some technical implementation details as well: the scheduler has to reconsider the resource requests and limits, the ResourceQuota controller has to adjust its behavior, and so on.

This leads me to rethink a related topic. Maybe we were right when we avoided documenting the feature gates on a per-component basis. Today the feature gate list is "shared" by all components. The implementation of some features, like this one (in-place scaling), may involve several components. There is a chance that feature FOO is only about the API server and the scheduler today, but soon the developers realize that the controller-manager has to do something as well to cover a corner case.

k8s-triage-robot commented 3 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot commented 1 week ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

haircommander commented 17 hours ago

cc @tallclair @AnishShah @esotsal

From a quick glance, this looks like a documentation limitation, though the code changes for beta may also affect this situation.

haircommander commented 17 hours ago

/triage accepted

esotsal commented 4 hours ago

/retitle [FG:InPlacePodVerticalScaling] Incomplete prerequisites for “Resize CPU and Memory Resources assigned to Containers”

esotsal commented 3 hours ago

Hi,

I see that Docker Engine was used; is cri-dockerd being used? If yes, this looks like the same situation described in the first item of the InPlacePodVerticalScaling known issues, and also discussed here.

If cri-dockerd was used, then I recommend repeating the tests using a CRI-O or containerd container runtime version that satisfies the InPlacePodVerticalScaling CRI API requirements.
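
A quick way to see which runtime and version each node reports (standard kubectl; the column values shown in the comment are illustrative):

kubectl get nodes -o wide
# the CONTAINER-RUNTIME column shows e.g. docker://24.0.1 or containerd://1.6.21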

The CRI API requirements for InPlacePodVerticalScaling can be found at:

I added the [FG:InPlacePodVerticalScaling] prefix to the title so that this input can be used to improve the documentation, especially with the forthcoming graduation of InPlacePodVerticalScaling to beta. Feel free to also reach the InPlacePodVerticalScaling community in the sig-node-inplace-pod-resize Slack channel.