kubernetes / kubectl

Issue tracker and mirror of kubectl code

kubectl using 1200% CPU on macOS 14.4.1 #1668

Open philippefutureboy opened 1 week ago

philippefutureboy commented 1 week ago

What happened:

I always keep a terminal open with watch "kubectl get pods" while I work, so that I can see the status of my remote cluster at a glance. I noticed today while working that my computer was sluggish. Looking in Activity Monitor, kubectl was running at 1200% CPU usage (12 full CPU cores) with low memory usage. At that time, watch "kubectl get pods" had been running for 5d 14h, polling state every 2s whenever my laptop was not in sleep mode. I killed the watch "kubectl get pods" command and the process exited successfully, releasing the CPU load.

What you expected to happen:

I don't expect kubectl to eat 12 full CPU cores when it's only polling once every 2 seconds.

How to reproduce it (as minimally and precisely as possible):

No idea really! Is there anything I can do to help diagnose this? The main reason I'm posting here is that high CPU usage like this can be indicative of an exploited security vulnerability, which is why I'm proactively opening this issue.

I think my kubectl is packaged directly with gcloud. I'm not sure; how do I check?
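One way to check (a minimal sketch, assuming kubectl was installed as a gcloud component):

```sh
# If the resolved path lives under .../google-cloud-sdk/bin, the binary is the
# gcloud-bundled one; `gcloud components list` shows whether gcloud manages it.
which kubectl
gcloud components list
```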

Anything else we need to know?:

Environment:

```
Client Version: v1.30.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.4-gke.1348000
```

k8s-ci-robot commented 1 week ago

This issue is currently awaiting triage.

SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
ardaguclu commented 1 week ago

/kind support

I'd recommend passing -v=9 to see what is happening.
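One possible way to keep the watch loop running while capturing the verbose output (a minimal sketch; kubectl's -v=9 klog output goes to stderr, and the log path below is arbitrary):

```sh
# Keep the pod table on screen while appending the -v=9 client logs (stderr)
# to a file for later inspection. Note: at -v=9 this file grows quickly.
watch 'kubectl get pods -v=9 2>> /tmp/kubectl-watch.log'
```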

philippefutureboy commented 1 week ago

Thanks @ardaguclu. I've added the flag and will be monitoring CPU usage. If anything happens I'll let you know :)

brianpursley commented 1 week ago

> How do I check the integrity of the program (checksum perhaps?)?

The sha512 hash (for the gz) is published in the changelog. For example, https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#client-binaries

Something like this should work:

  1. Download the client binaries archive.
  2. Compute the hash of the archive you downloaded to confirm it matches what the changelog says it should be.
  3. Extract the archive.
  4. Compute the hash for the extracted binary (This is the expected hash).
  5. Compute the hash for your local binary and compare to confirm that it matches what you got in step 4.

Example (you will want to use darwin instead of linux-amd64):

```
~/Downloads $ shasum -a 512 kubernetes-client-linux-amd64.tar.gz
7551aba20eef3e2fb2076994a1a524b2ea2ecd85d47525845af375acf236b8afd1cd6873815927904fb7d6cf7375cfa5c56cedefad06bf18aa7d6d46bd28d287  kubernetes-client-linux-amd64.tar.gz
~/Downloads $ tar xvf kubernetes-client-linux-amd64.tar.gz kubernetes
kubernetes/
kubernetes/client/
kubernetes/client/bin/
kubernetes/client/bin/kubectl
kubernetes/client/bin/kubectl-convert
~/Downloads $ shasum -a 512 kubernetes/client/bin/kubectl
1adba880a67e8ad9aedb82cde90343a6656147e7f331ab2e2293d4bc16a280591bd3b912873f099f11cde2f044d8698b963ed45fadedfe1735d99158e21e44a0  kubernetes/client/bin/kubectl
```

Then get your local kubectl's hash and compare it...

```sh
shasum -a 512 $(which kubectl)
```
brianpursley commented 1 week ago

The interesting thing about this is that kubectl is not running for 5 days; it is being invoked by watch every 2 seconds for 5 days.

In addition to using -v=9 as @ardaguclu suggested...

If it happens again, try doing the following in another terminal, while the problem is occurring, to collect information that might be helpful to diagnose the problem:

```sh
ps -F $(pgrep kubectl)
pgrep kubectl | xargs -L1 lsof -p
```
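Note that -F is a procps/Linux ps flag and isn't available in the BSD ps that ships with macOS; a rough equivalent on macOS might look like this (a sketch under that assumption):

```sh
# Per kubectl process: print CPU, memory, elapsed time and the full command line,
# then list its open file descriptors.
pgrep kubectl | xargs -n1 ps -o pid,ppid,%cpu,%mem,etime,command -p
pgrep kubectl | xargs -n1 lsof -p
```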
philippefutureboy commented 1 week ago

Fantastic @brianpursley, thanks for the additional tips! I will be checking the checksum tomorrow :)

> The interesting thing about this is that kubectl is not running for 5 days; it is being invoked by watch every 2 seconds for 5 days.

I was thinking the same thing: after 5 days, maybe there's some kind of low-level error that leads to more CPU consumption, or some data that accumulates. But one execution every 2 seconds? That shouldn't be an issue.

I'll follow up shortly!

philippefutureboy commented 1 day ago

Hi @brianpursley! After some extra delay, I've done the sha512 check, and the checksums don't match. I'm not sure whether they should match in the first place because:

  1. My kubectl binary is the one distributed by google-cloud-sdk
  2. When running kubectl version I get three versions (one for the client, one for the server, and one for Kustomize), whereas the downloads on GitHub are split into separate client and server archives

Here are the steps taken:

  1. Determine the kubectl location on fs & version:
```
$ which kubectl
/Users/philippe/google-cloud-sdk/bin/kubectl
$ kubectl version
Client Version: v1.30.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.5-gke.1014001
```
  2. Download the kubectl binary from source (a consolidated, scriptable version of steps 2 and 3 is sketched after this list):

     2.1. Navigate to https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#downloads-for-v1304
     2.2. Click on kubernetes-client-darwin-amd64.tar.gz
     2.3. Extract kubectl to a folder using tar -xvzf kubernetes-client-darwin-amd64.tar.gz

  3. Assert checksum match:
```
$ shasum -a 512 $(which kubectl)
a49c02cbe3d3b80011a0d53118d3d8f921efbad89e6c986d39e05a5d486702a9020ff324a1c01d79f24234fa5d8783448352c980379c876344efd0eb332377d4  /Users/philippe/google-cloud-sdk/bin/kubectl
```

and

```
$ shasum -a 512 /Users/philippe/Downloads/kubernetes/client/bin/kubectl
78c72e33056778b37c43e272e2b997be272edc70fd6f650e29eb4ab1221c0184955470611f6dba20592f778135c348752e423776c51e97f91a3cad33216430bc  /Users/philippe/Downloads/kubernetes/client/bin/kubectl
```
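For reference, steps 2 and 3 can be consolidated into a small script. This is only a sketch: it assumes an Intel Mac (darwin-amd64), the v1.30.4 client archive, and the dl.k8s.io URL that the changelog links to; the expected archive sha512 still has to be read from the changelog.

```sh
VERSION=v1.30.4
ARCH=darwin-amd64   # darwin-arm64 on Apple Silicon
# Download the official client archive and verify it against the changelog value.
curl -LO "https://dl.k8s.io/${VERSION}/kubernetes-client-${ARCH}.tar.gz"
shasum -a 512 "kubernetes-client-${ARCH}.tar.gz"   # compare with the changelog
# Extract the upstream kubectl and compare its hash with the locally installed binary.
tar -xvzf "kubernetes-client-${ARCH}.tar.gz" kubernetes/client/bin/kubectl
shasum -a 512 kubernetes/client/bin/kubectl        # expected hash
shasum -a 512 "$(which kubectl)"                   # local binary
```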

As you can see, the checksums mismatch even though the client version matches. How can I verify that the kubectl binary I have hasn't been tampered with and that there wasn't a supply-chain attack? I can't just install the google-cloud-sdk in a Docker image and take the shasum of the kubectl binary it installs; all that would tell me is that my kubectl binary matches the one the provider distributes, not that it matches a binary compiled from this project.

Thank you for your help! Cheers, Philippe

philippefutureboy commented 1 day ago

Note that I also opened a support case with Google Cloud to assist in confirming the integrity of the kubectl binary packaged as part of google-cloud-sdk. Any information on your end will still be helpful, and I'll pass along any information I get from Google Cloud's support team.

Last note: no spike in CPU usage has been observed since the one last reported.

brianpursley commented 6 hours ago

@philippefutureboy Do the Google Cloud SDK maintainers build their own kubectl binary with gcloud-specific changes?

If so, and if kubectl version reports it as "regular" kubectl, that seems like it could be confusing.
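One detail that may help tell the builds apart (a sketch; assumes nothing beyond standard kubectl flags): kubectl version can print the full client build metadata, and a binary rebuilt by a distributor can report the same semantic version but a different gitCommit, buildDate, or goVersion than the upstream release.

```sh
# Print the full client build metadata and compare gitCommit/buildDate/goVersion
# against the upstream v1.30.4 release build.
kubectl version --client -o yaml
```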

philippefutureboy commented 5 hours ago

@brianpursley that's also what I'm trying to figure out with my support rep. I'll keep you in the loop with any new info.