Closed TylerBrock closed 2 years ago
this is a similar issue: https://github.com/rancher/k3s/issues/294
Yup, I read through that, and it seems like there was no outcome or solution, so I wanted to try to move things forward by seeing whether anyone is against this. I would be willing to add pprof support via a PR, but I want to know my time won't be wasted if folks are against it, or if they have profiled before and already know what the problem is.
I would be very happy to see the root cause for this issue...
@erikwilson what do you guys think about it?
Hello @TylerBrock
I am also very interested in the result. Currently we have the same problem: k3s-server uses about 20-30% CPU on a fresh Lubuntu 20.04 installation. It would be great to find and remove the cause.
regards Michael
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
not stale. Still happens in 2021 running v1.21.3+k3s1 on a pi cluster. No other pods running. k3s process is constantly consuming about 20-30% CPU.
In my cluster I stopped k3s on the primary, and CPU usage on the agent node that had been showing consistent load dropped. When the primary came back up, the agent's CPU usage went back up.
I also see 10-13% consistently on a fresh install on Debian 11 w/o any workload.
Yes, that is to be expected. You might check out the expected resource figures at:
https://rancher.com/docs/k3s/latest/en/installation/installation-requirements/resource-profiling/
PR https://github.com/k3s-io/k3s/pull/5527 was merged to address this:
2968a83bc046f43be62852b94d07c531de97c416
Environment Details:
Node(s) CPU architecture, OS, and Version:
Ubuntu 22.04 LTS
Validated both x86_64 and aarch64:
$ uname -a
Linux ip-172-31-47-67 5.15.0-1011-aws #14-Ubuntu SMP Wed Jun 1 20:54:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ uname -a
Linux ip-172-31-26-162 5.15.0-1009-aws #11-Ubuntu SMP Thu May 26 19:39:49 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
Cluster Configuration: 1 server. Also confirmed this flag does not apply to agents.
Config.yaml:
# configuration 1 - enable pprof:
write-kubeconfig-mode: 644
enable-pprof: true
# configuration 2 - default (pprof disabled):
write-kubeconfig-mode: 644
Validated with:
K3S Version/Commit:
commit 2968a83bc046f43be62852b94d07c531de97c416
curl -sfL https://get.k3s.io | INSTALL_K3S_COMMIT=2968a83bc046f43be62852b94d07c531de97c416 sh -
Steps To Validate:
1. `curl`ing the profile endpoint: curl --insecure https://localhost:6443/debug/pprof/profile > profile.pprof
2. `curl`ing the trace endpoint: curl --insecure https://localhost:6443/debug/pprof/trace > trace.out
Results:
$ go tool trace trace.out
2022/06/16 18:18:08 Parsing trace...
2022/06/16 18:18:08 Splitting trace...
2022/06/16 18:18:08 Opening browser. Trace viewer is listening on http://127.0.0.1:40013
$ go tool pprof profile.pprof
File: k3s
Type: cpu
Time: Jun 16, 2022 at 6:08pm (UTC)
Duration: 30s, Total samples = 1.69s ( 5.63%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
(pprof) top
Showing nodes accounting for 860ms, 50.89% of 1690ms total
Showing top 10 nodes out of 511
      flat  flat%   sum%        cum   cum%
     160ms  9.47%  9.47%      160ms  9.47%  runtime.futex
     130ms  7.69% 17.16%      130ms  7.69%  syscall.Syscall6
     100ms  5.92% 23.08%      100ms  5.92%  runtime.epollwait
      80ms  4.73% 27.81%      210ms 12.43%  github.com/prometheus/client_golang/prometheus.(*MetricVec).hashLabelValues
      80ms  4.73% 32.54%       80ms  4.73%  syscall.Syscall
      70ms  4.14% 36.69%      190ms 11.24%  github.com/prometheus/client_golang/prometheus.(*histogram).Observe
      70ms  4.14% 40.83%       70ms  4.14%  github.com/prometheus/client_golang/prometheus.(*histogram).observe (inline)
      70ms  4.14% 44.97%       70ms  4.14%  runtime.write1
      50ms  2.96% 47.93%       50ms  2.96%  github.com/prometheus/client_golang/prometheus.hashAdd
      50ms  2.96% 50.89%       60ms  3.55%  runtime.cgocall
(pprof) png
failed to execute dot. Is Graphviz installed? Error: exec: "dot": executable file not found in $PATH
$ sudo apt update && sudo apt install graphviz
(pprof) png
Generating report in profile001.png
Gif output from x86:
![profile001](https://user-images.githubusercontent.com/55997940/174147453-b0b4fe17-5344-4546-8526-4e04cf984fd9.gif)
PNG output from aarch64:
![profile001](https://user-images.githubusercontent.com/55997940/174147462-0a9ad76d-5d3b-431d-8057-be6c018c34f3.png)
3. When pprof is disabled, no outputs can be accessed, as expected:
$ go tool pprof profile.pprof
profile.pprof: parsing profile: unrecognized profile format
failed to fetch any source profiles
**Additional context / logs:**
1. When trying pprof on the endpoint directly, it fails as expected:
$ go tool pprof https://localhost:6443/debug/pprof/profile
Fetching profile over HTTP from https://localhost:6443/debug/pprof/profile
https://localhost:6443/debug/pprof/profile: Get "https://localhost:6443/debug/pprof/profile": x509: certificate signed by unknown authority
failed to fetch any source profiles
2. The other endpoints have different outputs and do not give pprof data:
$ curl --insecure https://localhost:6443/debug/pprof/cmdline > cmdline.pprof
$ go tool pprof cmdline.pprof
cmdline.pprof: parsing profile: unrecognized profile format
failed to fetch any source profiles
3. Help text is sufficient:
$ k3s server --help | grep -i pprof
   --enable-pprof                             (experimental) Enable pprof endpoint on supervisor port
Hi all, I'm running k3s on x86_64 as well as armv7 (Raspberry Pi 4), and I've noticed a consistent amount of CPU in use after starting the service, even though it should be idle. One core always takes about 15-20% CPU with no pods of my own running.
Additionally, every ~5 seconds it uses ~25% of all four cores on both boxes to do something (most likely metrics collection, I think).
I'm not sure whether this is a Kubernetes problem in general, but for something like k3s I'd like to figure out what is going on, since it's supposed to be lightweight.
Would anyone object to adding pprof to k3s to see exactly what it's doing with that constant CPU?
Some additional details: