rancher / rke2

https://docs.rke2.io/
Apache License 2.0

[Backport release-1.22] Enable pprof #3057

Closed rancherbot closed 2 years ago

rancherbot commented 2 years ago

This is a backport issue for https://github.com/rancher/rke2/issues/3055, automatically created via rancherbot by @rancher-max

Original issue description:

Pullthrough from https://github.com/k3s-io/k3s/issues/1635

dereknola commented 2 years ago

Moving this out of To Test as there is more work to be done to enable pprof on RKE2. Currently seeing:

root@ip-172-31-31-12:/home/ubuntu# curl --insecure https://localhost:6443/debug/pprof/profile
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}

Clearly RKE2 locks down the port more than K3s, so more work is needed.

dereknola commented 2 years ago

This does work; you need to hit port 9345 (the supervisor port) instead of 6443. Note that the default profile duration is quite long at 30 seconds, so pass a shorter one:

curl --insecure "https://localhost:9345/debug/pprof/profile?seconds=10" > cpu.pprof
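
For repeated use, the command above can be wrapped in small helpers. This is only a sketch: `pprof_url` and `fetch_pprof` are hypothetical names, and the only details taken from this thread are the port (9345), the /debug/pprof/ path, and the `seconds` parameter. The extra 15 seconds of curl timeout is an arbitrary assumption.

```shell
# Hypothetical helpers; not part of RKE2.

# Print the URL of a pprof endpoint on the supervisor port.
# $1 = endpoint name (profile, trace, ...), $2 = optional duration in seconds.
pprof_url() {
  local kind="$1"
  local seconds="${2:-}"
  local url="https://localhost:9345/debug/pprof/${kind}"
  if [ -n "$seconds" ]; then
    url="${url}?seconds=${seconds}"
  fi
  printf '%s\n' "$url"
}

# Fetch an endpoint into a file.
# $1 = endpoint name, $2 = duration in seconds, $3 = output file.
# --fail makes curl exit non-zero on HTTP errors (such as a 401)
# instead of saving the error body as if it were a profile.
fetch_pprof() {
  local kind="$1"
  local seconds="$2"
  local out="$3"
  # Allow curl somewhat longer than the sampling window before it times out.
  curl --insecure --fail --silent --show-error \
    --max-time "$((seconds + 15))" \
    --output "$out" "$(pprof_url "$kind" "$seconds")"
}
```

Usage would look like: fetch_pprof profile 10 cpu.pprof
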

est-suse commented 2 years ago

Cluster Configuration: 1 server

Also confirmed this flag does not apply to agents.

Config.yaml:

configuration 1 - enable pprof:

write-kubeconfig-mode: 644
enable-pprof: true

configuration 2 - default (pprof disabled):

write-kubeconfig-mode: 644

Validated with:

RKE2 Version/Commit: commit 5453a333aabe0d4d0fb067823bc4f7970a45a24b 16

sudo curl -sfL https://get.rke2.io | INSTALL_RKE2_COMMIT=5453a333aabe0d4d0fb067823bc4f7970a45a24b sh -

Steps To Validate:

1. Set up config.yaml using configuration 1.
2. Install RKE2 using the curl command above.
3. Grab the profile by curling the correct endpoint:
   curl --insecure https://localhost:9345/debug/pprof/profile > profile.pprof
4. Grab the trace output by curling the correct endpoint:
   curl --insecure https://localhost:9345/debug/pprof/trace > trace.out
5. Access the trace output.
6. Access the profile.
7. Repeat steps 1-6 using configuration 2; pprof should be disabled.

Results:

When accessing the trace output (step 5 above): note that I am unable to open the trace viewer endpoint here, since I don't have a browser available in my VM. I'm therefore not 100% sure this part is working as expected, but the tool output looks right, so I am considering it okay.

$ go tool trace trace.out
2022/06/16 18:18:08 Parsing trace...
2022/06/16 18:18:08 Splitting trace...
2022/06/16 18:18:08 Opening browser. Trace viewer is listening on http://127.0.0.1:40013

When accessing the profile (step 6 above): some pprof commands may fail because the utilities they depend on are not installed on the host. This is intentional; utilities should be installed by the operator as needed.

$ go tool pprof profile.pprof
File: rke2
Type: cpu
Time: Aug 1, 2022 at 6:57pm (UTC)
Duration: 30s, Total samples = 180ms ( 0.6%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

(pprof) top
Showing nodes accounting for 160ms, 88.89% of 180ms total
Showing top 10 nodes out of 54
      flat  flat%   sum%        cum   cum%
      60ms 33.33% 33.33%       60ms 33.33%  runtime.epollwait
      20ms 11.11% 44.44%       80ms 44.44%  runtime.netpoll
      10ms  5.56% 50.00%       10ms  5.56%  runtime.adjusttimers
      10ms  5.56% 55.56%       10ms  5.56%  runtime.findObject
      10ms  5.56% 61.11%      100ms 55.56%  runtime.findrunnable
      10ms  5.56% 66.67%       10ms  5.56%  runtime.futex
      10ms  5.56% 72.22%       10ms  5.56%  runtime.getitab
      10ms  5.56% 77.78%       10ms  5.56%  runtime.newobject
      10ms  5.56% 83.33%       10ms  5.56%  runtime.pageIndexOf
      10ms  5.56% 88.89%      120ms 66.67%  runtime.park_m

(pprof) png
failed to execute dot. Is Graphviz installed? Error: exec: "dot": executable file not found in $PATH

Exited pprof and installed Graphviz. For example, on Ubuntu:

$ sudo apt update && sudo apt install graphviz

Went back into pprof and this worked:

(pprof) png
Generating report in profile001.png
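
The same graph can presumably be produced without the interactive prompt. A minimal sketch, assuming Go is installed on the host; `render_profile_png` is a hypothetical helper name, and it checks for Graphviz's `dot` up front so the failure above is reported before pprof runs:

```shell
# Hypothetical helper; not part of RKE2 or pprof.
# $1 = input profile (e.g. profile.pprof), $2 = output PNG path.
render_profile_png() {
  local profile="$1"
  local out="$2"
  # pprof shells out to Graphviz's "dot" for graphical output.
  if ! command -v dot >/dev/null 2>&1; then
    echo "dot not found in PATH; install graphviz first" >&2
    return 1
  fi
  # -png selects the output format, -output names the file to write.
  go tool pprof -png -output "$out" "$profile"
}
```

Usage would look like: render_profile_png profile.pprof profile001.png
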


When pprof is disabled, no outputs can be accessed, as expected:

$ go tool pprof profile.pprof
profile.pprof: parsing profile: unrecognized profile format
failed to fetch any source profiles

Additional context / logs:

When trying pprof on the endpoint directly, it fails as expected:

Fetching profile over HTTP from https://localhost:6443/debug/pprof/profile
cmdline2.pprof: parsing profile: unrecognized profile format
failed to fetch any source profiles
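
The "unrecognized profile format" errors presumably mean the saved file holds an error response rather than a profile. A quick sanity check before opening a file, as a sketch only: it assumes the file came from a profile endpoint such as /debug/pprof/profile, whose output Go writes as gzip-compressed data starting with the bytes 1f 8b. It does not apply to trace.out, which is not gzip. `looks_like_profile` is a hypothetical helper name.

```shell
# Hypothetical helper; not part of RKE2 or pprof.
# Succeeds if the file in $1 starts with the gzip magic bytes 1f 8b.
looks_like_profile() {
  local magic
  # Read the first two bytes and render them as hex without spaces, e.g. "1f8b".
  magic="$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')"
  [ "$magic" = "1f8b" ]
}
```

Usage would look like: looks_like_profile profile.pprof || echo "profile.pprof is not a profile; is enable-pprof set?"
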

Help text is sufficient:

$ rke2 server --help | grep -i pprof
--enable-pprof (experimental) Enable pprof endpoint on supervisor port