craigcabrey opened this issue 1 month ago
Process view right before I ran `systemctl restart k3s`:
Process view ~10 hours ago:
This shows that both k3s & containerd are growing. These views track the systemd cgroup slices, not the workloads, so workload behavior should not be contaminating these numbers (and separately, as noted above, I minimized the number of pods running on this node and checked the stats of those pods -- all were within reason).
And process view after restarting:
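For context, here's roughly how I'm sampling those per-service numbers straight from the systemd cgroup hierarchy (a sketch assuming cgroup v2, which Fedora 40 uses; paths may differ on other setups, and on a default k3s install the bundled containerd runs inside k3s.service rather than its own unit):

```sh
#!/bin/sh
# Memory currently charged to the k3s service cgroup (this includes the bundled
# containerd, which k3s spawns inside the same cgroup on a default install).
cat /sys/fs/cgroup/system.slice/k3s.service/memory.current

# Per-process RSS (kB) for the k3s supervisor and containerd, to split the two apart.
ps -o pid,rss,etime,comm -C k3s,containerd
```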
Environmental Info:
K3s Version: v1.29.8
Node(s) CPU architecture, OS, and Version:
Linux venus-node-3 6.10.6-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Aug 19 14:09:30 UTC 2024 x86_64 GNU/Linux
cmdline:
Cluster Configuration:
Describe the bug:
I have a control plane node that runs out of memory after 1-2 days. I've experimented a bit, and this happens even when the node is cordoned and only a minimal set of pods is running.

Steps To Reproduce:
I have a simple drop-in:

Expected behavior:
Memory usage of k3s & containerd stays bounded over time.

Actual behavior:
Memory usage of k3s & containerd grows over a 1-2 day period until it consumes all memory on the host. This happened on v1.29.6 as well; I upgraded to v1.29.8, but no change was observed.
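A crude way to watch this happen (just a sketch of the kind of loop I used, not a verbatim copy): log the RSS of both processes every few minutes and watch the numbers climb over a day or two.

```sh
#!/bin/sh
# Append a timestamped RSS sample for k3s and containerd every 5 minutes.
while true; do
    date +%FT%T
    ps -o rss=,comm= -C k3s,containerd
    sleep 300
done >> /var/tmp/k3s-rss.log
```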
Additional context / logs:
I also have `below` logs (similar to `atop`, if you aren't familiar) showing the cgroup & process level stats over a 24h+ period. You can see the RSS grow uncontrollably over the period:

~10 hours ago:

~15 minutes ago:
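If you haven't used `below` before: it runs as a recording service and can replay historical samples, much like `atop`. This is roughly how the views above were captured and pulled back up (exact flags may vary between below versions):

```sh
# below's systemd service records system, cgroup, and process stats continuously
sudo systemctl enable --now below

# Replay recorded history at a point in the past (same idea as atop's replay mode)
sudo below replay --time "10 hours ago"
```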
Grafana stats: