Closed rubber-ant closed 16 hours ago
Is this issue for Talos Linux or Grafana?
The computation takes 3-4 seconds, but Grafana/Prometheus for some reason is not able to show the same as the talosctl dashboard.
So I don't quite see what you would like us to fix on the Talos side. Do you think the dashboard shows CPU usage wrong?
Also, please keep in mind that CPU usage is split at least into user and sys time; talosctl dashboard shows the aggregate of both.
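To illustrate the user/sys point, here is a minimal sketch that computes per-mode CPU usage from two snapshots of a cgroup v2 `cpu.stat` file. The snapshot values, the 10-second gap, and the helper names are made up for illustration; they are not taken from this cluster, and nothing here claims this is what the Grafana panel in question actually plots.

```python
# Hypothetical cgroup v2 cpu.stat snapshots taken 10 seconds apart.
# The numbers are invented for illustration only.
SAMPLE_T0 = """usage_usec 1000000
user_usec 700000
system_usec 300000
"""
SAMPLE_T1 = """usage_usec 11000000
user_usec 6700000
system_usec 4300000
"""


def parse_cpu_stat(text):
    """Parse 'key value' lines into a dict of integers."""
    pairs = (line.split() for line in text.splitlines() if line.strip())
    return {key: int(value) for key, value in pairs}


def cpu_rates(before, after, elapsed_seconds):
    """Return average CPU cores used per mode over the interval."""
    return {
        key: (after[key] - before[key]) / 1_000_000 / elapsed_seconds
        for key in ("user_usec", "system_usec", "usage_usec")
    }


rates = cpu_rates(parse_cpu_stat(SAMPLE_T0), parse_cpu_stat(SAMPLE_T1), 10.0)
for mode, cores in rates.items():
    print(f"{mode}: {cores:.2f} cores")
```

With these made-up numbers, a chart that plotted only user time would show 0.6 cores while the aggregate is 1.0 core, so it is worth checking which series a panel actually sums.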
I initially thought this was a kernel bug in Talos, but I no longer believe this to be the case.
Anyway, what kernel does tag v1.8.2 use?
You can check yourself with kubectl get nodes -o wide ;)
I don't think it's a bug anywhere, but you need to understand the metrics, and the way they are reported and presented, a bit better to reach the correct conclusion.
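One way reporting can differ is the averaging window. The sketch below is a toy model, not a measurement: it assumes a pod that runs at its full 1-CPU limit for 5 seconds and then idles for 5 seconds (the idle time is hypothetical; the thread only gives the computation time), and compares a 1-second rate with a 60-second rate over the same cumulative CPU counter.

```python
def cumulative_cpu_seconds(busy_seconds, idle_seconds, total_seconds, limit_cores=1.0):
    """Build a cumulative CPU-seconds counter sampled once per second.

    The workload is assumed to alternate between running at the limit
    and being idle; this duty cycle is a made-up example.
    """
    counter = [0.0]
    cycle = busy_seconds + idle_seconds
    for second in range(total_seconds):
        busy = (second % cycle) < busy_seconds
        counter.append(counter[-1] + (limit_cores if busy else 0.0))
    return counter


def rate(counter, end, window):
    """Average cores used over the last `window` seconds ending at `end`."""
    return (counter[end] - counter[end - window]) / window


counter = cumulative_cpu_seconds(busy_seconds=5, idle_seconds=5, total_seconds=120)

# A 1-second view (similar in spirit to a live process monitor) sees the peaks.
instant_peak = max(rate(counter, t, 1) for t in range(1, 121))

# A 60-second window averages busy and idle time together.
windowed = rate(counter, 120, 60)

print(f"peak over 1s windows: {instant_peak:.2f} cores")
print(f"average over 60s window: {windowed:.2f} cores")
```

In this toy model the short window peaks at 1.0 core while the long window reports 0.5 cores, even though the process is pinned at its limit whenever it runs.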
Bug Report
Pod has CPU throttling but never hits the limit in Grafana
Description
On the talosctl dashboard monitor tab, I can see that the PID for this pod, which is limited to 1 CPU, reaches CPU% = 100%, meaning it uses the full CPU allocated. Grafana, however, shows only 40%-60% and never reaches 100% CPU utilisation.
I noticed a pod with CPU limits set to 1, where computations take 5 seconds. When I remove the CPU limit or increase it to 8, the computation time drops to 0.2–0.5 seconds.
On the talosctl dashboard (monitor tab), the pod's PID shows 100% CPU usage, meaning it is fully utilising the allocated CPU, which is what I expect to see in Grafana. However, Grafana shows only 40–60% CPU utilisation (0.4-0.6 CPU against a limit of 1) and never reaches 100%.
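The gap between 5 seconds with a limit of 1 and 0.2–0.5 seconds without it is roughly what CPU quota arithmetic predicts for a multi-threaded computation. The sketch below is a back-of-the-envelope estimate under stated assumptions, not a description of this pod: it assumes 8 perfectly parallel threads, a 0.5-second unthrottled run, and a 100 ms quota period (the Kubernetes default). The function name and parameters are hypothetical.

```python
def estimate_throttled_run(unthrottled_wall_s, threads, limit_cores, period_s=0.1):
    """Estimate wall time and per-period throttling under a CPU limit.

    Assumes the work is perfectly parallel across `threads` and that the
    quota per period is limit_cores * period_s CPU-seconds.
    """
    cpu_seconds = unthrottled_wall_s * threads       # total CPU work needed
    effective_cores = min(threads, limit_cores)      # the limit caps throughput
    wall_s = cpu_seconds / effective_cores

    # All threads run in parallel until the quota for the period is used up,
    # then the cgroup is throttled until the next period starts.
    run_per_period_s = min(period_s, limit_cores * period_s / threads)
    throttled_per_period_s = period_s - run_per_period_s
    return wall_s, run_per_period_s, throttled_per_period_s


wall, running, throttled = estimate_throttled_run(0.5, threads=8, limit_cores=1.0)
print(f"estimated wall time with limit: {wall:.1f} s")
print(f"running per 100 ms period: {running * 1000:.1f} ms")
print(f"throttled per 100 ms period: {throttled * 1000:.1f} ms")
```

Under these assumptions the run stretches from 0.5 seconds to about 4 seconds, with the threads running for 12.5 ms and throttled for 87.5 ms of every 100 ms period, while total usage still never exceeds 1 CPU.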
I tried setting Prometheus with:
I also set Min interval to 1s on the CPU chart in Grafana, without any luck.
Environment
Server Version: v1.29.10
Ubuntu 22.04.5 - 5.15.0-124-generic
using vagrant 2.4.1