Open alvarocabanas opened 5 months ago
Pinging code owners for receiver/hostmetrics: @dmitryax @braydonk. See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Unfortunately, the fix seems to reside in gopsutil.
Component(s)
receiver/hostmetricsreceiver
What happened?
Description
On a Windows instance with 48 * 2 logical CPUs, Windows splits the CPUs into processor groups of at most 64 logical CPUs each. However, gopsutil's cpu.TimesWithContext, which is referenced in the cpuscraper here and is also used by the resource detector to calculate CPU times and utilization, returns data from only one of the two processor groups, seemingly at random, rather than from all of them.
This produces erratic CPU utilization values: sometimes negative, and sometimes full usage even when only half the cores are busy.
This bug has been reported against the gopsutil library.
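To illustrate why reading from a different processor group on each scrape could yield negative or inflated utilization, here is a minimal, self-contained sketch. It does not use gopsutil or the collector's actual code; the `sample` type, the `utilization` helper, and all counter values are hypothetical and chosen only to show the arithmetic on cumulative counters.

```go
package main

import "fmt"

// sample holds cumulative busy and total CPU seconds as read at one scrape.
// Hypothetical type: it only models the idea of cumulative CPU time counters.
type sample struct {
	busy  float64
	total float64
}

// utilization returns the busy fraction between two samples.
// It is only meaningful when both samples cover the same set of CPUs.
func utilization(prev, cur sample) float64 {
	return (cur.busy - prev.busy) / (cur.total - prev.total)
}

func main() {
	// Made-up counters: group 0 has been busy, group 1 mostly idle.
	group0First := sample{busy: 9000, total: 10000}
	group0Second := sample{busy: 9030, total: 10060}
	group1First := sample{busy: 1000, total: 10000}
	group1Second := sample{busy: 1010, total: 10060}

	// Both scrapes read the same group: the delta is sensible.
	fmt.Println("same group:", utilization(group0First, group0Second))

	// First scrape reads group 0, second reads group 1: the busy counter
	// appears to go backwards, so the result is negative.
	fmt.Println("group 0 then group 1:", utilization(group0First, group1Second))

	// First scrape reads group 1, second reads group 0: the busy counter
	// appears to jump, so the result is far above 1 (100%).
	fmt.Println("group 1 then group 0:", utilization(group1First, group0Second))
}
```

Under these assumptions, only the first case gives a value between 0 and 1. The other two match the symptoms described above: negative utilization, or values that would show as full usage if clamped.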
Steps to Reproduce
In our case we reproduced it on an 'm5n.metal' machine in AWS running windows-server-22, but we know of reports of it happening on other Windows machines with more than one processor group.
Expected Result
Correct CPU usage and times.
Actual Result
CPU data points come from only one of the 2 processor groups, chosen at random.
Collector version
v0.101.0
Environment information
Environment
OS: Windows-server 22
OpenTelemetry Collector configuration
Log output
No response
Additional context
No response