CPU usage is sometimes overstated

soyuka / pidusage

Cross-platform process cpu % and memory usage of a PID

MIT License

CPU usage is sometimes overstated #89

Closed (sempi closed 4 years ago)

sempi commented 5 years ago

Sometimes the reported CPU usage exceeds 100% × the number of vCPUs. E.g. a 12-vCPU instance reports 3600% usage.

We use a 2-second poll interval to invoke pidusage (in part to avoid issues with small observation intervals). Every once in a while, repeatably across a large number of instances, pidusage reports a CPU usage that is larger than is possible given the number of vCPUs on the system.

OS: virtualized Debian
Platform: GCP

soyuka commented 5 years ago

See https://github.com/soyuka/pidusage/issues/58. I think this is related.

sempi commented 5 years ago

The issue is not that it reports 0-1200 for a 12-core system. The issue is that it reports >1200 for a 12-core system. Issue #58 primarily discusses the difference between reporting 100% and 1200% usage for an example 12-core system.

In our case the readings are not consistent. For example, a multithreaded program is typically reported at 800% when it uses the equivalent of 8 cores. However, every once in a while we get a reading of 3600%, which is physically impossible, so there must be an issue with the approach, e.g. the timestamp not accurately matching the reading. So the pidusage return values are 800%, 800%, 3600%, 600%, 800% when we poll every 2 seconds. Because we poll at 2-second intervals, it is unlikely to be an observation-interval issue.
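To illustrate how a timestamp that does not match the reading could produce such a spike, here is a minimal sketch of the usual delta-based calculation (CPU time consumed divided by wall-clock time elapsed). It is not pidusage's actual implementation; the function names, the sample shape `{ cpuMs, wallMs }`, and the numbers are hypothetical.

```javascript
const os = require('os');

// Hypothetical sample shape: { cpuMs, wallMs }
//   cpuMs  - cumulative CPU time (user + system) of the process, in ms
//   wallMs - wall-clock timestamp attributed to that reading, in ms

// CPU usage between two samples, as a percentage of one core.
function cpuPercent(prev, curr) {
  const elapsedMs = curr.wallMs - prev.wallMs;
  if (elapsedMs <= 0) return null; // no usable interval
  return ((curr.cpuMs - prev.cpuMs) / elapsedMs) * 100;
}

// A reading above 100% per core cannot be real, so flag it.
function isPlausible(percent, cores = os.cpus().length) {
  return percent !== null && percent >= 0 && percent <= cores * 100;
}

// 8 busy cores over a correctly measured 2 s interval: 16000 / 2000 * 100
const normal = cpuPercent({ cpuMs: 0, wallMs: 0 }, { cpuMs: 16000, wallMs: 2000 });

// Same kind of workload, but the timestamps claim only 500 ms elapsed
// while the CPU counter actually covers a longer period: 18000 / 500 * 100
const spike = cpuPercent({ cpuMs: 0, wallMs: 0 }, { cpuMs: 18000, wallMs: 500 });

console.log(normal, isPlausible(normal, 12)); // 800 true
console.log(spike, isPlausible(spike, 12));   // 3600 false
```

A caller could use a check like `isPlausible` to discard or re-sample readings above the 1200% ceiling of a 12-core system, although that only hides the symptom and not the cause.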

soyuka commented 5 years ago

Interesting, could you try forcing the ps method instead of using procfiles? (see https://github.com/soyuka/pidusage/blob/master/lib/stats.js#L11)

It can serve as a workaround until I can investigate what's wrong with the procfiles interpretation.
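A minimal sketch of what forcing the ps method might look like from calling code. It assumes the `usePs` option described in the pidusage README and the promise-returning call style; the helper name `statWithPs` is hypothetical, and `pidusage` is passed in as an argument so the snippet has no hard dependency.

```javascript
// `pidusage` is injected; in a real project it would come from
// require('pidusage').
function statWithPs(pidusage, pid) {
  // Assumption: `usePs: true` makes pidusage read from `ps`
  // instead of the /proc files on platforms that support both.
  return pidusage(pid, { usePs: true });
}

// Possible usage (requires the pidusage package to be installed):
//
//   statWithPs(require('pidusage'), process.pid)
//     .then((stats) => console.log(stats.cpu))
//     .catch((err) => console.error(err));
```

If the spikes disappear with this option, that would point at the procfile parsing rather than at the polling code.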

MichaelLeeHobbs commented 4 years ago

I'm having a similar issue on Windows. Here are some things I have figured out:

- Don't use setInterval for monitoring. At high CPU usage, I was getting overlapping intervals.
- Turbo overclocking really throws the numbers off when it kicks in. I'm testing on a Core i7-9700K, which can go from 3.60 GHz up to 4.90 GHz.
- A longer poll interval via setTimeout or some other non-setInterval method gives major improvements. I have seen a significant improvement going from 1-second to 2-second intervals.
- Lastly, adding `const {exec} = require('child_process'); exec('wmic process where "ProcessId=' + process.pid + '" CALL setpriority 256')` to the beginning of the monitor process also improved results.
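The setTimeout-based polling described above can be sketched as follows. This is one possible shape and not code from this thread: `startPolling` and its parameters are hypothetical names, and `sample` stands for any function that returns a promise, such as a call to pidusage. The next poll is only scheduled after the previous one has finished, so polls cannot overlap.

```javascript
// Poll `sample` repeatedly, waiting `intervalMs` between the end of one
// poll and the start of the next. Returns a function that stops polling.
function startPolling(sample, intervalMs, onResult) {
  let stopped = false;
  let timer = null;

  async function tick() {
    if (stopped) return;
    try {
      onResult(null, await sample());
    } catch (err) {
      onResult(err);
    }
    // Schedule the next poll only now that this one has completed.
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  timer = setTimeout(tick, intervalMs);

  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Note that the effective period is `intervalMs` plus the time each sample takes, so readings arrive slightly less often than with setInterval; that is the trade-off for never having two samples in flight.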

soyuka commented 4 years ago

Interesting findings. I wish we had an API other than wmic to get this information, though, as wmic consumes a lot of resources.