luigimarmo / psutil

Automatically exported from code.google.com/p/psutil
Other
0 stars 0 forks source link

psutil.cpu_times_percent().system returns negative CPU values #392

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Get psutil.cpu_times_percent() many times for a long period of time (I ran 
it every minute for about 5 hours).

What is the expected output?
I expected all the values in psutil.cpu_times_percent() to be between 0 and 100.

What do you see instead?
psutil.cpu_times_percent().system is sometimes negative (-1.9, -3.2, etc.).  
This seems to make psutil.cpu_times_percent().idle sometimes be above 100.

What version of psutil are you using? What Python version?
psutil-0.7.0.win32-py2.7
Python 2.7.5

On what operating system? Is it 32bit or 64bit version?
Windows 7 64-bit

Please provide any additional information below.
Once I noticed that psutil.cpu_times_percent().system sometimes returned 
negative values, I collected all the times this happened.  I checked the value 
every minute for about 5 hours and saw these results:

psutil.cpu_percent(): 0.0
psutil.cpu_times_percent(): cpupercent(user=3.2, system=-3.2, idle=100.0)

psutil.cpu_percent(): 8.6
psutil.cpu_times_percent(): cpupercent(user=3.6, system=-0.9, idle=97.3)

psutil.cpu_percent(): 6.9
psutil.cpu_times_percent(): cpupercent(user=0.0, system=-2.1, idle=102.1)

psutil.cpu_percent(): 0.0
psutil.cpu_times_percent(): cpupercent(user=0.0, system=-1.9, idle=101.9)

Notice that it didn't happen very often, only about once an hour.  Also, 
psutil.cpu_times_percent().system was the only one that I ever saw go negative. 
 I've only tested on Windows 7.  My machine has 8 CPUs:

>>> psutil.NUM_CPUS
8

Thank you,
Mike

Original issue reported on code.google.com by mjryan1...@gmail.com on 14 Jun 2013 at 1:13

GoogleCodeExporter commented 8 years ago
Mmmm... this is hard to debug without being able to replicate it directly.
In order to try to figure something out you should log the output of both 
cpu_times(percpu=True) and cpu_percent(timeout=..., percpu=True).
Can you do that?

Original comment by g.rodola on 16 Jun 2013 at 10:13

GoogleCodeExporter commented 8 years ago
No problem, I'll run those tests and come back with the results.  I'll continue 
to use the default timeout=0.1, since that was what I was doing before.  Should 
I test any different timeout values as well?

Thanks

Original comment by mjryan1...@gmail.com on 16 Jun 2013 at 11:49

GoogleCodeExporter commented 8 years ago
No I think that alone should be fine for now.

Original comment by g.rodola on 16 Jun 2013 at 11:53

GoogleCodeExporter commented 8 years ago
I would be curious if the problem is reproducible with a longer interval. The 
accuracy of the CPU time and system clock time may not be exact, so a larger 
interval (say, 1.0s)  may cause the issue to disappear. 

Original comment by jlo...@gmail.com on 17 Jun 2013 at 12:07

GoogleCodeExporter commented 8 years ago
I was finally able to run some tests.  My first two calls (and only two so far) 
have returned at least one CPU with a negative system percent.  The output is 
in the attached file.  I did tests with two interval sizes: 0.1 and 1.0.  
Interestingly, the interval with 1.0 had both negative readings.  I will 
continue to run these tests overnight.

Original comment by mjryan1...@gmail.com on 18 Jun 2013 at 1:37

Attachments:

GoogleCodeExporter commented 8 years ago
Here are the results from my overnight tests.  There are 338 negative 
cpupercent values (on all different CPUs) across 178 tests.  Both interval=0.1 
and interval=1.0 had negative values, but it looks like they appeared 2 to 3 
times more frequently in interval=1.0.  Each test was done 5 minutes apart 
(denoted in the file by an empty line).

Original comment by mjryan1...@gmail.com on 18 Jun 2013 at 4:58

Attachments:

GoogleCodeExporter commented 8 years ago
I don't have time to look into this now (I'm in Sicily).

Original comment by g.rodola on 18 Jun 2013 at 9:34

GoogleCodeExporter commented 8 years ago
No problem, thank you for letting me know.

Original comment by mjryan1...@gmail.com on 19 Jun 2013 at 2:13

GoogleCodeExporter commented 8 years ago

Original comment by g.rodola on 9 Jul 2013 at 10:58

GoogleCodeExporter commented 8 years ago
Bumping up priorioty (...and I'm currently looking into this but it's hard to 
replicate).

Original comment by g.rodola on 26 Sep 2013 at 10:28

GoogleCodeExporter commented 8 years ago
I identified the origin of the problem although I don't know why it happens.
AFAIK CPU times are *always* supposed to increase over time or at least remain 
the same.
That's just because time cannot go backwards.
It seems this is not the case on Windows.
Here's a test case:

import psutil, sys
last = psutil.cpu_times()
while 1:
    new = psutil.cpu_times()
    if new.system < last.system:
        # error
        sys.exit("{}\n{}".format(last, new))
    last = new

...and here's the output:
cputimes(user=4421.625, system=25216.125, idle=1234545.34375)
cputimes(user=4421.640625, system=25216.09375, idle=1234545.375)

The problem is the second 'system' value which is smaller than the first one.

We currently calculate 'system' CPU time by substracting 'idle' time from 
'kernel' time:
https://code.google.com/p/psutil/source/browse/psutil/_psutil_mswindows.c?spec=s
vn99ab875640329a01a02ef0eefe6be9f5576098a7&r=46552ff6ac2be53d14b10487ee1cd95d2ef
3f57b#734
That's because according to 
http://msdn.microsoft.com/en-us/library/windows/desktop/ms724400(v=vs.85).aspx 
"kernel time includes the amount of time the system has been idle" so that 
looks like the natural thing to do.
Either this assumption is wrong or the Windows API we use is broken somehow, 
I'm not sure.

Original comment by g.rodola on 27 Sep 2013 at 12:10

GoogleCodeExporter commented 8 years ago
Wow, it does seem like that GetSystemTimes function is not behaving as it's 
documented.  I found a few posts about having issues with it:

http://social.msdn.microsoft.com/Forums/windowsdesktop/en-US/50c32a2d-a09a-43a1-
a731-0f24fb0283d0/inconsistencies-in-getsystemtimes-function

and this one (see last comment):

http://social.msdn.microsoft.com/Forums/windowsdesktop/en-US/b0f2ae96-01b1-4fa2-
a0e7-80901011fe02/total-cpu-usage-time-using-getsystemtimes

Original comment by mjryan1...@gmail.com on 27 Sep 2013 at 9:02

GoogleCodeExporter commented 8 years ago
I researched a lot but couldn't find any solution or explanation as to why 
'system' time on Windows can decrease between calls.
The best I could come up with is revision ee424d5aecf7 where I forced erroneous 
percentages to 0 or 100.
Considering percentage functions are kind of different than others in that they 
return values which are "approximate" and repeated over time I think this is 
not the end of the world.
Of course I'm not happy with that but I'm not sure what else to do.
I'm going to close this out for now and consider it fixed.

Original comment by g.rodola on 27 Sep 2013 at 7:20

GoogleCodeExporter commented 8 years ago

Original comment by g.rodola on 28 Sep 2013 at 10:06

GoogleCodeExporter commented 8 years ago
Thanks for taking the time to look at this, I appreciate it!

Original comment by mjryan1...@gmail.com on 28 Sep 2013 at 5:30

GoogleCodeExporter commented 8 years ago
Closing out as fixed. Release 1.1.0 is now available for download.

Original comment by g.rodola on 28 Sep 2013 at 5:32

GoogleCodeExporter commented 8 years ago
You're welcome. =)

Original comment by g.rodola on 28 Sep 2013 at 5:33