Closed GoogleCodeExporter closed 9 years ago
Mmmm... this is hard to debug without being able to replicate it directly.
In order to try to figure something out you should log the output of both
cpu_times(percpu=True) and cpu_percent(timeout=..., percpu=True).
Can you do that?
Original comment by g.rodola
on 16 Jun 2013 at 10:13
No problem, I'll run those tests and come back with the results. I'll continue
to use the default timeout=0.1, since that was what I was doing before. Should
I test any different timeout values as well?
Thanks
Original comment by mjryan1...@gmail.com
on 16 Jun 2013 at 11:49
No I think that alone should be fine for now.
Original comment by g.rodola
on 16 Jun 2013 at 11:53
I would be curious if the problem is reproducible with a longer interval. The
accuracy of the CPU time and system clock time may not be exact, so a larger
interval (say, 1.0s) may cause the issue to disappear.
Original comment by jlo...@gmail.com
on 17 Jun 2013 at 12:07
I was finally able to run some tests. My first two calls (and only two so far)
have returned at least one CPU with a negative system percent. The output is
in the attached file. I did tests with two interval sizes: 0.1 and 1.0.
Interestingly, the interval with 1.0 had both negative readings. I will
continue to run these tests overnight.
Original comment by mjryan1...@gmail.com
on 18 Jun 2013 at 1:37
Attachments:
Here are the results from my overnight tests. There are 338 negative
cpupercent values (on all different CPUs) across 178 tests. Both interval=0.1
and interval=1.0 had negative values, but it looks like they appeared 2 to 3
times more frequently in interval=1.0. Each test was done 5 minutes apart
(denoted in the file by an empty line).
Original comment by mjryan1...@gmail.com
on 18 Jun 2013 at 4:58
Attachments:
I don't have time to look into this now (I'm in Sicily).
Original comment by g.rodola
on 18 Jun 2013 at 9:34
No problem, thank you for letting me know.
Original comment by mjryan1...@gmail.com
on 19 Jun 2013 at 2:13
Original comment by g.rodola
on 9 Jul 2013 at 10:58
Bumping up priorioty (...and I'm currently looking into this but it's hard to
replicate).
Original comment by g.rodola
on 26 Sep 2013 at 10:28
I identified the origin of the problem although I don't know why it happens.
AFAIK CPU times are *always* supposed to increase over time or at least remain
the same.
That's just because time cannot go backwards.
It seems this is not the case on Windows.
Here's a test case:
import psutil, sys
last = psutil.cpu_times()
while 1:
new = psutil.cpu_times()
if new.system < last.system:
# error
sys.exit("{}\n{}".format(last, new))
last = new
...and here's the output:
cputimes(user=4421.625, system=25216.125, idle=1234545.34375)
cputimes(user=4421.640625, system=25216.09375, idle=1234545.375)
The problem is the second 'system' value which is smaller than the first one.
We currently calculate 'system' CPU time by substracting 'idle' time from
'kernel' time:
https://code.google.com/p/psutil/source/browse/psutil/_psutil_mswindows.c?spec=s
vn99ab875640329a01a02ef0eefe6be9f5576098a7&r=46552ff6ac2be53d14b10487ee1cd95d2ef
3f57b#734
That's because according to
http://msdn.microsoft.com/en-us/library/windows/desktop/ms724400(v=vs.85).aspx
"kernel time includes the amount of time the system has been idle" so that
looks like the natural thing to do.
Either this assumption is wrong or the Windows API we use is broken somehow,
I'm not sure.
Original comment by g.rodola
on 27 Sep 2013 at 12:10
Wow, it does seem like that GetSystemTimes function is not behaving as it's
documented. I found a few posts about having issues with it:
http://social.msdn.microsoft.com/Forums/windowsdesktop/en-US/50c32a2d-a09a-43a1-
a731-0f24fb0283d0/inconsistencies-in-getsystemtimes-function
and this one (see last comment):
http://social.msdn.microsoft.com/Forums/windowsdesktop/en-US/b0f2ae96-01b1-4fa2-
a0e7-80901011fe02/total-cpu-usage-time-using-getsystemtimes
Original comment by mjryan1...@gmail.com
on 27 Sep 2013 at 9:02
I researched a lot but couldn't find any solution or explanation as to why
'system' time on Windows can decrease between calls.
The best I could come up with is revision ee424d5aecf7 where I forced erroneous
percentages to 0 or 100.
Considering percentage functions are kind of different than others in that they
return values which are "approximate" and repeated over time I think this is
not the end of the world.
Of course I'm not happy with that but I'm not sure what else to do.
I'm going to close this out for now and consider it fixed.
Original comment by g.rodola
on 27 Sep 2013 at 7:20
Original comment by g.rodola
on 28 Sep 2013 at 10:06
Thanks for taking the time to look at this, I appreciate it!
Original comment by mjryan1...@gmail.com
on 28 Sep 2013 at 5:30
Closing out as fixed. Release 1.1.0 is now available for download.
Original comment by g.rodola
on 28 Sep 2013 at 5:32
You're welcome. =)
Original comment by g.rodola
on 28 Sep 2013 at 5:33
Original issue reported on code.google.com by
mjryan1...@gmail.com
on 14 Jun 2013 at 1:13