performancecopilot / pcp

Performance Co-Pilot
https://pcp.io

Add some CPU metrics in pmstat output #2006

Closed myllynen closed 2 months ago

myllynen commented 3 months ago

Recent versions of vmstat(8) report CPU wait, steal, and guest time, which have become available in the Linux kernel during the past few years. It would be nice if pmstat(1) reported these additional CPU metrics as well. Thanks.

natoscott commented 3 months ago

@myllynen is this different to the -x option?

$ pmstat --help 2>&1 | grep -- -x
  -x, --xcpu            extended CPU statistics reporting

myllynen commented 3 months ago

D'oh, I somehow completely missed that!

But the extended metrics do not include guest time so I think that could be added.

Thanks.

natoscott commented 3 months ago

Yep, missing & will be nice to have - thanks for checking. I'll queue it up but hopefully someone else drops in and fixes it in the meantime.

myllynen commented 3 months ago

This looks pretty straightforward but, for the record, a few things I noticed:

1) src/pmstat/pmstat.pmlogger and src/pmlogconf/tools/pmstat are slightly different. The outcome should be the same, but I'm not sure whether these should be unified.

2) While at it, perhaps the output could include guest_nice as well. It's not part of vmstat(8) output, but this would make pmstat output more complete. However, I'm not sure whether this provides any real value for pmstat(1) users.

3) Would the calculation of totals still be like this, or should guest time(s) be added as well:

user = s->val[cpu_nice].ull + s->val[cpu_user].ull;
kernel = s->val[cpu_intr].ull + s->val[cpu_sys].ull + s->val[cpu_steal].ull;
idle = s->val[cpu_idle].ull + s->val[cpu_wait].ull;

This is a bit unclear: according to the man page, utime in /proc/pid/stat includes guest_time, but nothing is stated about the guest times in /proc/stat. So it's not entirely clear to me whether stolen time includes guest / guest_nice time or is separate from it.

Thanks.

natoscott commented 2 months ago

Have a look at the way Mark set up the pmchart CPU and vCPU views, in terms of the metrics used, to answer this (note also vuser and vnice). And yeah, definitely some updates are needed to those two configs, esp. when this addition is made to pmstat.

myllynen commented 2 months ago

I checked the kernel sources and the related git log:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/kernel/sched/cputime.c
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/cputime.c

It looks like guest time is included in user time, and has been for at least a couple of years. So this makes me think the calculations pasted above can be left as-is. Thanks.
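
For illustration, the grouping discussed above can be sketched in C against the aggregate "cpu" line of /proc/stat. This is only a minimal sketch and not pmstat code: struct cpu_times, struct cpu_totals, parse_cpu_line() and cpu_totals() are hypothetical names, interrupt time is assumed to be irq + softirq, and the sketch assumes (as concluded above) that guest and guest_nice are already counted inside user and nice, so they are parsed but not added to any total.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the ten fields of the aggregate "cpu" line
 * in /proc/stat (all values are in clock ticks). Not a PCP type. */
struct cpu_times {
    unsigned long long user, nice, sys, idle, wait;
    unsigned long long irq, softirq, steal, guest, guest_nice;
};

/* Hypothetical result type mirroring the three totals from the thread. */
struct cpu_totals {
    unsigned long long user, kernel, idle;
};

/* Parse a line such as "cpu  4705 150 1120 16250 520 30 45 10 200 5".
 * Returns 0 on success, -1 if the line is not the aggregate "cpu" line
 * or does not carry all ten fields (older kernels report fewer). */
static int parse_cpu_line(const char *line, struct cpu_times *t)
{
    /* Require "cpu" followed by a space so that "cpu0", "cpu1", ...
     * (the per-CPU lines) are rejected. */
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;

    int n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->sys, &t->idle, &t->wait,
                   &t->irq, &t->softirq, &t->steal, &t->guest, &t->guest_nice);
    return n == 10 ? 0 : -1;
}

/* Same grouping as the snippet quoted in the thread. */
static struct cpu_totals cpu_totals(const struct cpu_times *t)
{
    struct cpu_totals out;

    /* Assumption: guest is already part of user and guest_nice is already
     * part of nice, so adding them here would count that time twice. */
    out.user = t->nice + t->user;

    /* Assumption: "intr" in the quoted snippet corresponds to irq + softirq. */
    out.kernel = t->irq + t->softirq + t->sys + t->steal;

    out.idle = t->idle + t->wait;
    return out;
}
```

With the sample line from the comment above (made-up numbers), parse_cpu_line() followed by cpu_totals() gives user = 4855, kernel = 1205 and idle = 16770. The guest value of 200 is already contained in the 4705 ticks of user time, which is why it is left out of the sums.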