Open randomascii opened 9 years ago
Pinging this issue. Looks like you've done some previous research into CPU performance counters.
Nowadays it appears that WPA supports graphs for CPU performance counters (PMC Rollovers and PMC Graph in the 'Select Tables' menu). XPerf needs to be configured to enable those and it appears that UIForETW does not allow us to specify custom xperf flags.
Microsoft's PerfView supports enabling and viewing these performance counters (blog post - but the UI is (unfortunately) 100 to 1000x uglier than WPA).
Can we hijack this issue to research enabling CPU performance counter logging with UIForETW? Maybe make it a default or otherwise add it to the options menu?
Interesting. It is possible to add some basic flags to request additional user-mode and system providers - see the settings dialog. However it is quite likely that that is not sufficient, or at least not convenient. I would support having a CPU performance counter mode, instead of using batch files.
Often the best thing to do is to start by experimenting with the existing batch files, and see what works, and then encode that into the UI. I suspect that a different mode (like tracing to memory, tracing to file, and heap tracing) might be appropriate, but I'm not sure.
Looks like xperf natively supports logging PMC counters (found here):
Example invocation:
xperf -on PROC_THREAD+LOADER+pmc_profile+profile -pmcprofile InstructionRetired -f c:\home\kernel.etl
Important bits being -on pmc_profile and -pmcprofile.
Additionally, to collect callstacks you'll want to specify PmcInterrupt for the stack walker flags.
Okay, so a plausible UI would be a way to select a set of counters (a list of check boxes? free-form text?) and when one or more are selected the pmc_profile provider would be selected.
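To make that concrete, here is a rough sketch of how the selected counters could be turned into xperf arguments. BuildPmcXperfArgs is a hypothetical helper, not existing UIforETW code; the flags are the ones from the example invocation above, and the set of kernel flags to combine with pmc_profile is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of UIforETW): builds the xperf arguments
// needed for PMC sampling from the counters selected in the UI.
// Returns an empty string when nothing is selected, so the caller can
// leave its normal tracing arguments untouched.
std::string BuildPmcXperfArgs(const std::vector<std::string>& counters,
                              bool stacks_on_pmc_interrupt) {
  if (counters.empty())
    return std::string();

  // pmc_profile is only requested when at least one counter is selected.
  // The accompanying PROC_THREAD+LOADER flags are assumed from the example.
  std::string args = "-on PROC_THREAD+LOADER+PMC_PROFILE -pmcprofile ";
  for (std::size_t i = 0; i < counters.size(); ++i) {
    if (i > 0)
      args += ',';  // counters are passed as a comma-separated list
    args += counters[i];
  }

  // Optional: collect a call stack on each PMC interrupt.
  if (stacks_on_pmc_interrupt)
    args += " -stackwalk PmcInterrupt";
  return args;
}
```

Whether stack walking should be on by default is an open question, since it adds overhead per sample; the sketch just leaves it as a flag the UI can expose.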
I don't know how standardized the set of counters is (query at run-time to create the list?), or how standardized the limits on the number of counters, or on their combinations, are. But, it all sounds very interesting.
There probably needs to be some OS version detection, and perhaps xperf version detection to gate this feature, and maybe some options should be disabled (stack walking?) to minimize distortion, or that could be left up to the user.
Yeah - we can query the list of counters and intervals. wpr -pmcsources comes up with a list, but perhaps there's a way to get this programmatically?
Then we can have a UI with checkboxes and intervals. Perhaps another checkbox for collecting callstacks on PMC interrupts?
As for version detection - I'm not too sure when this was introduced into xperf...
The latest releases of UIforETW guarantee that the 1903 (currently latest) version of xperf is installed on Windows 10, so xperf version detection shouldn't be needed.
I don't know which Windows 10 versions support this, but if wpr -pmcsources gives us data then we're probably okay. And running wpr -pmcsources and parsing the output doesn't sound like too much work. So, I think the plan would be:
if (Windows10()) {
  auto pmc_counters = RunAndParse("wpr -pmcsources");
  if (pmc_counters.size() > 0) {
    PopulateAndEnableListboxInSettings();
  }
}
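The parsing half of that plan might look something like the sketch below. It assumes, without having checked against real output on multiple machines, that wpr -pmcsources prints one counter per row as whitespace-separated fields: a numeric id, a name, the default interval, and the minimum and maximum intervals. PmcSource and ParsePmcSources are hypothetical names; header and separator lines are skipped because they don't start with a number.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one row of counter data. The field layout is an
// assumption about the output format, not a documented contract.
struct PmcSource {
  int id = 0;
  std::string name;
  long long interval = 0;      // default sampling interval
  long long min_interval = 0;  // smallest allowed interval
  long long max_interval = 0;  // largest allowed interval
};

// Parses the captured text output and returns every row that matches the
// assumed "id name interval min max" layout. Non-matching lines (headers,
// separators, blank lines) are silently ignored.
std::vector<PmcSource> ParsePmcSources(const std::string& output) {
  std::vector<PmcSource> sources;
  std::istringstream lines(output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    PmcSource source;
    if (fields >> source.id >> source.name >> source.interval >>
        source.min_interval >> source.max_interval) {
      sources.push_back(source);
    }
  }
  return sources;
}
```

An empty result then doubles as the "not supported here" signal, which matches the pmc_counters.size() > 0 check in the plan.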
Lots of details, such as how to deal with setting the intervals, and testing to see whether sampling (call stacks) on PMC interrupts is useful. My main use-case has just been to be able to see IPC and mispredict rates by process. This would make that easier and would presumably improve the granularity to per-thread.
I would like to change the default interval for the events that show up with -pmcsources. Is there any option to change the interval for xperf or wpr? The defaults seem too low for busy CPUs: they result in tons of samples and consequently dropped samples (at least for cycles and instructions retired).
The only way that I have recorded PMC data is documented here:
https://randomascii.wordpress.com/2016/11/27/cpu-performance-counters-on-windows/
This technique records them on context switches. This makes attribution to particular pieces of code difficult, but it does give you a per-process overview.
If you learn any more then please comment here or on that blog post.
I have just started to play around with PMC data from xperf using:
xperf.exe -on PROC_THREAD+LOADER+PROFILE+DISK_IO+PMC_PROFILE -pmcprofile InstructionRetired,TotalCycles -stackwalk profile+PmcInterrupt
This works fine for a lightly loaded system, but when I ran something that kept half of the cores in the system ~100% utilized, I lost lots of samples.
I will try the tip from your blog. I would like to see the data at function level, but per-process data may be OK to start with.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd796393(v=vs.85).aspx