google / UIforETW

User interface for recording and managing ETW traces
https://randomascii.wordpress.com/2015/04/14/uiforetw-windows-performance-made-easier/
Apache License 2.0

Record pagefile usage of WS monitored processes #37

Open · randomascii opened this issue 9 years ago

randomascii commented 9 years ago

This information can be found in the _SYSTEM_PROCESS_INFORMATION struct:

http://processhacker.sourceforge.net/doc/struct___s_y_s_t_e_m___p_r_o_c_e_s_s___i_n_f_o_r_m_a_t_i_o_n.html

This can be read through NtQuerySystemInformation: https://msdn.microsoft.com/en-us/library/windows/desktop/ms724509%28v=vs.85%29.aspx
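NtQuerySystemInformation does not tell you the required buffer size up front. The usual pattern is to call it with SystemProcessInformation, and if it returns STATUS_INFO_LENGTH_MISMATCH (0xC0000004), grow the buffer and try again, because processes can start between calls. The sketch below models only that retry loop, in portable C++ so it compiles anywhere. QueryFn and QueryWithRetry are hypothetical names, the 64 KB starting size and 1/8 slack are arbitrary choices, and on Windows the callback would wrap the real call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// NTSTATUS-style codes. On Windows, STATUS_SUCCESS is 0 and
// STATUS_INFO_LENGTH_MISMATCH is 0xC0000004.
using Status = std::uint32_t;
constexpr Status kStatusSuccess = 0x00000000;
constexpr Status kStatusInfoLengthMismatch = 0xC0000004;

// Hypothetical callback shape: fill `buffer` (of `size` bytes) and report the
// size the call wanted through `needed`, mirroring the ReturnLength
// out-parameter. This is NOT the real NtQuerySystemInformation signature.
using QueryFn = std::function<Status(void* buffer, std::uint32_t size,
                                     std::uint32_t* needed)>;

// Calls `query` until the buffer is large enough. Processes can start between
// calls, so each retry adds slack on top of the reported size. Returns an
// empty vector on any other error or after `maxAttempts` tries.
std::vector<std::uint8_t> QueryWithRetry(const QueryFn& query,
                                         std::uint32_t initialSize = 64 * 1024,
                                         int maxAttempts = 8) {
  assert(initialSize > 0 && maxAttempts > 0);
  std::vector<std::uint8_t> buffer(initialSize);
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    std::uint32_t needed = 0;
    const Status status = query(
        buffer.data(), static_cast<std::uint32_t>(buffer.size()), &needed);
    if (status == kStatusSuccess) {
      if (needed <= buffer.size()) buffer.resize(needed);  // drop unused tail
      return buffer;
    }
    if (status != kStatusInfoLengthMismatch) break;  // real failure, give up
    // Grow to the reported size plus about 1/8 extra for newly started processes.
    buffer.resize(static_cast<std::size_t>(needed) + needed / 8 + 1);
  }
  return {};
}
```

Over-allocating a little on each retry trades some memory for fewer round trips when the process list is changing quickly, which is likely on a machine running many chrome.exe processes.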

mwinterb commented 9 years ago

Possibly a dumb question, but is there any reason to prefer NtQuerySystemInformation over GetProcessMemoryInfo? Is it because you're expecting the WS monitored process list to be large and NtQuerySystemInformation ends up being cheaper? Or is it that the numbers returned are not actually the same in all cases? I guess with chrome.exe, the process list could indeed be quite large.

From a quick test on my Win7 machine, users in the administrators group cannot open csrss.exe processes with PROCESS_QUERY_LIMITED_INFORMATION access, a restriction that NtQuerySystemInformation does not have, so there's that advantage. But enabling SeDebugPrivilege in the UIforETW process token allows access to those processes, too.

There is possibly a lot of other exciting and useful info that is only available through NTQSI (NtQuerySystemInformation); I just don't know if this is one of them.

randomascii commented 9 years ago

Whatever works, I will be happy with. Getting pagefile usage information seems to be a bit of a black art, so it is quite likely that there are better ways. https://codereview.chromium.org/1181263005 shows some work I did in Chromium to get private working set data more efficiently than my normal methods, and that took a lot of crazy research and spelunking.

mwinterb commented 9 years ago

Sorry for the delay, my weekends and nights suddenly got busy.

Looking briefly at that Chromium review, I guess it depends on what data you actually want to record. Based on the ProcessHacker struct definition, I believe the only memory fields that are not available from a single, fully documented API call are VirtualSize, PeakVirtualSize, WorkingSetPrivateSize, and HardFaultCount. The VirtualSize fields aren't necessarily interesting for the processes in this list, and HardFaultCount is probably redundant with other ETW data.

Since SYSTEM_PROCESS_INFORMATION::WorkingSetPrivateSize exists, "Private WS" could possibly be reported without requiring expensive monitoring. (In a quick test, SYSTEM_PROCESS_INFORMATION seemed to line up with what both UIforETW and PDH's "Working Set - Private" counter were reporting for that number, but VMMap reported a slightly larger number.)

Also, I should have looked at the source code for SampleWorkingSets earlier, since you obviously already know about GetProcessMemoryInfo. Whoops.

Since NTQSI(SystemProcessInformation) returns data similar to the toolhelp snapshot, my first step would be to write a class that internally switches between the two methods of enumerating processes, so that UIforETW "gracefully degrades" if it is running against an unknown ntdll. Or do you think that is overkill? If so, I'd still write some helpers to manage the iteration, since dealing directly with NextEntryOffset's pointer arithmetic and UNICODE_STRING is ugly and "uncommon", and defer the switching until later.
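The NextEntryOffset walk is the part worth hiding behind a helper. Each SYSTEM_PROCESS_INFORMATION record starts with NextEntryOffset, the byte distance to the next record, with 0 marking the last one, and the image name is a UNICODE_STRING whose Length is in bytes and whose text is not guaranteed to be null-terminated. The sketch below is portable C++ written against a hypothetical, heavily simplified record (FakeProcessRecord) rather than the real structure, so it shows only the iteration pattern; WalkRecords and ProcessEntry are likewise made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for UNICODE_STRING: Length is in BYTES and the
// text is not guaranteed to be null-terminated.
struct FakeUnicodeString {
  std::uint16_t Length;
  std::uint16_t MaximumLength;
  const char16_t* Buffer;
};

// Hypothetical, heavily simplified record. Only the chaining pattern is
// modeled; the real SYSTEM_PROCESS_INFORMATION has many more fields.
struct FakeProcessRecord {
  std::uint32_t NextEntryOffset;  // bytes to the next record; 0 = last record
  std::uint32_t ProcessId;
  std::uint64_t PagefileUsage;
  FakeUnicodeString ImageName;
};

// What callers see: plain values, no pointer arithmetic.
struct ProcessEntry {
  std::uint32_t pid;
  std::uint64_t pagefileUsage;
  std::u16string name;
};

// Walks the NextEntryOffset chain and copies each record out. Stops if the
// next record would not fit inside the buffer.
std::vector<ProcessEntry> WalkRecords(const std::uint8_t* buffer,
                                      std::size_t size) {
  assert(buffer != nullptr || size == 0);
  std::vector<ProcessEntry> entries;
  std::size_t offset = 0;
  while (size >= sizeof(FakeProcessRecord) &&
         offset <= size - sizeof(FakeProcessRecord)) {
    FakeProcessRecord record;
    // memcpy avoids assuming the record is suitably aligned in the buffer.
    std::memcpy(&record, buffer + offset, sizeof record);
    ProcessEntry entry{record.ProcessId, record.PagefileUsage, {}};
    if (record.ImageName.Buffer != nullptr) {
      // Convert the byte length to a character count.
      entry.name.assign(record.ImageName.Buffer,
                        record.ImageName.Length / sizeof(char16_t));
    }
    entries.push_back(std::move(entry));
    if (record.NextEntryOffset == 0) break;
    offset += record.NextEntryOffset;
  }
  return entries;
}
```

Copying each record out means the rest of UIforETW would deal only with plain values, and the bounds check makes a corrupt offset end the walk instead of reading past the buffer.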

randomascii commented 9 years ago

I'd have to test and see if WorkingSetPrivateSize gives the numbers that I want. It would certainly be useful to be able to get that efficiently. HardFaultCount would also be useful if it were a process-lifetime total. And PrivatePageCount could be useful, except that I don't know what it measures.

PSS is not available of course.

For graceful degradation, keep in mind that ETW only works usefully on Windows 7 and above.

I would potentially be interested in ways of getting more memory information, cheaper. In particular, getting the amount of private data for a process that has been paged out would be pretty sweet. I won't have time to do any work on this for a while, but I'm generally willing to accept pull requests (as long as the CLA is signed - see the CONTRIBUTING file).

mwinterb commented 9 years ago

HardFaultCount appears to be a process-lifetime total, or if it is not, it resets to zero at some unusual point. PrivatePageCount appears to be mislabeled, as it exactly matches PrivateUsage from PROCESS_MEMORY_COUNTERS_EX (which is documented as being the same as PagefileUsage). This comparison was done on 64-bit Win10.

On graceful degradation, I'm more worried about structure definition changes in future operating systems or service packs. Maybe it doesn't actually matter: the user community is likely small, so it could tolerate being broken for a short period of time.
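One cheap defense against a changed structure layout is to sanity-check the raw buffer before trusting it and, if the check fails, fall back to the documented path (for example, a toolhelp snapshot plus GetProcessMemoryInfo). The portable sketch below shows that shape. ChainLooksSane, ProcessSample, Enumerator, and EnumerateWithFallback are hypothetical names, and the check only catches gross breakage, such as offsets that run off the end of the buffer, not a field that silently moved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Returns true if a NextEntryOffset chain (a uint32 at the start of each
// record, in host byte order) stays inside the buffer, every record spans at
// least `minRecordSize` bytes, and the chain ends with a 0 offset.
bool ChainLooksSane(const std::uint8_t* buffer, std::size_t size,
                    std::size_t minRecordSize) {
  assert(minRecordSize >= sizeof(std::uint32_t));
  std::size_t offset = 0;
  for (;;) {
    if (offset > size || size - offset < minRecordSize) return false;
    std::uint32_t next;
    std::memcpy(&next, buffer + offset, sizeof next);
    if (next == 0) return true;               // clean end of chain
    if (next < minRecordSize) return false;   // records overlap: layout changed?
    offset += next;
  }
}

// Hypothetical per-process sample; `hasPagefileUsage` records whether the
// source could provide the value at all.
struct ProcessSample {
  std::uint32_t pid;
  std::uint64_t pagefileUsage;
  bool hasPagefileUsage;
};

// An enumerator returns std::nullopt when its data cannot be trusted.
using Enumerator = std::function<std::optional<std::vector<ProcessSample>>()>;

struct EnumerationResult {
  std::vector<ProcessSample> samples;
  std::string source;  // which path produced the data, useful for logging
};

// Tries the fast, undocumented path first and degrades to the documented one.
EnumerationResult EnumerateWithFallback(const Enumerator& primary,
                                        const Enumerator& fallback) {
  if (primary) {
    if (auto samples = primary()) return {std::move(*samples), "primary"};
  }
  if (fallback) {
    if (auto samples = fallback()) return {std::move(*samples), "fallback"};
  }
  return {{}, "none"};
}
```

Recording which path produced the data would also let a trace show when it was captured in the degraded mode.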

Agreed that PSS would still have to be expensive-WS only. When I get some spare time, I'm hoping to write something that displays the different numbers for working-set related items. It won't be pretty, but hopefully it will be useful. I'll leave a note here if it's ready before you've done your tests.

randomascii commented 9 years ago

Sounds good. And, just to make it perfectly clear, it is normal and expected to add to or modify the working set ETW events in order to present the information ideally. I should probably have added a second working set event for when the expensive data is not available, rather than emitting the old one with zeroes; maximum clarity for analysts is the goal.
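A minimal sketch of the "separate event instead of zeroes" idea, assuming hypothetical payloads: the event type is chosen from what was actually measured, so an analyst never has to guess whether a 0 means "zero bytes" or "not collected". WorkingSetExpensive, WorkingSetCheap, MakeSample, ToRow, and the column names are illustrative only and are not UIforETW's real event definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical payload when expensive working-set monitoring ran.
struct WorkingSetExpensive {
  std::uint32_t pid;
  std::uint64_t totalWsKB;
  std::uint64_t pagefileUsageKB;
  std::uint64_t privateWsKB;
  std::uint64_t pssKB;
};

// Hypothetical payload with only the cheaply available counters.
struct WorkingSetCheap {
  std::uint32_t pid;
  std::uint64_t totalWsKB;
  std::uint64_t pagefileUsageKB;
};

using WorkingSetSample = std::variant<WorkingSetExpensive, WorkingSetCheap>;

// Picks the event type from what was actually measured, so "not collected"
// never turns into a zero in the expensive columns.
WorkingSetSample MakeSample(std::uint32_t pid, std::uint64_t totalWsKB,
                            std::uint64_t pagefileUsageKB,
                            std::optional<std::uint64_t> privateWsKB,
                            std::optional<std::uint64_t> pssKB) {
  // In this sketch the expensive counters are gathered together or not at all.
  assert(privateWsKB.has_value() == pssKB.has_value());
  if (privateWsKB && pssKB) {
    return WorkingSetExpensive{pid, totalWsKB, pagefileUsageKB, *privateWsKB,
                               *pssKB};
  }
  return WorkingSetCheap{pid, totalWsKB, pagefileUsageKB};
}

// Renders a sample the way a Generic Events row might read: each event type
// lists only the columns it really has.
std::string ToRow(const WorkingSetSample& sample) {
  return std::visit(
      [](const auto& e) -> std::string {
        using T = std::decay_t<decltype(e)>;
        std::string common = "pid=" + std::to_string(e.pid) +
                             " ws=" + std::to_string(e.totalWsKB) +
                             "KB pagefile=" + std::to_string(e.pagefileUsageKB) + "KB";
        if constexpr (std::is_same_v<T, WorkingSetExpensive>) {
          return "WorkingSetEx " + common +
                 " private=" + std::to_string(e.privateWsKB) + "KB" +
                 " pss=" + std::to_string(e.pssKB) + "KB";
        } else {
          return "WorkingSet " + common;
        }
      },
      sample);
}
```

In ETW terms, the two alternatives would map to two distinct events, so they would appear as separately named rows rather than one event whose columns are sometimes meaningless.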

mwinterb commented 9 years ago

Okay, good. Also, how nitpicky do you want to be about the definition of "working set"? If large page and AWE allocation information were added, should that information be recorded in a new ETW working set event? Or should it be something entirely separate from "working set"?

randomascii commented 9 years ago

To avoid confusion it is probably best to stick to the standard definition of working set, and use extra columns for large page and AWE allocations. I thought that AWE was pretty rare now that 64-bit is widely available. Accounting for large pages somewhere seems like a good idea - I didn't realize that they don't get counted as part of the regular working set.

The extra AWE and large-page columns could either always be present, or a different event could be used when needed; whichever shows up best in UIforETW's default Generic Events table.

MagicAndre1981 commented 8 years ago

The ETW provider Microsoft-Windows-Kernel-Memory has a keyword KERNEL_MEM_KEYWORD_WS_SWAP (0x80), which enables some events that occur when data is paged out or paged in:

<template tid="WorkingSetOutSwapStartArgs">
  <data name="ProcessId" inType="win:UInt32"/>
</template>
<template tid="WorkingSetOutSwapStopArgs">
  <data name="ProcessId" inType="win:UInt32"/>
  <data name="Status" inType="win:HexInt32"/>
  <data name="PagesProcessed" inType="win:UInt32"/>
</template>
<template tid="WorkingSetInSwapStopArgs">
  <data name="ProcessId" inType="win:UInt32"/>
  <data name="Status" inType="win:HexInt32"/>
</template>
<template tid="WorkingSetOutSwapStartArgs_V1">
  <data name="ProcessId" inType="win:UInt32"/>
  <data name="Flags" inType="win:HexInt32"/>
</template>
<template tid="WorkingSetOutSwapStopArgs_V1">
  <data name="ProcessId" inType="win:UInt32"/>
  <data name="Status" inType="win:HexInt32"/>
  <data name="PagesProcessed" inType="win:Pointer"/>
  <data name="WriteCombinePagesProcessed" inType="win:Pointer"/>
  <data name="UncachedPagesProcessed" inType="win:Pointer"/>
  <data name="CleanPagesProcessed" inType="win:Pointer"/>
</template>

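For reference, here is roughly what decoding WorkingSetOutSwapStopArgs_V1 by hand would involve, as a minimal sketch. It assumes the template fields are packed in order with no padding, that each win:Pointer field is 4 or 8 bytes depending on the pointer size recorded in the event header (32-bit or 64-bit), and that the payload's byte order matches the host. OutSwapStopV1 and DecodeOutSwapStopV1 are hypothetical names; in practice TDH (TdhGetEventInformation) would normally do this decoding from the manifest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Decoded WorkingSetOutSwapStopArgs_V1. Pointer-sized fields are widened to
// 64 bits so one struct serves both 32-bit and 64-bit traces.
struct OutSwapStopV1 {
  std::uint32_t processId;
  std::uint32_t status;
  std::uint64_t pagesProcessed;
  std::uint64_t writeCombinePagesProcessed;
  std::uint64_t uncachedPagesProcessed;
  std::uint64_t cleanPagesProcessed;
};

// Reads a `width`-byte (4 or 8) unsigned value at `pos` in host byte order
// and advances `pos`.
static std::uint64_t ReadUnsigned(const std::uint8_t* data, std::size_t& pos,
                                  std::size_t width) {
  assert(width == 4 || width == 8);
  std::uint64_t value = 0;
  if (width == 4) {
    std::uint32_t narrow;
    std::memcpy(&narrow, data + pos, sizeof narrow);
    value = narrow;
  } else {
    std::memcpy(&value, data + pos, sizeof value);
  }
  pos += width;
  return value;
}

// Decodes the payload, assuming the template fields are packed in order:
// UInt32, HexInt32, then four win:Pointer fields of `pointerSize` bytes.
std::optional<OutSwapStopV1> DecodeOutSwapStopV1(const std::uint8_t* data,
                                                 std::size_t size,
                                                 std::size_t pointerSize) {
  if (pointerSize != 4 && pointerSize != 8) return std::nullopt;
  if (size < 4 + 4 + 4 * pointerSize) return std::nullopt;  // truncated event
  std::size_t pos = 0;
  OutSwapStopV1 event{};
  event.processId = static_cast<std::uint32_t>(ReadUnsigned(data, pos, 4));
  event.status = static_cast<std::uint32_t>(ReadUnsigned(data, pos, 4));
  event.pagesProcessed = ReadUnsigned(data, pos, pointerSize);
  event.writeCombinePagesProcessed = ReadUnsigned(data, pos, pointerSize);
  event.uncachedPagesProcessed = ReadUnsigned(data, pos, pointerSize);
  event.cleanPagesProcessed = ReadUnsigned(data, pos, pointerSize);
  return event;
}
```

Under these assumptions a 64-bit payload is 40 bytes and a 32-bit one is 24 bytes, which is why the pointer size has to come from the event header rather than from the analysis machine.
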
randomascii commented 8 years ago

Interesting. The page faults graph already shows data being paged in, but data being paged out would be a bit interesting.

However, what I really want is to know how much data is currently paged out. Knowing how much paged-out data there is per process (mostly in pagefile.sys, or maybe broken down by file), sampled occasionally, would be awesome. The paging out may have happened before the trace was recorded, so I'm not sure that recording data about pages being paged out during the trace is particularly meaningful.