google / UIforETW

User interface for recording and managing ETW traces
https://randomascii.wordpress.com/2015/04/14/uiforetw-windows-performance-made-easier/
Apache License 2.0
1.55k stars 201 forks source link

Broken File I/O recording? #145

Closed milianw closed 3 years ago

milianw commented 3 years ago

Hey all,

I hope this is the right place for this report. I'm having trouble with the File I/O tracing capabilities of xperf/WPA. It seems like WPA fails to map most file handles an actual file name. Instead I mostly see something like this:

Unknown (0xffffad80da5287c0)

For Event SubType Create I do see the file path, but for anything else - most notably Read/Write, I do not see the path anymore. On the other hand, a tool like Resource Monitor doesn't seem to have a problem with showing me the file that gets read/written.

Furthermore, the Stack column always shows n/a for me. Here's a screenshot that shows both issues:

brokenio

Now, I'm unsure if this is a bug/limitation in xperf, WPA or UIforETW. Is maybe some specific xperf knob missing when using UIforETW that enables stack and file mapping recording?

I hope someone can help me out. I'm testing the above with the following setup:

Windows 10 Pro 10.0.18362 UIforETW 1.54 WPA 10.0.19041.1 xperf 6.3.9600

fsfod commented 3 years ago

It might be the problem described in this blog post https://lowleveldesign.org/2020/08/15/fixing-empty-paths-in-fileio-events-etw/ where the trace is missing FileRundown events to map file objects to file paths.

randomascii commented 3 years ago

Are you tracing to a circular buffer or to file? Unfortunately there is a bug in Windows that causes thousands of lost events on some machines (including my work machines) when doing circular buffer tracing if DISK_IO tracing is enabled. Therefore (in change 0176654fac5) I disabled DISK_IO tracing and that is probably what is causing the data to be missing.

You could manually add DISK_IO back to the list of options (the settings dialog has an option for that - "Extra kernel flags") or you could use tracing to file. Or, maybe DISK_IO should only be disabled on the type of machines that have problems (unfortunately I don't know what the trigger is).

For stacks it is expected that they are not recorded by default. Recording stacks is expensive and they have to be explicitly enabled. xperf -help stackwalk gives a list of the stack walking options - I don't know of where they are better documented unfortunately. The settings dialog has an option for that - "Extra kernel stackwalks".

I don't think that the missing rundown events are a problem. I'm using xperf to trigger tracing and it should do a proper rundown. I hope this all helps, and let me know if you have thoughts on how to make this work better.

milianw commented 3 years ago

Thank you @randomascii, it was the circular buffer that introduces the issue. Switching to file tracing solves it. I also tried adding DISK_IO manually, that also fixes it for me. Where can I see the missed events to check if that is an issue I'm running into too on my machine?

As a suggestion: could you add a warning or similar output to UIforETW that DISK_IO tracing is disabled when using circular buffer tracing? Otherwise it's not very intuitive for novices to figure out what the issue is.

Thanks also for the suggestion on how to get I/O stackwalks.

Cheers

PS: I'm closing this already, as the main issue is solved.

milianw commented 3 years ago

Hmm and one thing I just stumbled over:

https://docs.microsoft.com/en-us/windows-hardware/test/wpt/using-xperf-profiles

That page rightly points out that one should rather not trace to file when doing disk I/O tracing, as then the trace-file-writing would interfere with the actual process that one wants to trace (unless one writes to a separate disk...)

randomascii commented 3 years ago

I like the idea of mentioning that DISK_IO is disabled. I'll add such a message.

I would interpret the advice on that page (from the last comment) as considerations about possible sources of confusion and interference. It's not a "should not" but "consider...". Yes, tracing to disk uses the disk, but I have always found it easy to ignore that.

randomascii commented 3 years ago

"Where can I see the missed events to check if that is an issue I'm running into too on my machine?"

WPA complains about dropped events when you load a trace. You can't miss the warning if you load a trace into WPA. If you haven't seen the warning then you aren't affected by the issue.

milianw commented 3 years ago

I see, that indeed means my machine isn't affected. Anyhow, with your new message I hope future users should be able to easily find the workaround.

Thanks a lot for your help!