google / UIforETW

User interface for recording and managing ETW traces
https://randomascii.wordpress.com/2015/04/14/uiforetw-windows-performance-made-easier/
Apache License 2.0
1.55k stars 201 forks source link

Error starting a trace on Windows 7 SP1 x64 #121

Closed rovarma closed 6 years ago

rovarma commented 6 years ago

I'm trying to start a trace using UIforETW on Windows 7 Ultimate N SP1 64-bit. However, tracing fails with an 'Invalid flags' error. This is using the latest UIforETW release (1.49).

This is probably not an UIforETW issue (running the same command in a batch file also gives the same error), but as @randomascii seems very knowledgeable about ETW, I figured I'd check here to see if anybody has any ideas about what could cause this.

If this is not the correct forum for this, please feel free to close.

Starting tracing to in-memory circular buffers...
xperf.exe -start "NT Kernel Logger" -on Latency+POWER+DISPATCHER+DISK_IO_INIT+FILE_IO+FILE_IO_INIT+VIRT_ALLOC+MEMINFO -stackwalk Profile+CSwitch+ReadyThread -buffersize 1024 -minbuffers 300 -maxbuffers 300 -buffering -start UIforETWSession -on Microsoft-Windows-Win32k:0xfdffffffefffffff+Multi-MAIN+Multi-FrameRate+Multi-Input+Multi-Worker+Microsoft-Windows-Kernel-Memory:0xE0+Microsoft-Windows-Kernel-Power -buffersize 1024 -minbuffers 50 -maxbuffers 50 -buffering
xperf: error: UIforETWSession: Invalid flags. (0x3ec).
Error starting tracing. Try stopping tracing and then starting it again?
Process exit code was 800703ec (2147943404)
randomascii commented 6 years ago

It probably is a UIforETW issue. The "Invalid flags" message means that one of the requested providers is not supported on your OS. That is, one of these:

Latency+POWER+DISPATCHER+DISK_IO_INIT+FILE_IO+FILE_IO_INIT+VIRT_ALLOC+MEMINFO

or one of these:

Microsoft-Windows-Win32k:0xfdffffffefffffff+Multi-MAIN+Multi-FrameRate+Multi-Input+Multi-Worker+Microsoft-Windows-Kernel-Memory:0xE0+Microsoft-Windows-Kernel-Power

The "+" signs are the separators and with a batch file you should be able to find the problematic one easily. Once the trace is started you'll need to force-stop it somehow.

I suspect that Microsoft-Windows-Kernel-Memory:0xE0 or Microsoft-Windows-Kernel-Power is the problem.

There is already logic in there to customize which providers are requested. Either that logic doesn't handle your particular OS version or I added a new flag without realizing I needed to add a check.

rovarma commented 6 years ago

Alright, I tried all of them. The kernel session starts fine, all of those providers work. Unfortunately, it seems that none of the user session's providers work, which seems very suspicious. All of them result in the same Invalid Flags error.

I'll continue to investigate a bit more, but it seems very weird.

rovarma commented 6 years ago

As a random guess, I tried using the provider GUID instead of its name (i.e. 231CF54B-22A0-49E4-A59A-47052A30FFED instead of Multi-Main) when starting the trace and that works fine. So it seems that I'm not able to start traces using the registered provider names on this machine, for some reason.

randomascii commented 6 years ago

GUIDs always work, even when the providers are not available. So, I think that means that none of the user-mode providers are available at all.

If Multi-Main isn't available then that suggests some weird and fundamental problem. That provider (all of the Multi-* ones) actually ships with UIforETW so it can't not be available. And yet...

To be clear, the Multi-* providers are registered when you run UIforETW. Maybe that code is broken somehow? Or failing somehow? If you check "Show commands" and then click Start Tracing then the wevtutil commands that unregister/register the providers will be printed. I think they must be failing, and it looks like UIforETW doesn't detect or report such failures. So that is probably the problem.

randomascii commented 6 years ago

Oh yes, the code in question is CUIforETWDlg::RegisterProviders()

rovarma commented 6 years ago

The registration appears to work fine.

I double-checked that the wevtutil registration succeeds: the exit code is sane and no output is logged. In addition, if I run xperf -providers, I can find the Multi-Main provider in the list of providers, under 'Registered User Mode Providers'. However, it's not in that list by name, instead it's listed by its GUID. If I look at the same list on a (non-Windows 7) machine where tracing does work, they're in that list by name. Starting a trace with Multi-Input enabled (through its GUID) also works in that the input events show up in the Generic Events pane in WPA.

So it seems to me that the problem is purely with the 'name' part of the provider, something going wrong (or just working differently on Windows 7?) when registering the provider name, perhaps. No idea what might cause it though...

randomascii commented 6 years ago

That is weird. If you use the GUIDs instead of the names do you actually record events from those providers? i.e.; do keyboard and mouse events show up?

I'm not sure what the best fix would be here. The GUIDs are ugly and less understandable so I don't want to use them generally, but we could just use them when necessary - whatever that means.

rovarma commented 6 years ago

The events did show up, but I stumbled upon the root cause of the issue and it's indeed unrelated to UIforETW.

As part of my investigation into what's going on on this machine (separately from xperf issues), I had to run some WMI queries through powershell. I was unable to run those queries, because they all kept failing with ERROR_IO_DEVICE. I'd noticed some other sporadic flakiness related to the disk drives before, so I ran chkdsk on the problematic drive. It found some errors which it repaired, after which I was able to run WMI queries again.

When that worked, I tried to start tracing with UIforETW again, and that also worked again. xperf -providers now also lists the user mode providers by name again. So, my guess is that wevtutil stores the GUID <-> name mapping somewhere in WMI and the disk corruption was preventing xperf from connecting to WMI, thus rendering it unable to determine the GUID from the name.

It turns out that being unable to connect to WMI was also what was causing the performance issues in our application that I was seeing, so I don't actually need to capture any traces on this machine to investigate that anymore...

Either way, problem solved and it was not related to UIforETW. Sorry for wasting your time!

randomascii commented 6 years ago

Thanks for sharing the explanation.