google / UIforETW

User interface for recording and managing ETW traces
https://randomascii.wordpress.com/2015/04/14/uiforetw-windows-performance-made-easier/
Apache License 2.0
1.57k stars 201 forks source link

Support for .NET stacks in traces #51

Closed lcapaldo closed 8 years ago

lcapaldo commented 9 years ago

This change always users to opt in to activating providers that allow .NET methods to appear in callstacks.

Closes #48.

randomascii commented 9 years ago

The new .NET stacks button should be disabled when tracing is on, since changing it then has no effect. Same as the Context switch and CPU sampling call stacks buttons. Oh wait - I see the comment regarding this. Hmmm...

Also, it should probably be labeled .Net symbols - the call stacks are always there, just not much use without symbol resolution.

More importantly, it didn't seem to work for me. I profiled WPA with it checked and I saw the new provider being used, but WindowsBase.dll (for instance) didn't show symbols.

It would be good to have more comments explaining how it is supposed to work. That might help me figure out why it isn't working for me. Where is the symbol information supposed to be stored?

The Sleep(15000) doesn't inspire confidence...

After the Sleep(15000) should the code we waiting for xperf.exe to finish? This can be done by adding another scope, or by calling GetExitCode(). Right now it feels like a race condition because the two process might overlap and I'm not sure that is good. Maybe waiting on the process can avoid the Sleep(15000)?

lcapaldo commented 9 years ago

Oops, I assumed ChildProcess::Run was blocking.

I'll try and get back to you in detail after the weekend.

randomascii commented 9 years ago

I got this set of errors this morning. Both errors seem to be because the clrrundown.etl file is locked by the System process.

xperf.exe -start UIforETWSessionCLRRundown -on A669021C-C450-4609-A035-5AF59AF4DF18:0xB8:0x5 -f "C:\Users\Bruced\AppData\Local\Temp\clrrundown.etl" xperf.exe -stop UIforETWSessionCLRRundown xperf: error: UIforETWSessionCLRRundown: Cannot create a file when that file already exists. (0xb7). Process exit code was 800700b7 (2147942583)

xperf.exe -merge "C:\Users\Bruced\AppData\Local\Temp\kernel.etl" "C:\Users\Bruced\AppData\Local\Temp\user.etl" "C:\Users\Bruced\AppData\Local\Temp\clrrundown.etl" "c:\Users\Bruced\Documents\traces\2015-08-21_07-59-48_Bruced.etl" xperf: error: c:\Users\Bruced\Documents\traces\2015-08-21_07-59-48_Bruced.etl: The specified path is invalid. (0xa1). Process exit code was 800700a1 (2147942561)

Here's what handle.exe has to say:

Handle v4.0 Copyright (C) 1997-2014 Mark Russinovich Sysinternals - www.sysinternals.com

System pid: 4 type: File 3288: C:\Users\Bruced\AppData\Local\Temp\clrrundown.etl

randomascii commented 9 years ago

Run() doesn't block, but GetExitCode(), GetOutput(), and the destructor all block on waiting for the child process to exit.

Occasionally it can be handy to have Run() return immediately but I'm not sure if UIforETW ever actually makes use of that.

randomascii commented 9 years ago

Your change should also add your name to the AUTHORS and CONTRIBUTORS file. And, when it's all reviewed and finished I'll want it updated-to-latest and squashed so that it merges in cleanly.

lcapaldo commented 9 years ago

Why I called it "stacks" instead of "symbols"

Truthfully its not symbols, but rather "method load"/"module load" information, which is required to ultimately perform the symbol resolution. Granted, "stacks" is, as you say, also the wrong thing to call this, but it was definitely not PDBs, which is why I hesitated to call it symbols.

Sleeping for 15seconds.

This is to collect the information for methods that were loaded prior to the CLR runtime provider being turned on (iow we're starting to trace an existing process). The rundown provider basically has the .NET processes dump their module/method/whatever state of the world into the trace.

What we are supposed to do here is to wait for xperf to finish, ie turn on this provider, and then wait for some period of time while the rundown events are collected. Sleeping was based on " You may need to wait 30 to 60 seconds for all events to be collected.", and I arbitrarily decided that I would start with 15 seconds.

Probably the actually correct thing to do is to open the trace as it is being recorded and wait for the DCEndComplete events to show up and then stop recording. I assume this is what WPR and PerfView do.

I meant to wait for xperf to finish, but didn't realize, and didn't bother to check (my mistake), that the wait for the process happened in the dtor/GetExitCode.

That folder of PDBs you get from WPR

These are specifically PDBs for native (NGEN) images. ModuleLoad_V2 etc. events will contain the native image paths (if there are any), which you can then use ngen createPDB to create the associated PDBs for the ngenned images. "Normal" IL images will use the MethodLoad events, as well as managed PDBs.

Why didn't it work for you?

Not sure. The most recent time I tested this out I profiled VS2015 with it. I will try profiling WPA with it as well.

When will I get to these changes?

Maybe this week, but probably not until some time next week. I picked a busy time to scratch this itch :/. Let me know if you have other questions.

randomascii commented 9 years ago

Thanks for the update. I suspect that by "You may need to wait 30 to 60 seconds for all events to be collected." they mean "the process may take that long, so be patient if xperf takes a while to finish running". I highly doubt they meant "wait 30 to 60 seconds after xperf exits" both because that would be terrible software engineering, and because I assume that xperf is coordinating the recording of DCEnd events. So, waiting for xperf to exit will probably solve all problems.

Probably the actually correct thing to do is to open the trace as it is being recorded and wait for the DCEndComplete events to show up

I doubt you can do that. The file should be locked while they write to it.

Maybe it will work for me after that change. I could even try it.

Just to set expectations for me, should this code allow resolution of all .Net symbols, including those in programs that were running before I started tracing?

which is why I hesitated to call it symbols

I definitely don't want it called stacks because that will cause confusion. From a user's point of view it does enable .Net symbols, and the behind-the-scenes details aren't crucial, so I'm still included to that name, but there may well be a better choice. Note that a well written tooltip can fill in the details nicely.

lcapaldo commented 9 years ago

I definitely don't want it called stacks because that will cause confusion. From a user's point of view it does enable .Net symbols, and the behind-the-scenes details aren't crucial, so I'm still included to that name, but there may well be a better choice.

Yeah I was thinking maybe ".NET method info"?, regardless something other than "stacks" is probably best.

Just to set expectations for me, should this code allow resolution of all .Net symbols, including those in programs that were running before I started tracing?

If the rundown stuff works correctly, yes*, that is the intent.

* You may need to borrow some PDBs from a WPR run or generate them for the NGENed assemblies, I didn't seem to need to typically, but I probably also have a bunch of old .symcaches for .net stuff laying around too.

lcapaldo commented 9 years ago

Waiting for the process to exit and removing the sleep I got the same error wrt the file existing at least once. Bad engineering or no, I don't think xperf has special knowledge of when the rundown is complete, and I've never seen a provider that keeps (the started instance of) xperf running after the recording has been started (that doesn't mean they don't exist of course).

I pushed a change where we wait for the process to exit, but don't sleep (I won't leave the final version like this with the if(false) of course) so you can try it out if you get a chance.

randomascii commented 9 years ago

Yeah, I guess you're right (about xperf terminating earlier). Maybe we need to keep polling until we can open the file for read? What a mess.

lcapaldo commented 9 years ago

PerfView apparently polls the file size:

Rundown File Length: 67.7MB delta: 67.7MB
Rundown File Length: 77.7MB delta: 10.0MB
Rundown File Length: 77.7MB delta: 0.0MB
Rundown file has stopped growing, assuming rundown complete.

I don't think polling for "write" read will work because the file will not be writable readable until we -stop, and whenever we -stop it will be writable readable of course.

EDIT: Brain fart, readable for writable.

randomascii commented 9 years ago

Oh god - polling until the file size stops increasing? That's horrible. That is, it is horrible that that may be the required solution.

lcapaldo commented 9 years ago

Just an update, I've been trying to implement the file sizing polling, but it is finishing very quickly in my tests which is making me suspicious.

randomascii commented 9 years ago

How frequently are you polling? Presumably you need to figure out the typical and worst-case granularity so you can be sure that you know when you can be certain that it has stopped growing.

Apologies for the conflicts - be sure to pull/squash when you're done.

lcapaldo commented 9 years ago

I'm still alive. I'm hopeful I can get a chance to wrap this up this weekend.

lcapaldo commented 9 years ago

Didn't get to work on this as much as I wanted but I think I understand why the rundown stuff wasn't working properly, and should also be able to implement the WPR style "collect PDBs for ngenned images as well". I'm going to push to my branch some of this stuff, don't worry about it for now, I'll ping this again when I have a clean, rebased commit.

randomascii commented 9 years ago

No hurry, but I am looking forward to having it.

ariccio commented 9 years ago

Also note:

This branch has conflicts that must be resolved
lcapaldo commented 8 years ago

Closing this because there's something in there now. If I get around to doing collecting ngen pdbs/rundown stuff before anyone else I'll open a new one.