microsoft / perfview

PerfView is a CPU and memory performance-analysis tool
http://channel9.msdn.com/Series/PerfView-Tutorial
MIT License
4.14k stars 707 forks

First PerfView collect hang entire pc for several minutes #2019

Closed kirsan31 closed 5 months ago

kirsan31 commented 5 months ago

We connect to a workstation via RDP. After the first call to PerfView collect, the RDP connection breaks immediately, and we can't connect to the server for several minutes. Our app log shows that all other internet services also lose their connections immediately. Judging by the very sparse messages in the log, the entire PC seems to hang, as if all processor cores were loaded at 100%. In the PerfView log we see:

PerfView logging started at 06.04.2024 11:55:59 ...................... [Starting collection at 06.04.2024 11:59:35]

The whole PC hangs for exactly this interval. On all other PCs we see no time difference between these lines.

Any thoughts?

Microsoft Windows 10 Pro for Workstations (Microsoft Windows 10.0.19045)
PerfView 3.1.9.0
The app is WinForms, .NET 7
Log: perfViewRun.log
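To put a number on the gap between the two log lines quoted above, here is a minimal Python sketch. It assumes the timestamps use a day.month.year layout, as the quoted lines suggest; the function name `startup_gap_seconds` is made up for illustration and is not part of PerfView.

```python
from datetime import datetime

# Assumed layout of the timestamps in the quoted PerfView log lines
# (day.month.year hour:minute:second).
LOG_TIME_FORMAT = "%d.%m.%Y %H:%M:%S"


def startup_gap_seconds(logging_started: str, collection_started: str) -> float:
    """Return the seconds between 'logging started' and 'Starting collection'."""
    start = datetime.strptime(logging_started, LOG_TIME_FORMAT)
    collect = datetime.strptime(collection_started, LOG_TIME_FORMAT)
    return (collect - start).total_seconds()


# Timestamps copied from the two log lines in the report.
gap = startup_gap_seconds("06.04.2024 11:55:59", "06.04.2024 11:59:35")
print(f"gap before collection started: {gap:.0f} s")
```

For the reported run this gives 216 seconds, i.e. a little over three and a half minutes between PerfView starting to log and collection starting.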

cincuranet commented 5 months ago

By immediately, do you mean really immediately? Is there no delay before the connection times out, or something like that?

Can you try (if possible) the same steps without RDP (physically on the machine), just to see whether the issue persists or not?

It might also help if you could pinpoint what "first" really means: exactly as you described, i.e. only after a clean boot, or the first run of PerfView, or ...

Is the application you're profiling somehow special? Anything beyond a regular WinForms app (e.g. talking to some HW device)?

kirsan31 commented 5 months ago

@cincuranet

By immediately, do you mean really immediately? Is there no delay before the connection times out, or something like that?

As soon as I press the collect button in our app, the RDP session freezes (the screen stops updating), and after a few seconds it reports a broken connection. What I can say is that the problems start after the PerfView process is started and before the very first message in its log (PerfView logging started at 06.04.2024 11:55:59). We write a message to our log right after the PerfView process is started, and when everything runs normally our log message and PerfView's first log message have the same time (same second). In this case our message was at 11:55:57.2229 and PerfView's at 11:55:59 (about 2 sec difference), so everything had already started to slow down.
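The launch delay described above can be checked with a small Python sketch. It only subtracts the two times quoted in this comment; the function name `launch_delay_seconds` is hypothetical, and since the PerfView log line carries whole seconds only, the result is approximate.

```python
from datetime import datetime


def launch_delay_seconds(app_log_time: str, perfview_log_time: str) -> float:
    """Seconds from the app's 'process started' message to PerfView's first log line."""
    # The app log time carries fractional seconds; the PerfView line does not.
    app = datetime.strptime(app_log_time, "%H:%M:%S.%f")
    perfview = datetime.strptime(perfview_log_time, "%H:%M:%S")
    return (perfview - app).total_seconds()


# Times copied from the comment above (both on the same day).
delay = launch_delay_seconds("11:55:57.2229", "11:55:59")
print(f"delay before PerfView's first log line: {delay:.4f} s")
```

Comparing this figure against the same measurement on a healthy machine, where both messages fall in the same second, is one way to confirm that the slowdown begins before collection starts.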

Can you try (if possible) the same steps without RDP (physically on the machine), just to see whether the issue persists or not?

Unfortunately no, we have no physical access to the server. I tried to reproduce it over RDP to another PC, without success :( But I don't think this is an RDP problem, because all other connections broke too (not only inside our app, but in other apps as well) and all other activity slowed down too.

It might also help if you could pinpoint what "first" really means: exactly as you described, i.e. only after a clean boot, or the first run of PerfView, or ...

Yes, this is what I will definitely do, but the problem is that I can only experiment outside working hours :(

Is the application you're profiling somehow special? Anything beyond a regular WinForms app (e.g. talking to some HW device)?

No - no special HW communication.

cincuranet commented 5 months ago

As soon as I press the collect button in our app, the RDP session freezes (the screen stops updating), and after a few seconds it reports a broken connection.

Can you try starting PerfView manually (i.e. by double-clicking it in Explorer) and collecting a general trace for a moment (it does not have to be specific to your app)? Just to see whether it reproduces this way too.

kirsan31 commented 5 months ago

Can you try starting PerfView manually (i.e. by double-clicking it in Explorer) and collecting a general trace for a moment (it does not have to be specific to your app)? Just to see whether it reproduces this way too.

Will try...

brianrob commented 5 months ago

Can you also please share the command line that you're using to capture? Or if you're using the GUI, what is the configuration that you're using?

kirsan31 commented 5 months ago

Can you also please share the command line that you're using to capture?

PerfView /DataFile:d:\MMExtData\PerfView\4.272.8844.22576_2024.04.06.11.55.57.etl /Merge:true /zip:true /BufferSizeMB:256 /StackCompression /CircularMB:500 /MaxCollectSec:20 /KernelEvents:ThreadTime /TplEvents:Default /FocusProcess:46620 /logFile=d:\MMExtData\PerfView\perfViewRun.log /AcceptEula collect

It is also in the provided log file (attached to the first post).

brianrob commented 5 months ago

Thanks. I have seen this sort of thing caused by the TPLEvents. No guarantee that's what is happening here, but can you try with /TPLEvents:None and see if that fixes it? It's possible that the events are so verbose that you're saturating the CPU with them. It's also worth trying the steps that @cincuranet suggested.
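For anyone scripting the experiment suggested here, this is a rough Python sketch that takes the argument list from the command line quoted earlier in the thread and swaps the TPL events switch for `/TplEvents:None`. The helper name `with_tpl_events_disabled` is hypothetical, the case-insensitive match is only a convenience of this sketch, and the code does not launch PerfView; it only builds the list.

```python
def with_tpl_events_disabled(args: list[str]) -> list[str]:
    """Return a copy of args with any /TplEvents:<value> switch set to None."""
    result = []
    for arg in args:
        # Case-insensitive match is a convenience of this sketch only.
        if arg.lower().startswith("/tplevents:"):
            result.append("/TplEvents:None")
        else:
            result.append(arg)
    return result


# Arguments copied from the command line posted in this thread.
original_args = [
    r"/DataFile:d:\MMExtData\PerfView\4.272.8844.22576_2024.04.06.11.55.57.etl",
    "/Merge:true",
    "/zip:true",
    "/BufferSizeMB:256",
    "/StackCompression",
    "/CircularMB:500",
    "/MaxCollectSec:20",
    "/KernelEvents:ThreadTime",
    "/TplEvents:Default",
    "/FocusProcess:46620",
    r"/logFile=d:\MMExtData\PerfView\perfViewRun.log",
    "/AcceptEula",
    "collect",
]

new_args = with_tpl_events_disabled(original_args)
print("PerfView " + " ".join(new_args))
```

Changing only this one switch and leaving every other argument untouched keeps the comparison run as close as possible to the failing one.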

kirsan31 commented 5 months ago

Thanks. I have seen this sort of thing caused by the TPLEvents. No guarantee that's what is happening here, but can you try with /TPLEvents:None and see if that fixes it? It's possible that the events are so verbose that you're saturating the CPU with them.

Thanks, I'll try. But what confuses me is that the problem appears before collection starts, so how can TPLEvents affect it?

brianrob commented 5 months ago

Thanks. I have seen this sort of thing caused by the TPLEvents. No guarantee that's what is happening here, but can you try with /TPLEvents:None and see if that fixes it? It's possible that the events are so verbose that you're saturating the CPU with them.

Thanks, I'll try. But what confuses me is that the problem appears before collection starts, so how can TPLEvents affect it?

I see. I think I misunderstood then. I thought it triggered when you hit the "Start Collection" button. If it actually occurs when you launch PerfView through your app, then it's not the same issue I was thinking of, and following @cincuranet's recommendation sounds right to me.

kirsan31 commented 5 months ago

I experimented today. I tried launching PerfView manually and from our application (launching new processes), and tried deleting the %APPDATA%\Roaming\PerfView directory (to simulate a new version)... No results: I could not reproduce it, everything works normally. My thoughts:

I will update as soon as I can test after a server reboot.

kirsan31 commented 5 months ago

I tried after a server restart: no repro :( So I will close this until I get more info...