microsoft / perfview

PerfView is a CPU and memory performance-analysis tool
http://channel9.msdn.com/Series/PerfView-Tutorial
MIT License

I got a lot of .etl.zip files that have zeroes in CPU Msec for every process. CPU Stack is empty, but Thread Time Stack show correct information #1726

Closed vladimir-cheverdyuk-altium closed 1 year ago

vladimir-cheverdyuk-altium commented 1 year ago

I have quite a few PerfView files that have zeroes in the CPU Msec column for every process. It looks like this: [image]

If I select a process in CPU Stacks and open it, it does not work: every tab contains only ROOT. But if I select the same process in Thread Time Stacks and open it, everything is shown correctly.

For example, if I choose the first process, svchost, I get this in CPU Stacks: [image]

And if I open it in Thread Time Stacks, I see this: [image]

Is this expected behavior?

I can provide the .etl.zip file(s) directly to a developer, because they may contain some private information.

Vlad

ToyDragon commented 1 year ago

It sounds like a lot of things can cause this, but I was able to fix it by disabling real-time virus protection in Windows: Settings > Windows Security > Virus and Threat Protection / Manage > Real-time protection

vladimir-cheverdyuk-altium commented 1 year ago

But it actually works fine in Thread Time Stacks. It is hard for me to imagine that AV can damage the stacks for CPU but leave them intact for Thread Time Stacks. I also suspect that there is only one set of stacks for both views, and one of them just has extra information.

brianrob commented 1 year ago

The thread time stacks view works while the CPU stacks view does not because thread time depends on two sets of events: CPU sampling and contextswitch/readythread events. The contextswitch/readythread events are emitted when execution hits those paths in the kernel, whereas the CPU samples are emitted via profiling interrupts. It's the profiling interrupts that tend to get broken by kernel drivers, etc.
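To make the distinction concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not PerfView's actual implementation: the event tuples, the function names, and the 1 ms sampling interval are all hypothetical. It shows that on-CPU time can still be estimated from context switches when no profiling-interrupt samples were recorded, while a sample-based count drops to nothing.

```python
from collections import defaultdict

SAMPLE_INTERVAL_MS = 1.0  # assumed sampling interval for this toy model


def cpu_msec_from_samples(samples):
    """Estimate CPU time per thread by counting profiling-interrupt samples.

    Each sample is (timestamp_ms, thread_id) and is assumed to stand for
    one sampling interval of CPU time.
    """
    totals = defaultdict(float)
    for _timestamp_ms, thread_id in samples:
        totals[thread_id] += SAMPLE_INTERVAL_MS
    return dict(totals)


def on_cpu_msec_from_switches(switches, trace_end_ms):
    """Estimate on-CPU time per thread from context switches on one CPU.

    Each switch is (timestamp_ms, new_thread_id): at that time the CPU
    starts running new_thread_id and stops running whatever ran before.
    """
    totals = defaultdict(float)
    running, since = None, None
    for timestamp_ms, new_thread_id in sorted(switches):
        if running is not None:
            totals[running] += timestamp_ms - since
        running, since = new_thread_id, timestamp_ms
    if running is not None:
        totals[running] += trace_end_ms - since
    return dict(totals)


# Hypothetical 10 ms trace on a single CPU: thread 1, then thread 2, then thread 1.
switches = [(0.0, 1), (4.0, 2), (6.0, 1)]

# Samples as they would look if the profiling interrupt fired every millisecond.
healthy_samples = (
    [(t + 0.5, 1) for t in range(4)]
    + [(4.5, 2), (5.5, 2)]
    + [(t + 0.5, 1) for t in range(6, 10)]
)

# Samples when the profiling interrupts never fire: nothing is recorded.
broken_samples = []

print(on_cpu_msec_from_switches(switches, 10.0))
print(cpu_msec_from_samples(healthy_samples))
print(cpu_msec_from_samples(broken_samples))
```

In this model the context-switch estimate is the same whether or not samples exist, which mirrors the symptom in this issue: the sample-based view is empty while the view that also uses context switches still has data.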

Are these on retail builds of the OS, or are these Windows Insider builds?

vladimir-cheverdyuk-altium commented 1 year ago

From TraceInfo file I can see this: OS Build Number 19041.2006.amd64fre.vb_release.191206-1406

Another PC: 22000.1.arm64fre.co_release.210604-1628
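For reference, the two build strings above can be split into their dot-separated fields. This is a minimal Python sketch; the function name and the field labels are assumptions chosen for illustration, not names reported by TraceInfo.

```python
def parse_build_string(build):
    """Split a build string with five dot-separated fields into labeled parts.

    The labels are illustrative only; the function raises ValueError if the
    string does not have exactly five fields.
    """
    number, revision, flavor, branch, timestamp = build.split(".")
    return {
        "build": int(number),
        "revision": int(revision),
        "flavor": flavor,        # e.g. "amd64fre" or "arm64fre"
        "branch": branch,        # e.g. "vb_release" or "co_release"
        "timestamp": timestamp,  # e.g. "191206-1406"
    }


for text in (
    "19041.2006.amd64fre.vb_release.191206-1406",
    "22000.1.arm64fre.co_release.210604-1628",
):
    print(parse_build_string(text))
```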

brianrob commented 1 year ago

I think these are both release builds. I think we've looked at these together before. I suspect that the best course of action for these is to file a Windows Feedback ticket. This should help to get some eyes with operating system expertise to help investigate. You can do this via the key combination Windows-F.

vladimir-cheverdyuk-altium commented 1 year ago

OK, I will leave feedback. Could you please tell me what kind of terms I should use for this? I don't know many technical details and I would like to get their attention.

And just in case: these are different reports from different companies.

brianrob commented 1 year ago

Sure. I think the key here is that you want to point out that you have a number of traces that should be capturing CPU sampling ETW events, but those events aren't being captured. When you submit the feedback ticket, ideally submit it from a machine where this happened, so that the diagnostic information that is captured is relevant. If this is not possible, make sure to provide Windows version information. That's probably as much as you need to do until/unless they ask for more. Please also provide a link to the feedback here.

brianrob commented 1 year ago

I just learned of a feature in Microsoft Defender that may cause the behavior you're seeing, as it takes the PMU from ETW. You can read more about it at https://www.microsoft.com/security/blog/2021/04/26/defending-against-cryptojacking-with-microsoft-defender-for-endpoint-and-intel-tdt/ and https://techcommunity.microsoft.com/t5/microsoft-defender-for-endpoint/defending-against-ransomware-with-microsoft-defender-for/ba-p/3243941.

The feature can be disabled by running powershell.exe Set-MpPreference -DisableTDTFeature $true.

@rbanks54, also adding you here, as this might also be what you're seeing in #1723.

rbanks54 commented 1 year ago

That's an interesting feature! Sounds like the ML model might need some tweaking if it's the cause 🙂

Thankfully, I could use the workaround for the conference talk I gave yesterday. Phew!

I'll give it a try on the two machines with problems later today and let you know the results.

rbanks54 commented 1 year ago

@brianrob Success!!

The setting stays after a reboot as well, though trying to re-enable it throws an error

> set-mppreference -DisableTDTFeature $false
Set-MpPreference: Operation failed with the following error: 0x80004005. Operation: Set-MpPreference. Target: DisableTDTFeature.

Thank you so much for chasing things up!

vladimir-cheverdyuk-altium commented 1 year ago

Thank you @brianrob and @rbanks54. I will add this to the list of instructions and hopefully it will fix the issue.

brianrob commented 1 year ago

Glad to hear that we're potentially making some progress in this area. Weird that you can't re-enable it though. Definitely worth calling out to folks in any instructions. I'm going to close this issue for now, but @vladimir-cheverdyuk-altium, let me know if we need to re-open it.

vladimir-cheverdyuk-altium commented 1 year ago

@brianrob that workaround helped. Today I had a case where a customer sent a PerfView file with 0 CPU everywhere. I asked the customer to run the command above and re-run PerfView, and it helped. The new file has proper CPU time everywhere.

Thank you again.

brianrob commented 1 year ago

Awesome @vladimir-cheverdyuk-altium! That's great to hear. Thanks for letting me know.

evgn commented 1 year ago

Hi folks!

If you're still interested, I did some research into this problem and left a comment here.