microsoft / microsoft-ui-xaml

Windows UI Library: the latest Windows 10 native controls and Fluent styles for your applications
MIT License

AddMemoryPressure causes Garbage Collection to run for over a minute #8933

Closed aleibhardt closed 1 year ago

aleibhardt commented 1 year ago

Describe the bug

I have an intermittent bug that causes our UWP app to hang for long periods of time. The bug is currently blocking our next release.

The issue is that occasionally when we unload one of our UserControls, the entire application freezes. The issue reproduces sporadically.

When the issue reproduces, I cannot pause inside the Visual Studio debugger, because the debugger hangs and I have to kill the app. I have been able to pause inside WinDbg, however.

From the call stack in WinDbg, it looks like the XAML engine gets stuck trying to run a GC pass. The problematic part of the call stack starts with a call to GC.AddMemoryPressure().

76 000000be`9bffddc0 00007ff8`5e697ba4     SharedLibrary!System::GC.AddMemoryPressure+0x18e [f:\dd\ndp\fxcore\CoreRT\src\System.Private.CoreLib\src\System\GC.cs @ 572] 

When the app starts reproducing this issue, the CPU starts spiking, and the memory consumption of the application starts to go up until the framework figures itself out. This can take upwards of a minute, and I have seen my memory consumption go from 600MB to 1.2GB in this time.

I have spent some time profiling this issue under both the Visual Studio profiler and JetBrains dotMemory. The memory that increases during this time is all in the native/unmanaged space, and I believe the increase comes entirely from within the XAML framework.

I have been able to set breakpoints through WinDbg on the XAML parts of the call stack, and these keep hitting, which tells me that parts of XAML are still running and we are not deadlocked.

While this seems like a bug in the engine, it would be good to know what we have done in our code that is exacerbating this issue, and whether there is a way to work around it.

The control that we are unloading contains around 4000 elements in the XAML tree. We keep the user control itself in memory to improve performance when the user wants to bring the view back on screen. The control that we believe is causing the problem is backed by a ListView. In my repro the ListView contains 6000 items, but virtualization is working, so not all of those items are in the tree.

Steps to reproduce the bug

I currently don't have a consistent way to reproduce this, and this release of the app hasn't made its way into production yet.

If it's valuable, I should be able to collect a time travel trace of the application and attach that.

Otherwise, if we can't get a solution, we may be shipping this feature soon with a feature flag that will turn it off, and I can send instructions to enable the feature flag along with repro steps.

Expected behavior

GC should not be taking upwards of a minute and hanging the app completely.

Screenshots

AddMemoryPressureHangProfileSnapshot: I have attached a screenshot of a dotMemory profiler trace. The yellow line indicates time spent in GC. Usually time in GC shows up as a very small yellow dash, but here the line runs for a minute and a half. Another line shortly after runs for about 15 seconds, which is also unacceptable.

You can also see unmanaged memory increase when the GC starts.

AddMemoryPressureCallStack.txt: I have attached a copy of the stack trace that I broke into while the app was in this state.

NuGet package version

WinUI 2 - Microsoft.UI.Xaml 2.8.2

Windows version

Windows 11 (22H2): Build 22621

Additional context

No response

YourOrdinaryCat commented 1 year ago
76 000000be`9bffddc0 00007ff8`5e697ba4     SharedLibrary!System::GC.AddMemoryPressure+0x18e [f:\dd\ndp\fxcore\CoreRT\src\System.Private.CoreLib\src\System\GC.cs @ 572] 
77 000000be`9bffde20 00007ff8`5e694274     SharedLibrary!$8_System::__ComObject.AddGCMemoryPressure+0x44 [f:\dd\ndp\fxcore\CoreRT\src\System.Private.Interop\src\Shared\__ComObject.cs @ 439] 
78 000000be`9bffde50 00007ff8`5e694220     SharedLibrary!$8_System::__ComObject.__Attach+0x44 [f:\dd\ndp\fxcore\CoreRT\src\System.Private.Interop\src\Shared\__ComObject.cs @ 573] 
79 000000be`9bffdee0 00007ff8`5e6941c5     SharedLibrary!$8_System::__ComObject.__AttachingCtor+0x50 [f:\dd\ndp\fxcore\CoreRT\src\System.Private.Interop\src\Shared\__ComObject.cs @ 406] 
7a 000000be`9bffdf20 00007ff8`5e694181     SharedLibrary!$8_System::__ComObject.AttachingCtor+0x35 [f:\dd\ndp\fxcore\CoreRT\src\System.Private.Interop\src\Shared\__ComObject.cs @ 393] 

https://github.com/dotnet/corert/blob/a8830fe5158c499a75b19239524a6d1092a679fe/src/System.Private.Interop/src/Shared/__ComObject.cs#L558

Following the breadcrumbs from there, I think it has something to do with the GCPressureAttribute. Are you using types with [GCPressure] applied in your controls?
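
For context on what the frames above are doing, here is a minimal sketch of how managed code typically pairs GC.AddMemoryPressure with GC.RemoveMemoryPressure. The `NativeBufferHandle` type and its size parameter are hypothetical and only illustrate the pattern; this is not the code in `__ComObject`, which appears to make a similar call when it attaches to a COM object.

```csharp
using System;

// Hypothetical wrapper (not from the issue): a small managed object that
// stands in for a large unmanaged allocation the GC cannot see by itself.
public sealed class NativeBufferHandle : IDisposable
{
    private readonly long _sizeInBytes;
    private bool _disposed;

    public NativeBufferHandle(long sizeInBytes)
    {
        if (sizeInBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(sizeInBytes));

        _sizeInBytes = sizeInBytes;

        // Inform the GC about the unmanaged memory kept alive by this object.
        // The runtime factors this into its scheduling, and the call itself
        // can end up running a collection, as the stack in this issue suggests.
        GC.AddMemoryPressure(_sizeInBytes);
    }

    public bool IsDisposed => _disposed;

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;

        // Every AddMemoryPressure call should be balanced by a
        // RemoveMemoryPressure call with the same amount.
        GC.RemoveMemoryPressure(_sizeInBytes);
    }
}
```

If many such objects are created in a short burst, each constructor adds pressure, which is one plausible way a large XAML tree could lead to repeated collections. That is an assumption, not something the stack alone proves.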

aleibhardt commented 1 year ago

Agreed, the issue seems to be coming from GCPressure; however, we don't have this attribute anywhere in our own code. I'm assuming that some of the XAML controls we are using have this attribute.

A 90-second GC walk suggests that something has gone completely out of whack, though, and the fact that it doesn't reproduce consistently implies to me that there is some sort of race going on.
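
One way to gather more evidence for an intermittent problem like this is to record how long a suspect operation takes and how many collections ran during it. The sketch below assumes a hypothetical `GcProbe` helper wrapped around whatever code unloads the control. It uses only `GC.CollectionCount`, `GC.MaxGeneration`, and `Stopwatch`, and it is a diagnostic aid, not a fix.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical diagnostic helper (not from the issue).
public static class GcProbe
{
    // Runs the action, then reports the elapsed time and how many
    // collections of each generation happened while it ran.
    public static string Measure(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));

        int maxGen = GC.MaxGeneration;
        var before = new int[maxGen + 1];
        for (int gen = 0; gen <= maxGen; gen++)
            before[gen] = GC.CollectionCount(gen);

        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();

        var report = new StringBuilder();
        report.Append("elapsed=").Append(stopwatch.ElapsedMilliseconds).Append("ms");
        for (int gen = 0; gen <= maxGen; gen++)
        {
            int delta = GC.CollectionCount(gen) - before[gen];
            report.Append(" gen").Append(gen).Append("=+").Append(delta);
        }
        return report.ToString();
    }
}
```

Logging the returned string for every unload, for example `GcProbe.Measure(() => UnloadView())` where `UnloadView` is a placeholder for the app's own code, would show whether the slow cases coincide with an unusually high number of collections.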

ranjeshj commented 1 year ago

Closing since we are not planning to do any work in WinUI 2. See here for more details: https://github.com/microsoft/microsoft-ui-xaml/blob/main/docs/contribution_handling.md#winui-2