microsoft / CsWinRT

C# language projection for the Windows Runtime
MIT License

Random `System.InvalidCastException` thrown when using `Windows.Graphics.Imaging.BitmapDecoder` and `OcrEngine` #762

Closed Nukepayload2 closed 3 years ago

Nukepayload2 commented 3 years ago

Describe the bug

`System.InvalidCastException` is thrown intermittently from System.Private.CoreLib.dll when using the WinRT APIs Windows.Graphics.Imaging.BitmapDecoder and Windows.Media.Ocr.OcrEngine. On average, about 90% of calls succeed. The remaining 10% of calls to BitmapDecoder.CreateAsync(IRandomAccessStream), BitmapDecoder.GetSoftwareBitmapAsync, or OcrEngine.RecognizeAsync throw the exception.

Typical exception message:

Exception thrown: 'System.InvalidCastException' in System.Private.CoreLib.dll
Unable to cast object of type 'Windows.Foundation.AsyncOperationWithProgressCompletedHandler`2[Windows.Storage.Streams.IBuffer,System.UInt32]' to type 'Windows.Graphics.Imaging.BitmapDecoder'.

To Reproduce

Download the sample project (WinRTCallUnstable.zip) and run it. Click the "Test 100 times" button and wait until the three counters sum to 100. If the number under the "Information" text block is 100, click the button again.

When the problem has been reproduced, the error count is greater than 0.

Expected behavior

System.InvalidCastException should not be thrown.

Version Info

I don't reference the CsWinRT NuGet package directly, but my application calls WinRT APIs through WinRT.Runtime.dll, which is generated by CsWinRT. The target framework is net5.0-windows10.0.18362.0.

.NET SDK (reflecting any global.json):
 Version:   5.0.200
 Commit:    70b3e65d53

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.19042
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\5.0.200\

Host (useful for support):
 Version: 5.0.3
 Commit:  c636bbdc8a

Visual Studio 2019 16.9.0

Additional context

This problem is probably related to threading or garbage collection. As a workaround, I added try ... catch blocks to retry the WinRT API calls in my code. I don't know whether this workaround is safe, because I'm not sure whether this bug can cause memory corruption.
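The retry workaround described above might look roughly like the following. This is a minimal sketch and not the reporter's actual code: the `WinRTRetry` class, the `RetryOnInvalidCastAsync` method, and the `maxAttempts` parameter are all hypothetical names. As noted above, it is not known whether retrying is actually safe.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of CsWinRT or .NET): retries an async call
// when it fails with InvalidCastException.
public static class WinRTRetry
{
    public static async Task<T> RetryOnInvalidCastAsync<T>(
        Func<Task<T>> operation, int maxAttempts = 3)
    {
        if (operation is null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                // Covers exceptions thrown both when the call starts
                // and when its result is awaited.
                return await operation();
            }
            catch (InvalidCastException) when (attempt < maxAttempts)
            {
                // Swallow and retry. On the last attempt the filter is false,
                // so the exception propagates to the caller.
            }
        }
    }
}
```

A call site could then wrap each affected call, for example `await WinRTRetry.RetryOnInvalidCastAsync(() => BitmapDecoder.CreateAsync(stream).AsTask())`, assuming the `AsTask()` extension for WinRT async operations is available in the project.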

manodasanW commented 3 years ago

Confirmed that this reproduces with the related agile reference changes in 1.1.4. This needs further investigation.

manodasanW commented 3 years ago

Also note that this only reproduces when running on a machine or VM with a lower amount of RAM.

manodasanW commented 3 years ago

Investigation seems to show that pointer values are being reused (once for an async operation, another time for an IBuffer). That causes an old RCW to be brought back, and the invalid cast exception follows because the RCW is for the wrong type.

manodasanW commented 3 years ago

It turns out this issue is not related to running with a lower amount of RAM, as I previously indicated. It only reproduced on my VM and not on my machine because of differences in Windows build numbers. Recent Windows Insider builds don't seem to hit the issue, but the RTM builds, which my VM was on, do. At the same time, the issue is not caused by changes in Windows. It is an issue between CsWinRT and .NET 5 that depends on when finalizers run.

For context, CsWinRT registers the RCW object with the .NET ComWrappers API for the respective ptr, but the lifetime of the ptr is managed by an IObjectReference object stored in the RCW. What is happening is that the finalizer on the IObjectReference has run and released all the references to the ptr, which allows the ptr to be reused. However, the RCW has not been collected yet, and neither has the sync block for the RCW, whose cleanup is what removes the RCW from the ComWrappers cache. At this point, if the ptr is reused for a new object, .NET thinks it refers to the same object that is in its cache, brings the old RCW back to life, and returns it. That causes an InvalidCastException, because the RCW is not for the new object and is for another type.
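The failure mode described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the dictionary stands in for the ComWrappers cache, a `long` stands in for the native pointer, and every type and member name here (`FakeAsyncOperationRcw`, `FakeBufferRcw`, `StaleCacheDemo`) is hypothetical rather than a CsWinRT or .NET runtime API.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-ins for two unrelated RCW types.
public sealed class FakeAsyncOperationRcw { }
public sealed class FakeBufferRcw { }

public static class StaleCacheDemo
{
    // Stands in for the wrapper cache: native address -> managed wrapper.
    private static readonly Dictionary<long, object> Cache = new Dictionary<long, object>();

    // Returns the cached wrapper for an address, creating one if none exists.
    public static T GetOrCreate<T>(long address) where T : class, new()
    {
        if (!Cache.TryGetValue(address, out object wrapper))
        {
            wrapper = new T();
            Cache[address] = wrapper;
        }
        // Throws InvalidCastException if the cached entry was created
        // for a different type at the same address.
        return (T)wrapper;
    }

    // Returns true if the stale entry produces an InvalidCastException.
    public static bool Reproduce()
    {
        Cache.Clear();
        const long address = 0x1000;

        // 1. A wrapper is created and cached for the first native object.
        GetOrCreate<FakeAsyncOperationRcw>(address);

        // 2. The native object is released, but the cache entry is NOT
        //    removed (the wrapper itself has not been cleaned up yet).
        // 3. The same address is reused for an unrelated native object.
        try
        {
            GetOrCreate<FakeBufferRcw>(address);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

In this model, removing the cache entry at step 2 makes the problem go away, which corresponds to keeping the cache and the pointer lifetime in sync.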

The real fix after discussion with .NET folks is for .NET to add a new API that allows CsWinRT to remove a registered RCW from the ComWrappers cache when it is no longer alive. This is being looked at for .NET 6 and is tracked by https://github.com/dotnet/runtime/issues/51968

But to address this for current .NET 5 consumers, CsWinRT would need a mitigation. The mitigation being considered is to make the final release on the ptr occur only after the RCW has been finalized and its sync block has been cleaned up. There is no reliable way of telling when this happens, but we can try to achieve it by making the release happen in Gen2 finalization. This appears to be achievable by registering the object for finalization twice.
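One way to read "registering the object for finalization twice" is the pattern below, where the finalizer re-registers itself on its first run and only releases on its second. This is a hedged sketch of the general technique using the real `GC.ReRegisterForFinalize` API. It is not the actual CsWinRT change, and the `DeferredRelease` class and its `release` callback are hypothetical.

```csharp
using System;
using System.Threading;

// Hypothetical holder that delays a release callback until its finalizer
// has run a second time, i.e. until a later garbage collection.
public sealed class DeferredRelease
{
    private readonly Action _release;
    private int _finalizerRuns;

    public DeferredRelease(Action release)
    {
        _release = release ?? throw new ArgumentNullException(nameof(release));
    }

    ~DeferredRelease()
    {
        if (Interlocked.Increment(ref _finalizerRuns) == 1)
        {
            // First finalization: do nothing except ask the GC to finalize
            // this object again the next time it is found unreachable.
            GC.ReRegisterForFinalize(this);
            return;
        }

        // Second finalization: perform the real release.
        _release();
    }
}
```

The sketch only shows how a release can be pushed to a later collection. It does not guarantee that the RCW and its sync block have been cleaned up by then, which is the uncertainty the comment above points out.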

manodasanW commented 3 years ago

The previous fix was reverted because it introduced other issues. The new plan is to address this issue with a fix in the dotnet runtime (both .NET 5 and .NET 6). See the referenced PRs.

manodasanW commented 3 years ago

This is confirmed to be fixed in the upcoming .NET servicing update (5.0.8).