Closed Nukepayload2 closed 3 years ago
Confirmed that this repros with the related agile reference changes in 1.1.4; further investigation is needed.
Also note that this only repros when running on a machine / VM with a lower amount of RAM.
Investigation seems to show that pointer values are being reused (once for an async operation, another time for an IBuffer), which causes an old RCW to be brought back and an invalid cast exception because the RCW is for the wrong type.
It turns out this issue is not related to running with a lower amount of RAM, as I previously indicated. It only reproduced on my VM and not on my machine because of differences in Windows build numbers. Recent Windows Insider builds don't seem to hit the issue, but the RTM builds, which my VM was on, do. At the same time, the issue is not related to changes in Windows; rather, it is an issue between CsWinRT and .NET 5 that depends on when finalizers run.
For context, CsWinRT registers the RCW object with the .NET ComWrappers API for the respective ptr, but the lifetime of the ptr is managed by an IObjectReference object stored in the RCW. What is happening is that the finalizer on the IObjectReference has run, letting go of all the references for the ptr and allowing the ptr to be reused, but the RCW hasn't been collected yet, and neither has the sync block for the RCW, which is what removes the RCW from the ComWrappers cache. So at this point, if the ptr is reused for a new object, .NET thinks it is for the same object in its cache, brings the old RCW back alive, and returns it. That causes an InvalidCastException because the RCW is actually for another type.
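The sequence described above can be modeled with a toy pointer-keyed cache. This is only a sketch in Python, used as a language-neutral illustration: `WrapperCache`, `cast`, `simulate`, and the address `0x1000` are hypothetical names and values, not part of CsWinRT or the ComWrappers API, and `TypeError` stands in for InvalidCastException.

```python
class WrapperCache:
    """Toy stand-in for a wrapper cache keyed by native pointer value."""

    def __init__(self):
        self._by_ptr = {}

    def get_or_create(self, ptr, type_name):
        # If the pointer value is already known, hand back the cached wrapper,
        # even if the native object behind that address has changed.
        wrapper = self._by_ptr.get(ptr)
        if wrapper is None:
            wrapper = {"ptr": ptr, "type": type_name}
            self._by_ptr[ptr] = wrapper
        return wrapper

    def remove(self, ptr):
        self._by_ptr.pop(ptr, None)


def cast(wrapper, expected_type):
    # Stand-in for the cast that fails with InvalidCastException.
    if wrapper["type"] != expected_type:
        raise TypeError(f"wrapper is {wrapper['type']}, expected {expected_type}")
    return wrapper


def simulate(evict_on_release):
    cache = WrapperCache()
    ptr = 0x1000  # hypothetical native address

    # 1. A wrapper for an async operation is registered for this address.
    cache.get_or_create(ptr, "IAsyncOperation")

    # 2. The native references are released, so the address may be reused.
    #    In the buggy case the cache entry is still present at this point.
    if evict_on_release:
        cache.remove(ptr)

    # 3. A different native object (a buffer) is allocated at the same address.
    wrapper = cache.get_or_create(ptr, "IBuffer")
    try:
        cast(wrapper, "IBuffer")
        return "ok"
    except TypeError:
        return "invalid cast"


print(simulate(evict_on_release=False))
print(simulate(evict_on_release=True))
```

In this model the failure disappears once the cache entry is removed before the address can be reused, which is the ordering that the fixes discussed in this thread aim to guarantee.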
The real fix after discussion with .NET folks is for .NET to add a new API that allows CsWinRT to remove a registered RCW from the ComWrappers cache when it is no longer alive. This is being looked at for .NET 6 and is tracked by https://github.com/dotnet/runtime/issues/51968
But to address this for current .NET 5 consumers, CsWinRT would need to mitigate the issue. The mitigation being considered is to make the final release on the ptr occur only after the RCW has been finalized and the sync block has been finalized. There is no reliable way of telling when this happens, but we can try to achieve it by making the release happen in Gen2 finalization. It seems this can be achieved by registering the object for finalization twice.
The previous fix was reverted because it introduced other issues. The new plan to address this issue is a fix in the dotnet runtime (both .NET 5 and .NET 6). See referenced PRs.
This is confirmed to be fixed in the upcoming .NET servicing update (5.0.8).
Describe the bug
System.Private.CoreLib.dll throws System.InvalidCastException randomly when using the WinRT APIs Windows.Graphics.Imaging.BitmapDecoder and Windows.Media.Ocr.OcrEngine. On average, 90% of the calls succeed. The other 10% of calls to BitmapDecoder.CreateAsync(IRandomAccessStream), BitmapDecoder.GetSoftwareBitmapAsync, or OcrEngine.RecognizeAsync throw an exception.
Typical exception message:
To Reproduce
Download the sample project and run. Click the "Test 100 times" button and wait until the sum of three numbers is 100. If the number under the "Information" text block is 100, click the button again. WinRTCallUnstable.zip
When the problem is successfully reproduced, the error count is greater than 0.
Expected behavior
System.InvalidCastException should not be thrown.
Version Info
I don't use the CsWinRT NuGet package directly, but my application calls WinRT APIs through WinRT.Runtime.dll, which is generated by CsWinRT. The target framework is net5.0-windows10.0.18362.0.
.NET SDK (reflecting any global.json):
 Version: 5.0.200
 Commit: 70b3e65d53
Runtime Environment:
 OS Name: Windows
 OS Version: 10.0.19042
 OS Platform: Windows
 RID: win10-x64
 Base Path: C:\Program Files\dotnet\sdk\5.0.200\
Host (useful for support):
 Version: 5.0.3
 Commit: c636bbdc8a
Visual Studio 2019 16.9.0
Additional context
This problem is probably related to threading or garbage collection. I added try ... catch to retry WinRT API calls in my code as a workaround. I don't know whether my workaround is safe, because I'm not sure whether this bug can cause memory corruption.
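The general shape of such a retry wrapper might look like the following. This is a sketch in Python for illustration only, not the reporter's actual code: `call_with_retry` and its parameters are hypothetical, and `TypeError` stands in for System.InvalidCastException. It says nothing about whether retrying is safe, which is the open question raised above.

```python
import time


def call_with_retry(operation, retryable=(TypeError,), attempts=3, delay_seconds=0.0):
    """Call operation(), retrying when it raises one of the retryable errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except retryable as error:
            # Remember the failure and try again after an optional pause.
            last_error = error
            time.sleep(delay_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error


# Usage: a hypothetical flaky call that fails twice before succeeding.
calls = {"count": 0}


def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TypeError("simulated invalid cast")
    return "decoded"


result = call_with_retry(flaky_call, attempts=5)
```

Limiting the number of attempts and retrying only on the specific exception type keeps unrelated failures visible instead of masking them.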