Open kekekeks opened 4 years ago
Having a look at this implementation of a non-blocking dictionary: https://github.com/VSadov/NonBlocking
More info:
Using a basic lock
or Monitor
also causes this issue. See this repro:
https://gist.github.com/retran/b57e4db1a173048c2cee49ac6d523fc2
I just hit upon this issue and unfortunately this makes SkiaSharp unusable for my app.
Perhaps I'm missing something here, but the only way currently to prevent this issue on Windows is to not use the STA threading model. Unfortunately this is not possible for many desktop apps because:
While some of these issues can be worked around, my app loads third party plugins that are usually written in C++ and often assuming the host is running STA. They also often use the above described functionality.
It should be noted that any Windows app using SkiaSharp and STA is at risk of deadlocking - if the GC happens to run while the lock is held, then it can easily go re-entrant and deadlock.
All that's to say I really need a reliable fix for this. I can't help but think the easiest immediate fix might be to switch the Windows build to use PInvoke and a simple Win32 critical section for the lock?
I'm going to make this change because I really need it but would rather not maintain a separate branch. Is this something you'd consider including in the official build until a proper solution can be developed?
@toptensoftware if you control the synchronization context in your desktop app, one other workaround without patching SkiaSharp might be to install a custom STA-friendly synchronization context. It should request wait notifications (via base.SetWaitNotificationRequired()
) and override SynchronizationContext.Wait
to perform real non-pumping waits (eg, via Win32 WaitForMultipleObjects
, which doesn't pump). Here's why this may work.
It should be possible to do this just for the scope of a particular SkiaSharp call, with try
/finally
to temporarily install a custom non-pumping SynchronizationContext
, call SkiaSharp API, then restore the original synchronization context.
@kekekeks will this also fix the issue for you?
We are currently switching to a non-pumping SynchronizationContext before doing any calls to SkiaSharp on the UI thread and when handing WM_SIZE and WM_PAINT. So the app is switching between SkiaSharp-safe and STA-safe modes. It's not an ideal solution, but works for us for now.
I wasn't aware the locks uses the sync context - I'll look into switching that, but it really seems like a horrible hack. All that overhead for the few instructions it takes to put something into, or get something out of a dictionary.
Isn't the whole point of these locks being alertable in the STA is so you don't get a deadlock situation for COM method calls. Unless something inside one of those HandleDictionary methods is doing COM stuff that needs to callback into the STA that's not an issue. The non-alertable critical section seems a much safer and probably faster solution to me.
I wasn't aware the locks uses the sync context - I'll look into switching that, but it really seems like a horrible hack. All that overhead for the few instructions it takes to put something into, or get something out of a dictionary.
@toptensoftware, as bad as this hack feels, I think it's the only way to safeguard yourself from reentrancy issues, if you're using an STA thread with WindowsFormsSynchronizationContext
(WinForms) or DispatcherSynchronizationContext
(WPF). This is how pumping in blocking WaitOne
calls was designed to work in .NET Framework (to enable COM calls and callbacks). Back in the day, I ranted about it here and I don't think much has changed in .NET Core since then.
The fix you propose might be fixing it for one particular place, but are you sure nothing else is calling WaitOne
across the board, explicitly or implicitly?
You're right of course, but I think for SkiaSharp this is particularly troublesome because:
To say every STA desktop Windows app that uses SkiaSharp needs a custom sync context with switching whenever using Skia is a big ask.
A couple more thoughts on this. My app uses a custom UI toolkit so isn't using WinForms or WPF's sync context but does have its own which until just now used the default implementation of Wait (the problematic one). As an experiment I just changed it a non-alertable wait and the re-entrancy is indeed solved - but of course the STA is now broken.
The question now becomes how to reliably switch between the two modes. The low hanging fruit is to switch around WM_PAINT, WM_SIZE and perhaps a few other key places but I'm far less certain about the possibly hundreds of other places where I might be invoking SkaiSharp. There's the Skia objects that controls create during their construction, theme loading where fonts and images are loaded, temporary objects used during measurement and layout, offscreen rendering that isn't done during paint. The list goes on and on...
The scope of this issue is quite daunting once you understand its implications and I'd much rather be able to rely on SkiaSharp not going re-entrant and then not have to worry about it.
To put it another way, I'm far more worried about missing a SkiaSharp invocation than I am about: "nothing else is calling WaitOne across the board, explicitly or implicitly?"
I agree, ideally SkiaSharp would benefit from using some lower level synchronization primitives which aren't exposed to pumping. I think that Monitor.Enter
/Monitor.Exit
(which is used by C# lock { ... }
) should be fine, although I haven't verified it.
That said, many other things apart from Skia might still be using the .NET blocking calls, which still do that limited amount of pumping by default. You're probably using Skia on tight rendering loops, so you've become affected by this issue in an easily reproducible way. For many other scenarios, that could be a hard to track bug, which might still occasionally pop up and sporadically affect users.
Luckily, because you have your own synchronization context, you have full control over this. You still pump messages in your core event loop (the top-level GetMessage/DispatchMessage
location), so you should be fine with hard OLE drag&drop and other STA requirements. Then, disabling pumping for short-lived blocking WaitOne
calls inside UI event handler (or something like a timer callback) shouldn't be a problem. That worked well for me in the past.
However, even if we disable undesired pumping for managed code (preventing it for WaitOne
via a custom synchronization context), there still could be some 3rd party native code calling something like CoWaitForMultipleHandles
, or just starting its own a nested message loop. That was pretty common back in COM days. I once had to resort to WH_GETMESSAGE
hook just to ignore WM_PAINT
, WM_SIZE
etc if they arrive on a nested message loop (i.e., via any place that calls GetMessage
/PeekMessage
, other than my core event loop).
Hi Andrew, thanks for your insight on this - that's really valuable info. Message hooks to filter paint messages sounds like too much fun.
This is all a bit of a mess, but for understood reasons. I think we agree SkiaSharp could improve things with a non-pumping lock, or at least a way to plugin your own sync mechanism.
Unless, or until this can be fixed SkiaSharp I'm tempted to just disable the pumping wait entirely. I feel like I'm more exposed by that than anything else the app or any plugins might do. But I'm sure that'll bite me somewhere else :)
No worries Brad, glad if it helped... one of those evergreen subtleties of Windows Desktop development with .NET :)
Unless, or until this can be fixed SkiaSharp I'm tempted to just disable the pumping wait entirely.
Doing that on STA thread breaks .NET itself in various ways. We've initially tried to always use a non-pumping context and discovered that it breaks APIs like GC.WaitForPendingFinalizers, those just deadlock.
Hi @kekekeks, thanks for the info, good to know.
Doing that on STA thread breaks .NET itself in various ways. We've initially tried to always use a non-pumping context and discovered that it breaks APIs like GC.WaitForPendingFinalizers, those just deadlock.
@kekekeks I'm curious if you create (or obtain via interop) any COM objects on that STA thread?
I remember having this issue with WaitForPendingFinalizers
and RCW wrappers for MSHTML COM objects. I think, the solution back then was to enable pumping just for the scope of WaitForPendingFinalizers
call, while disabling WM_PAINT
and input messages processing, with WM_SETREDRAW
and EnableWindow(false)
. As horrible as it sounds, that was effective to avoid reentrancy, but later we just got rid of WaitForPendingFinalizers
.
I think that Monitor.Enter/Monitor.Exit (which is used by C# lock { ... }) should be fine, although I haven't verified it.
@noseratio I think using lock
also had this issue: https://gist.github.com/retran/b57e4db1a173048c2cee49ac6d523fc2
@noseratio I think using
lock
also had this issue: https://gist.github.com/retran/b57e4db1a173048c2cee49ac6d523fc2
@mattleibow I might be wrong but I think in this particular case it's caused by Thread.CurrentThread.Join(100)
, where they call it on the main UI thread. This call does indeed pump:
try
{
Debug.Assert(Thread.CurrentThread.ManagedThreadId == mainThreadId);
throwOnPaint = true;
Thread.CurrentThread.Join(100);
Lock();
}
I think, Monitor.Enter
/Monitor.Exit
should be mapping to Win32 EnterCriticalSection
/LeaveCriticalSection
directly, maybe with some added "spin-before-sleep" improvement, but I can't find anything confirming that in the dotnet
repo right away. I might try modifying the code you linked to verify this theory :)
@mattleibow you were right! TIL, plain c# locks (ie., Monitor.Enter
/Monitor.Exit
) also pump on an STA thread 👀
https://gist.github.com/noseratio/0d93133b826339c7eaf49c9feec142e3#file-winformsapp-cs-L60
Now I wonder if there's any managed blocking synchronization primitive that doesn't pump in this case, besides Thread.Sleep
...
Edited: and here's where the pumping magic is happening: https://github.com/dotnet/runtime/blob/0c0bd5a4066bc9b1a165a45e3553dfa14b87729b/src/coreclr/vm/threads.cpp#L3342
This is caused by all locks in .NET being alertable locks that can still run the message pump when waiting for a lock.
This behavior might lead to unexpected behavior in the user code or even deadlocks, since user code doesn't expect to receive WM_PAINT while calling SKPath constructor.