microsoft / Win2D

Win2D is an easy-to-use Windows Runtime API for immediate mode 2D graphics rendering with GPU acceleration. It is available to C#, C++ and VB developers writing apps for the Windows Universal Platform (UWP). It utilizes the power of Direct2D, and integrates seamlessly with XAML and CoreWindow.
http://microsoft.github.io/Win2D
Other
1.82k stars 289 forks source link

Deadlocks again #748

Open benstevens48 opened 4 years ago

benstevens48 commented 4 years ago

Further to the issue https://github.com/microsoft/Win2D/issues/663 that I created before, I have found more instances in which deadlocks can occur in Win2D.

The basic problem is that Win2D is susceptible to lock inversion. This is because sometimes a Direct2D lock (by which I mean the internal Direct2D factory lock which is presumably locked by most Direct2D calls or the explicit lock that the factory offers) is taken inside a Win2D lock (by which I mean a lock defined by the Win2D code) and sometimes the other way around.

One example (taken from the previous issue) of where Win2D takes a Win2D lock inside a Direct2D lock is in the CanvasSwapChain.CreateDrawingSession method. There are several examples of taking a Direct2D lock (implicitly) inside a Win2D lock e.g. ResourceManager::GetOrCreate for a CanvasVirtualBitmap due to the GetLocalBounds call.

The new deadlock I discovered (I think this was the cause of my deadlock although it was hard to reproduce) occurs with ICanvasImage.GetBounds. Basically this locks on the device context pool lock before creating a device context (if necessary), which uses an implicit Direct2D lock I assume. If, at the same time, on another thread, I call using(var l = Device.Lock()){image.GetBounds();}, then this has the potential to deadlock (if you are unlucky) due to the inversion of the locks. (The code I'm actually calling is more like using(var l = Device.Lock()){GetOrCreate();} for a CanvasVirtualBitmap, and GetOrCreate effectively calls GetBounds - the reason for taking the explicit lock in the first place is the workaround from the previous issue).

So now I've added another workaround to the list of workarounds I'm using, which is to call Device.Lock() before GetBounds()...

The locking issues really need addressing. Firstly, a it needs to be decided which is the innermost lock - the Direct2D one or Win2D locks. Given that it is possible to externally call Device.Lock(), I guess the Win2D locks have to be the innermost ones. Therefore whenever a Win2D lock is entered, either no Direct2D lock (implicit or explicit) should be taken inside it, or, prior to entering the lock, an explicit Direct2D lock should be taken. This should be applied to the entire library.

I realise this is a quite a tricky area, and I am not an expert so might comments may not be that helpful, but it would be good if this problem was addressed.

benstevens48 commented 4 years ago

@shawnhar - I am happy to create a pull request which attempts to fix a couple of these problems. The issue with the device context pool can be solved with a simple rearrangement of the TakeLease function, and I can think of a couple of different ways of addressing the issue with the GetOrCreate function. Would it be helpful for me to create a couple of pull requests?

benstevens48 commented 4 years ago

I have a few more comments to make. First, I notice that GetOrCreate is actually called internally in a number of places, so it's not necessary to use interop to trigger this bug. For example, trying the read the SourceColorProfile from a ColorManagementEffect, if the SourceColorProfile Win2D wrapper has gone out of scope, at the same time as creating a drawing session on a swap chain will trigger the deadlock. I ran into this recently.

Next, I would like to amend what I said about innermost and outermost locks. In my opinion it would actually be very difficult for Win2D never to take any lock of its own around a Direct2D call or Direct2D lock, although it certainly would be possible and strongly recommended in my opinion not to do this inside the static lock used by the ResourceWrapper/Manager. So I don't think it can ever be safe for a user to call CanvasDevice.Lock followed by a Win2D function unless that Win2D function is explicitly listed as safe for such a scenario. So the Direct2D lock should be thought of as the inner lock, but it's still good to avoid using it inside a Win2D lock unless absolutely necessary.

Finally, I have done another review of my pull requests https://github.com/microsoft/Win2D/pull/751 and https://github.com/microsoft/Win2D/pull/753 which fix the deadlock. In my opinion, https://github.com/microsoft/Win2D/pull/751 is a no-brainer and a very tiny code change. I've recently reviewed https://github.com/microsoft/Win2D/pull/753 again and I believe it is logically sound, and have decided to incorporate it into the version of Win2D I'm using in my app (I have been using https://github.com/microsoft/Win2D/pull/751 plus the workaround suggested in https://github.com/microsoft/Win2D/issues/663 up to this point).

I have also added another commit to https://github.com/microsoft/Win2D/pull/753. This is a very small change that releases the Direct2D lock that is explicitly taken when creating a canvas drawing session on a swap chain before calling the CanvasDrawingSession constructor (there is no need for it here as far as I can see), which prevents the ResourceWrapper/Manager lock from being locked inside the Direct2D lock. I think this might be the only place where this occurs internally, and this was the original cause of my deadlock, so if you don't want to implement any of the other fixes, you could just take this code change (it would require a separate pull request to separate it out), but I still think the other changes are good changes that are worth implementing.