Bug: ICanvasImage.GetBounds is not thread safe due to incorrect locking in CanvasEffect::GetD2DImage definition

benstevens48 commented 6 days ago

The CanvasEffect::GetD2DImage function contains some code to detect a cycle in the effect graph. The cycle detection is outside the lock, because, as commented in the code, the lock is not recursive. However, the code also has the comment Cycle checks don't need to be threadsafe because that's just a developer error., which is completely false, because calling the function simultaneously on 2 different threads triggers a cycle detection! Here is the offending code.

ComPtr<ID2D1Image> CanvasEffect::GetD2DImage(ICanvasDevice* device, ID2D1DeviceContext* deviceContext, WIN2D_GET_D2D_IMAGE_FLAGS flags, float targetDpi, float* realizedDpi)
    {
        ThrowIfClosed();

        // Check for graph cycles
        if (m_insideGetImage)
            ThrowHR(D2DERR_CYCLIC_GRAPH);

        m_insideGetImage = true;
        auto clearFlagWarden = MakeScopeWarden([&] { m_insideGetImage = false; });

        // Lock after the cycle detection, because m_mutex is not recursive.
        // Cycle checks don't need to be threadsafe because that's just a developer error.
        auto lock = Lock(m_mutex);
//... code continues

I will create a PR to fix it but @Sergio0694 I would like your advice. I think the best fix is to change the lock to a recursive lock then move the lock before the cycle detection. Are you happy with that? Obviously the lock is used elsewhere but I can't see a problem with using a recursive lock instead of a non-recursive one (I know some purists don't like using recursive locks, but I really don't see why you would want to increase the risk of undefined behaviour by using a non-recursive lock, given the performance difference is probably tiny).

Also, I don't really know how to do PRs in this repo with the different branches. I am actually still using UWP personally so I will probably stick to that branch. Also, I'm a bit hesitant to update my fork to the latest version because of the previous deadlock fixes I made an potential conflicts that I don't have time to deal with right now, so I might base the PR on an old Win2D version. Probably this method hasn't been touched since then so it should probably be OK.

I guess with this change if there is a cycle and we call from 2 threads simultaneously, then there may be deadlock (instead of the current technically undefined but in practice cycle detection error throwing behaviour), but cycles are an error anyway so we probably shouldn't worry too much about that.

Sergio0694 commented 5 days ago

"However, the code also has the comment Cycle checks don't need to be threadsafe because that's just a developer error., which is completely false, because calling the function simultaneously on 2 different threads triggers a cycle detection!"

The part in bold is the point here I think. Trying to draw the same effects on two different threads simultaneously is not a supported scenario. These effects are not guaranteed to be thread safe, and this is an example of that. I'm a bit wary of changing the lock type, as that seems like a riskier change to something that has been working like this for over a decade now, and I wouldn't want to regress things. If we just want to at least improve the developer experience here and better handle race conditions, why not simply make writes to that field atomic? That is:

Use a CompareExchange to set m_insideGetImage and read its old value
Use a volatile write to reset it when exiting the scope

This is a minimal change and would at least avoid two threads trying to take the lock in this scenario.

@getrou thoughts?

FWIW, I'm doing this in the ComputeSharp implementation of ICanvasImageInterop, so it makes sense to me.

benstevens48 commented 5 days ago

@Sergio0694 - I'm pretty sure these effects are meant to be able to be used from multiple threads, else why have locks at all?

benstevens48 commented 5 days ago

I created a pull request which locally fixed the issue I was having. By the way, I found it almost impossible to get Win2D to build. I merged in the latest version from here but then I had to do the following. Upgrade all nuget packages, set min target to 16299 everywhere and also change the target version for some samples, remove custom nuget configs, and disable the cswinrt projection, which I couldn't get to work no matter how hard I tried (I think the TFM changes may be an issue - I am using the latest VS).

benstevens48 commented 5 days ago

PS this Direct2D page says the following about effects

You should share heavy-weight resources (like bitmaps and complex effect graphs) that are initialized once and then never modified across threads to increase performance.

so I assume there is nothing inherently wrong with using effects on more than one thread at once (assuming you don't modify the properties of the effect on one thread while drawing it on another).

Sergio0694 commented 5 days ago

That's D2D, it doesn't necessarily mean it applies to Win2D as well. Two different layers of the stack.

benstevens48 commented 5 days ago

Yes, but Win2D is written to be threadsafe. IMO what happened here was that the code was originally written correctly without the cycle check. Then someone realised a cycle could create a nasty loop so they probably ought to check for it (even though it's a developer error). It's clear from their comment that they would have put the cycle detection code inside the lock, but they realised they couldn't do that because the lock wasn't recursive, so they put it outside the lock and then perhaps stopped thinking about it because the cycle detection wasn't completely necessary in the first place. Without the cycle detection code, it's absolutely fine to use the effect from multiple threads simultaneously and all the correct locking is there. However IMO the person who added the cycle detection code failed to think through that if two threads use the effect at once, the first will set the m_insideGetImage flag to true, then move on to the (expensive) method body, then the second thread will come along, see m_insideGetImage is true and throw a cycle detection error, when in fact everything is perfectly fine. The fix is easy and is to move the cycle detection code inside the lock and make the lock recursive, as per my pull request.

microsoft / Win2D

Bug: ICanvasImage.GetBounds is not thread safe due to incorrect locking in CanvasEffect::GetD2DImage definition #970