microsoft / angle

ANGLE: OpenGL ES to DirectX translation
Other
615 stars 166 forks source link

Long delay in eglSwapBuffers (D3D11 Present) after surface resize on Surface Studio #111

Closed svoisen closed 5 years ago

svoisen commented 7 years ago

Hello! We've noticed an issue with our application, as well as in the OpenGLES 2 C++ UWP template project (the spinning cube that comes with ANGLE), where calls to eglSwapBuffers after a window resize result in very long frame delays (~2 seconds). So far, we've only noticed this behavior on the Surface Studio hardware (NVidia GeForce GTX 965M). We don't see this on the Surface Book or on the few Thinkpad P50s we've tried.

Digging a little deeper, we've found that the delay is specifically related to the call to Present1 on the SwapChain, which calls Present in D3D11.

I've included a screenshot of the Concurrency Visualizer in VS2105 showing the issue when debugging the unmodified OpenGLES template project that comes with ANGLE. Note the 2 second delay:

swapbuffers_delay

As a result, I'm unsure if this is an issue with ANGLE, D3D11, or the Nvidia driver on the Surface Studio (running driver version 21.21.13.7620). But we don't see this issue when during calls to Present1 when using DX11 directly (such as the template DX11 XAML project in VS2015). I've even tried modifying the parameters for mSwapChain->Present1 in SwapChain11.cpp (such as disabling vsync wait by passing 0 as first param) without much luck.

austinkinross commented 7 years ago

Thanks for reporting this! Our team has borrowed a Surface Studio, and we'll be taking a look the issue soon.

Thanks again, Austin

CoreyDotCom commented 7 years ago

@austinkinross , thanks for helping, this one is definitely a show stopper from some of our desktop apps here at Adobe.

I have a Surface Book notebook that was working just fine with NVidia driver 21.21.13.6961. After updating it to 21.21.13.7620, and also trying 21.21.13.7849, the problem above is introduced.

Do you guys have contacts at NVidia or do you know of the most expedient way to get an issue filed and fixed on their end (if in fact this isn't something you guys can workaround)?

Rolling back to 21.21.13.6961 things work as they should again.

svoisen commented 7 years ago

@austinkinross Confirmed the same as @CoreyDotCom on the Surface Studio (which has GTX 965M). Rolling back the driver to 21.21.13.6961 or earlier resolves the issue.

austinkinross commented 7 years ago

Thank you both! That's useful info. Would either of you be able to confirm the WDDM version supported by 369.61 and 376.20+? If you run dxdiag and select "Display 1", then you should see this on the right hand side:

Driver Model: WDDM x.x

We definitely have contacts at NVIDIA if necessary. We're not currently sure if this is a driver or runtime issue though, so we'd like to investigate it further before sending it to NVIDIA.

Thanks! Austin

svoisen commented 7 years ago

@austinkinross The newer driver (376.20) is WDDM 2.1. Older driver (369.61) is WDDM 2.0.

austinkinross commented 7 years ago

Thanks for confirming that.

This issue is caused by a bug in the D3D11 runtime while running on WDDM 2.1 drivers. In short, ANGLE is making a certain pattern of calls to DXGISwapChain::Present() followed by a call to DXGISwapChain::ResizeBuffers() that can tickle the runtime in the wrong way, and cause it to wait for 2 seconds before recovering.

The fix to the D3D11 runtime will be available later this year.

In the meantime, a workaround is to create the DXGI swapchain in ANGLE (in CoreWindowNativeWindow.cpp) with the 'DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT' flag, and then to wait on the handle returned by IDXGISwapChain2::GetFrameLatencyWaitableObject() before ANGLE's call to IDXGISwapChain::ResizeBuffers in SwapChain11.cpp. The ResizeBuffers() call will also have to be updated to use the DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT flag.

We'll be pushing this workaround to our GitHub repository in the next few days, but feel free to try it out yourselves if you'd like to test it sooner.

Austin

CoreyDotCom commented 7 years ago

Thanks Austin! Will this workaround be benign once paired with the soon to be fixed D3D11 runtime? e.g. once it's fixed for real, will any app using the workaround be adversely affected? Won't this have the side effect of reducing our ability to render ahead, and reduce the effective framerate? (Per docs "Set this flag to create a waitable object you can use to ensure rendering does not begin while a frame is still being presented.")

abucur commented 7 years ago

Correct me if I'm wrong but wouldn't frames longer than 16ms on the CPU basically force a 30FPS render loop when used with GetFrameLatencyWaitableObject()? Skipping a single frame will lead to wasted CPU cycles as the render thread will have to wait for the next frame to end, right?

abucur commented 7 years ago

Nevermind, we need to wait on that object only when resizing, right? In that case the workaround sounds good to me.

austinkinross commented 7 years ago

@CoreyDotCom - the D3D11 runtime fix won't affect the app. The app will continue to work the same way as it will today with the workaround.

@abucur - correct, the wait is only done while resizing the window.

abucur commented 7 years ago

@austinkinross one more thing I just realized. Our app is using SwapChainPanel composition not CoreWindow directly. You mentioned only CoreWindow in the comment above, but the workaround will be available for SwapChainPanel as well, right? Thanks!

svoisen commented 7 years ago

@austinkinross Thanks for posting the workaround! I gave it a shot on the Surface Studio with the GLES2 example. I'm definitely seeing improvement, though still about a 100ms delay on average during resize (better than 2 seconds though). That said, I'm wondering if I coded up the patch correctly:

  1. I've set the flag DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT when creating the swap chain in CoreWindowNativeWindow.cpp.
  2. I've set the flag DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT as the last arg in the ResizeBuffers call in SwapChain11.cpp.
  3. I store a reference to the waitable object as part of the reset method in SwapChain11.cpp.
    IDXGISwapChain2 *swapChain2 = d3d11::DynamicCastComObject<IDXGISwapChain2>(mSwapChain);
    mSwapChainWaitableObject = swapChain2->GetFrameLatencyWaitableObject();
  4. In resize in SwapChain11.cpp I call this just before the call to mSwapChain->ResizeBuffers:
    WaitForSingleObjectEx(mSwapChainWaitableObject, 1000, true);

I suppose that's all there is to it? Either way, thanks for your team's quick response on this one. We will wait for the official update to be pushed to GitHub anyway.

Per @abucur question above, I suppose if we set the same DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT flag as part of createSwapChain in SwapChainPanelNativeWindow.cpp, then we should be good to go?

austinkinross commented 7 years ago

@svoisen: Your implementation sounds correct to me. I'm afraid I don't know why you're seeing a 100ms delay though-- I'll try to take a look before we push the patch to GitHub.

@abucur: Yes the same workaround applies to SwapChainPanel too, and we'll push that to GitHub as well.

Thanks!

svoisen commented 7 years ago

@austinkinross FWIW, I noticed less delay (subjective, did not measure) with the WDDM 2.1 driver than the 2.0 driver. Also, running the sample in debug mode (did not try release).

austinkinross commented 7 years ago

I've just pushed a fix to our GitHub respository. Thanks again for reporting this issue!

Would you like me to publish a new NuGet package with this fix as well? It would be great if you have time to validate that you're unblocked by our final fix before we release a new NuGet package, but it's not a problem if you can't.

@svoisen - Thanks. I wasn't able to repro the delay on my Surface Studio in a release build, so hopefully it won't be a problem for you.

svoisen commented 7 years ago

Thanks @austinkinross! I'm planning to to pull the fix and try it out this afternoon. Will let you know.

svoisen commented 7 years ago

Hi @austinkinross - Confirmed that the fix works with our application using WDDM 2.1 drivers. If you can post the new NuGet package, that would be great. Thanks for your help!

austinkinross commented 7 years ago

Great, thanks a lot! We've published the package here: https://www.nuget.org/packages/ANGLE.WindowsStore/2.1.14

Thanks again for reporting this issue and for your help testing the fix!

svoisen commented 7 years ago

Hi @austinkinross - back again :) Since pulling the fix, we've noticed some significant rendering delays when using it with WDDM 2.0 drivers. On 2.1 it seems fine, however. We're wondering if it's possible to only turn on the waitable object flag for 2.1 and leave as-is for 2.0 or earlier.

svoisen commented 7 years ago

@austinkinross Perhaps I spoke too soon. What I wrote above is true on Nvidia (at least on Surface Studio), but for Intel HD 520 on Surface Book, there is lag with both 2.0 and 2.1 drivers. So, now not so sure what is going on. If you have any advice on where to look, I'm definitely open. We do, for instance, use SetMaximumFrameLatency and set it to 1. Previously we were doing it on the device, but I understand with this change it must be set on the swap chain directly. One of us has tried that, however, with no real effect.

CoreyDotCom commented 7 years ago

@austinkinross, just wanted to add more color on the chance you get to look at it sooner than later. What we're seeing is essentially a lag on each frame. So for example our drawing surface is using ANGLE, as the user draws a shape, there is a delay at the end of each frame, where the content stutters and pauses, then eventually catches up to the mouse.

We see this behavior when we simply set the DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT flag on the SwapChain. It seems to change the internal behavior of the SwapChain that forces the odd latency.

Our original hope as @svoisen points out , was that this lag only happens with WDDM 2.0 drivers, but this lead didn't pan out. (Intel WDDM 2.1 drivers have same lag with the change).

austinkinross commented 7 years ago

Thanks all, are you able to share a repro app for these issues?

CoreyDotCom commented 7 years ago

@austinkinross Unfortunately I have yet to repro it with a standalone app.

However...with the OpenGLES 2 C++ UWP template project (the spinning cube that comes with ANGLE), regardless of driver model and driver, with ANGLE 2.1.14, resize of the sample window is super chunky (due to the wait). This is sort of what we see in our app - however, of course we aren't hitting the wait code, nor resizing our window. So at a minimum I suppose the patch as is we probably don't want it to regress resize perf of the sample.

Will keep attempting to repro our issue in a standalone test.

CoreyDotCom commented 7 years ago

@austinkinross, I can send you a video that shows what's happening if it's useful to you.

But basically we use dxgiDevice1->SetMaximumFrameLatency(1) in our app to reduce input latency. Since 2.1.14 introduces the new DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT on the Swap Chain, this is no longer working for us. If you look at the docs for that flag, it says:

When this flag is used, the swapchain's latency must be set with the 
IDXGISwapChain2::SetMaximumFrameLatency API instead of IDXGIDevice1::SetMaximumFrameLatency.

But when we attempt to do this in ANGLE as a test, no dice.

Basically in SwapChainPanelNativeWindow.cpp, below the call to CreateSwapChainForComposition on line 275 or so, we are doing:

if (SUCCEEDED(result))
{
    result = mSwapChainPanel.AS(&swapChainPanelNative);
    ComPtr<IDXGISwapChain2> swapChain2;
    HRESULT result = newSwapChain.As(&swapChain2);
    if (SUCCEEDED(result)) 
    {
        result = swapChain2->SetMaximumFrameLatency(1);
    }
}

But it seems to have no affect whatsoever.

austinkinross commented 7 years ago

Hi @CoreyDotCom, the default value for IDXGISwapChain2::SetMaximumFrameLatency is 1-- the MSDN documentation for this is incorrect. As a result, calling IDXGISwapChain2::SetMaximumFrameLatency with maxLatency = 1 in this case is expected to have no effect. Note that you can retrieve the frame latency via IDXGISwapChain2::GetMaximumFrameLatency if required.

A video may help, but a standalone debuggable app would be a lot more helpful if possible.

CoreyDotCom commented 7 years ago

Thanks @austinkinross , at this point the SetMaximumFrameLatency is a red herring. The lag we are seeing on WDDM 2.0 machines has nothing to do with it. Short of a debuggable app - here are a couple traces with the Concurrency Visualizer (I can send you the actual traces if helpful).

But essentially, here is a clean run with a WDDM 2.0 machine (Surface Book integrated Intel GPU) , with Angle 2.1.13 as we are interacting with our canvas and our tiled renderer is updating textures each frame and painting updates. Note how tight the green bars are during the activity, no skips or lags.

image

Now note the same WDDM 2.0 machine (Surface Book integrated Intel GPU) with Angle 2.1.14:

image

You will see how many lags there are. Each lag that occurs is due to d3d11.dll!NDXGI::CDevice::WaitForSynchronizationObjectFromCpuCB - in most frames it's stuck there as we are uploading textures or as we are calling glClear. And it's stuck there for some very lengthy amount of time, resulting in the stalls.

This delay is not apparent with a WDDM 2.1 driver on the same machine.

I've yet to repro our specific issue in the Angle reference app (spinning cube app) - but that sample does very little with texture uploads each frame and nothing with interactivity. Though a regression can be seen when resizing the app.

At this point with the very chunky resize perf of Angle 2.1.14 in general (irregardless of the very bad issue we're seeing above) - it probably warrants a revert. You can see the resize issue with the stock Angle reference apps on a WDDM 2.0 Intel machine such as the Surface Book. Or..if there is a way to localize the changes to WDDM 2.1 drivers only. With WDDM 2.1 things are much better for us, but there is a slight difference in latency (not as dramatic as above) - on WDDM 2.1 as well with the waitable swap chains.

CoreyDotCom commented 7 years ago

@austinkinross Apologies for the naive inquiry but...when reading up on waitable swap chains, all the samples and documentation recommend doing something like the following: A) Waiting on the swap chain availability, B) updating app state and drawing, C) Flush/Swap.... in that loop. One of the tech notes even says "With waitable swap chains, block the thread until the swap chain is finished presenting. Note that it is important to call this before the first Present in order to minimize the latency of the swap chain."

But it seems that with the changes in Angle 2.1.14, now that the swap chain is a waitable swap chain, very simple operation such as texture upload, clearing the surface, etc. all end up stuck in this wait state periodically.

Hypothetically I wonder if we exposed the waitable handle to client code, and waited before doing our rendering/updates, then did the swap - if things would work better. Though now of course we're requiring Angle client code to perform much more than it signed up for I guess.

CoreyDotCom commented 7 years ago

@austinkinross so to test my theory I placed a (void)WaitForSingleObject(mFrameLatencyWaitableObject, 1000); at the very end of SwapChain11::present, after the actual Present1 calls, right before the return EGL_SUCCESS....and it solves all of our problems with rendering our canvas and our texture uploads, etc. don't hit the wall.

What we do see however is a bit of contention now during resize..the wait in the resize/buffer resize handler now seems to compound a bit with the wait I added after the Present - and there is a big (500-1000ms) lag on resize.

But hopefully this provides good clues.

CoreyDotCom commented 7 years ago

@austinkinross ... ok final update, good news.

If you simply move the WaitForSingleObject out of the resize() handler where you put it, and put it after the Present1 (as I described above, before the return)... both WDDM 2.0 and WDDM 2.1 apps and all permutations and combinations of Intel and/or Nvidia drivers...seem to work nicely now with no lag or resize slowdown/stall.

svoisen commented 7 years ago

Hi @austinkinross: I can confirm @CoreyDotCom's findings using WDDM 2.0 and 2.1 drivers on Surface Studio as well, for both our application as well as the sample application. Definitely seems like we should move the WaitForSingleObject call for the patch, and then all will be well.

CoreyDotCom commented 7 years ago

@abucur above says "Correct me if I'm wrong but wouldn't frames longer than 16ms on the CPU basically force a 30FPS render loop when used with GetFrameLatencyWaitableObject()? Skipping a single frame will lead to wasted CPU cycles as the render thread will have to wait for the next frame to end, right?"

So with the workaround of course, nobody can do anything in parallel with the Present1 if we go with the consistent wait afterwards. So I guess losing this parallelization would be bad for most users? The catch 22 is that with the waitable swap chain and WDDM 2.0, you can't really do any parallel work anyways while the Present is under way without D3D forcing a wait anyways. Catch 22.

austinkinross commented 7 years ago

Hey everyone, thanks for all the detailed info and investigations. If you feel confident that moving the WaitForSingleObject works for your app then feel free to make the change locally, but I would really like to spend more time investigating this before submitting another patch to ms-master (unfortunately I am busy with other work right now).

A GPUView capture of the problems would be really helpful for us if you're able to share that.

Thanks again, Austin

CoreyDotCom commented 7 years ago

@austinkinross,

It's all greek to me but - here's a GPUView trace of the app exhibiting the problem (stalls, chunkiness, etc.) with 2.1.14 as is in master: https://dl.dropboxusercontent.com/u/40179524/Merged_Angle2_1_14.zip

Here is a trace with 2.1.13 where things work smoothly and there is no 'wait' anywhere during texture upload...: https://dl.dropboxusercontent.com/u/40179524/Merged_Angle2_1_13.zip

I can definitely see a difference in the characteristics of the GPUView zoomed in view especially is it relates to the "flip queue" and especially the "Device Context" row for our process, but don't know enough about the tool to diagnose much further.

2.1.14 ANGLE with waitable swap chain: image

2.1.13 ANGLE: image

austinkinross commented 7 years ago

Hi, thanks a lot for the captures!

I've been speaking to the D3D team here at MS, and for your scenario we believe that the best approach will be to wait on the frame latency handle before each Present() and before ResizeBuffers()-- that should give you close to the behavior that you had with ANGLE 2.1.13 with your call to dxgiDevice1->SetMaximumFrameLatency(1), but it should also work around the potential 2-second wait during window resizing.

We will need a more general purpose solution for all apps, but we will work on that separately.

Could you confirm if the proposed solution (calling Wait before each Present and before ResizeBuffers()) fixes the issue for you on all hardware?

Thanks, Austin

CoreyDotCom commented 7 years ago

It does not @austinkinross for the reason above. If we wait before the Present1, Present1 will return, and we will go back to rendering and prepping the next frame, but then if we uploading textures or glClear, etc. while Present1 is still doing work (e.g. before the wait handle would have said the frame was fully good to go) - we stall... it's only if I add the wait after Present/Present1 as described above, that it works.

Edit: Actually with the wait right before Present there are no stalls in d3d11.dll!NDXGI::CDevice::WaitForSynchronizationObjectFromCpuCB, but perceived latency is higher than if/when we wait after the Present.

austinkinross commented 7 years ago

Okay, that should be fine if it works for you. As you noticed earlier, if you wait after Present1() then you'll no longer need the wait before ResizeBuffers() to workaround the 2-second window resizing issue.

I need to set aside more time to investigate a full fix for this in ms-master. In the meantime, am I correct in thinking that you are unblocked by waiting immediately after Present1()?

CoreyDotCom commented 7 years ago

Yes unblocked, thank you.

teshca commented 7 years ago

Just FYI in our project we had to revert this commit https://github.com/Microsoft/angle/commit/4eb75be9d5a7218baead6abf5f51bf063188fba2 as we got significant lags in ui input.

albertofustinoni commented 7 years ago

Running head first into this issue myself with any drivers using WDDM 2.1 or later.

Is there any fix?

austinkinross commented 7 years ago

Hi there, unfortunately we don't have a fix in ms-master yet. I'll try to find time to investigate this in the near future-- hopefully we can add a workaround for you in ms-master.

Note that the root cause of the issue will be fixed in a future version of Windows.

Thanks, Austin

albertofustinoni commented 7 years ago

Thank you for your reply.

I was worried development had stalled on Angle but glad to see that doesn't seem to be the case

CoreyDotCom commented 7 years ago

@albertofustinoni , the attached patch has served us well with all WDDM versions and our specific use cases. You can give it a try. (I've included only the modified files).

angle-patch.zip

albertofustinoni commented 7 years ago

Hello and thank you for following up.

I have managed to find a workaround for the issue another way, I think: I am using the Angle context to render to a texture, which I then draw to screen using Win2D. I can resize the window without ill effect that way.

What I do is:

Clearing the screen to red using OpenGL ES commands results in a red quad being rendered in my Win2D canvas.

Is this a good approach? As far as I understand it's OK for applications to create multiple Direct3D devices and DXGI sharing means the two textures point to the same video memory.

flibitijibibo commented 6 years ago

Bumping this old thing to ask if https://github.com/Microsoft/angle/commit/4eb75be9d5a7218baead6abf5f51bf063188fba2 was a confirmed fix or just experimental - it seems whenever the application misses 60Hz the window falls behind by up to a second or more. Mind you, it somehow renders at 60Hz, it just plays back a second later than it should o_o

By reverting the above commit we're able to make Xbox One support work again. Is there anyone who absolutely depends on this commit for their app to work?

austinkinross commented 5 years ago

Hi everyone! We recommend that you use the master ANGLE repository going forward: https://groups.google.com/forum/#!forum/angleproject

If you are still having these issues with top-of-tree in the master repository there then I suggest that you email the ANGLE mailing list here: https://groups.google.com/forum/#!forum/angleproject

Microsoft employees and other active ANGLE contributors monitor this mailing list and should be able to help.

Thanks Austin