microsoft / DirectX-Graphics-Samples

This repo contains the DirectX Graphics samples that demonstrate how to build graphics intensive applications on Windows.
MIT License

D3D12HelloFrameBuffering: Synchronization Question #878

Closed ThIsJaCk23657689 closed 1 month ago

ThIsJaCk23657689 commented 1 month ago

Hi, I'm trying to understand CPU/GPU synchronization in DirectX 12, but there are some things that confuse me. Here is the sample code from the HelloFrameBuffering example:

// Prepare to render the next frame.
void D3D12HelloFrameBuffering::MoveToNextFrame()
{
    // Schedule a Signal command in the queue.
    const UINT64 currentFenceValue = m_fenceValues[m_frameIndex];
    ThrowIfFailed(m_commandQueue->Signal(m_fence.Get(), currentFenceValue));

    // Update the frame index.
    m_frameIndex = m_swapChain->GetCurrentBackBufferIndex();

    // If the next frame is not ready to be rendered yet, wait until it is ready.
    if (m_fence->GetCompletedValue() < m_fenceValues[m_frameIndex])
    {
        ThrowIfFailed(m_fence->SetEventOnCompletion(m_fenceValues[m_frameIndex], m_fenceEvent));
        WaitForSingleObjectEx(m_fenceEvent, INFINITE, FALSE);
    }

    // Set the fence value for the next frame.
    m_fenceValues[m_frameIndex] = currentFenceValue + 1;
}

My question is, why do we update the m_frameIndex before checking if the fence has reached the expected fence value? This means we use the fence value of a different framebuffer, which is not the same value we used in the Signal() call. This seems a bit strange to me.

I also checked out NVIDIA's sample code, and here is their version:

struct FrameContext
{
    ComPtr<ID3D12CommandAllocator> m_allocator;
    ComPtr<ID3D12CommandAllocator> m_computeAllocator;
    ComPtr<ID3D12Fence>            m_fence;
    uint64_t                       m_fenceValue = 0;
};

void DeviceResources::MoveToNextFrame()
{
    FrameContext* ctx = &m_frameContext[m_frameIndex];
    DX::ThrowIfFailed(m_commandQueue->Signal(ctx->m_fence.Get(), ctx->m_fenceValue));
    m_frameIndex = m_swapChain->GetCurrentBackBufferIndex();
    if (ctx->m_fence->GetCompletedValue() < ctx->m_fenceValue)
    {
        DX::ThrowIfFailed(ctx->m_fence->SetEventOnCompletion(ctx->m_fenceValue, m_fenceEvent.Get()));
        WaitForSingleObjectEx(m_fenceEvent.Get(), INFINITE, false);
    }
    ctx->m_fenceValue++;
}

As we can see, they use the same fence value for the 'Signal()' call and for the comparison with 'GetCompletedValue()'. Could someone help me understand the pros and cons of these two approaches? Thanks in advance!

weltkante commented 1 month ago

My question is, why do we update the m_frameIndex before checking if the fence has reached the expected fence value?

You want to wait for the frame that you're about to reuse to complete, so you can be sure it's safe to discard its resources. You don't want to wait for the frame that you just pushed to the GPU (that would be lockstepping single frames).
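The difference between the two waits can be sketched with a small simulation that does not touch D3D12 at all. Everything below is hypothetical scaffolding: `FakeQueue`, `FramesInFlightReuseWait` and `FramesInFlightLockstep` are made-up names, the "GPU" only advances when the CPU blocks on it (a worst case where the GPU is always behind), and back buffers are assumed to be handed out round-robin in place of GetCurrentBackBufferIndex().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a command queue plus the fence it signals.
struct FakeQueue {
    std::deque<std::uint64_t> pending; // signalled values the GPU has not reached yet
    std::uint64_t completed = 0;       // what GetCompletedValue() would report

    void Signal(std::uint64_t value) { pending.push_back(value); }

    // Stand-in for SetEventOnCompletion + WaitForSingleObjectEx:
    // drain the queue, in order, until the fence reaches `value`.
    void WaitFor(std::uint64_t value) {
        while (completed < value) {
            assert(!pending.empty() && "waiting on a value that was never signalled");
            completed = pending.front();
            pending.pop_front();
        }
    }
};

// Bookkeeping in the style of the sample: wait on the frame about to be reused.
// Returns how many signalled frames are still queued when the CPU may continue.
std::size_t FramesInFlightReuseWait(std::size_t frameCount, int framesToRender) {
    FakeQueue queue;
    std::vector<std::uint64_t> fenceValues(frameCount, 0);
    std::size_t frameIndex = 0;
    fenceValues[frameIndex] = 1;

    for (int i = 0; i < framesToRender; ++i) {
        const std::uint64_t currentFenceValue = fenceValues[frameIndex];
        queue.Signal(currentFenceValue);

        // Assumption: round-robin replaces GetCurrentBackBufferIndex().
        frameIndex = (frameIndex + 1) % frameCount;

        if (queue.completed < fenceValues[frameIndex]) {
            queue.WaitFor(fenceValues[frameIndex]);
        }
        fenceValues[frameIndex] = currentFenceValue + 1;
    }
    return queue.pending.size();
}

// Lockstep variant: wait on the value that was just signalled.
std::size_t FramesInFlightLockstep(int framesToRender) {
    FakeQueue queue;
    std::uint64_t fenceValue = 1;
    for (int i = 0; i < framesToRender; ++i) {
        queue.Signal(fenceValue);
        queue.WaitFor(fenceValue);
        ++fenceValue;
    }
    return queue.pending.size();
}
```

In this toy model, waiting on the frame about to be reused leaves `frameCount - 1` frames queued when the CPU continues, so the CPU can record the next frame while the GPU is still busy. Waiting on the value that was just signalled leaves none.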

ThIsJaCk23657689 commented 1 month ago

@weltkante Thanks! You completely answered my question! We just need to make sure that rendering has finished on the next frame buffer, the one we are about to render to. If it has not, then we wait until it is done!

Therefore, Microsoft's sample code is more efficient than NVIDIA's. NVIDIA's code will stall until the GPU has finished its work on the current frame buffer.

ThIsJaCk23657689 commented 1 month ago

@weltkante I am wondering what happens when the rendering of the next frame buffer finishes faster than the current one. For example:

After rendering framebuffer 1, we call 'Signal()' to tell the GPU that once it has finished its current job (the rendering of framebuffer 1), it should set the fence to '6'. Then we update the frame index to 0 and pick the fence value of framebuffer 0, which is '5'.

Because we want to start rendering framebuffer 0, we need to make sure that the previous work the GPU was doing on it is finished. We call GetCompletedValue() to get the current fence value. If the return value is less than '5', it means that framebuffer 0 is not done yet.

But what if framebuffer 1 finishes faster than framebuffer 0? In that case, does GetCompletedValue() return '6' instead of '5', causing a synchronization problem because it mistakenly indicates that the rendering on framebuffer 0 is finished when it actually is not?

ThIsJaCk23657689 commented 1 month ago

There is another question on Stack Overflow, and I read the answers. They say that calling ID3D12CommandQueue::ExecuteCommandLists guarantees that the first command list finishes before the GPU executes the second one.

If that is true, then there is nothing to worry about: I can trust the command queue to execute command lists in call order, ensuring that the second command list does not finish before the first. But if it is not, then I think there is a necessary reason to create different ID3D12Fence objects for each framebuffer and manage these separately.

Hopefully, I didn't get it wrong. If there is any mistake, please tell me. I would greatly appreciate it!

weltkante commented 1 month ago

[...] if it is not, then I think there is a necessary reason to create different ID3D12Fence objects for each framebuffer and manage these separately.

The Signal you post to the queue is a synchronization primitive that the GPU executes only once everything before it (on that queue) has finished executing. So, even if a future update of the API/hardware were to execute multiple command lists in parallel where possible, it can't do so past such a synchronization primitive; that would break its semantics (and its whole point of existing).
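The '5' and '6' scenario from the question can be traced with a toy model of a single queue. This is only a sketch under a stated assumption: the model executes commands strictly in submission order, which is stronger than what real hardware has to do, but it preserves the one property that matters here (a Signal takes effect only after everything queued before it). `Command`, `QueueState`, `ExecuteInOrder` and `QuestionScenario` are hypothetical names, not part of D3D12.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy command stream: "render to back buffer N" or "signal fence value V".
enum class Kind { Draw, Signal };

struct Command {
    Kind kind;
    std::uint64_t value; // back buffer index for Draw, fence value for Signal
};

struct QueueState {
    std::uint64_t completedFence = 0; // what GetCompletedValue() would report
    std::vector<bool> drawn;          // drawn[i] is true once back buffer i is rendered
};

// Execute the first `steps` commands in submission order (the model's assumption).
QueueState ExecuteInOrder(const std::vector<Command>& commands,
                          std::size_t steps,
                          std::size_t bufferCount) {
    assert(steps <= commands.size());
    QueueState state;
    state.drawn.assign(bufferCount, false);
    for (std::size_t i = 0; i < steps; ++i) {
        const Command& c = commands[i];
        if (c.kind == Kind::Draw) {
            state.drawn[static_cast<std::size_t>(c.value)] = true;
        } else {
            state.completedFence = c.value; // fence values only ever increase here
        }
    }
    return state;
}

// The scenario from the question: buffer 0 is tagged with 5, buffer 1 with 6.
std::vector<Command> QuestionScenario() {
    return {
        {Kind::Draw, 0}, {Kind::Signal, 5},
        {Kind::Draw, 1}, {Kind::Signal, 6},
    };
}
```

At every point in this trace, a completed value of 5 or more implies that buffer 0 has been rendered, and a value of 6 implies both buffers have. That is why one fence with increasing values is enough for a single queue: seeing '6' when you were waiting for '5' is still a correct answer.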

ThIsJaCk23657689 commented 1 month ago

@weltkante I see. You fully resolved my confusion. Now I think I really understand how fences work and how to adapt them to fit my application.

Thanks for helping and answering my question!