nvpro-samples / nvpro_core

shared source code and resources needed for the samples to run
Apache License 2.0

vkQueuePresentKHR is waiting on semaphore that has no way to be signaled #37

Closed f1shel closed 2 years ago

f1shel commented 2 years ago

Hi! As the title says, I ran into a very strange bug: the program exits abnormally in:

void nvvk::AppBaseVk::prepareFrame() {
  // ...
  assert(result == VK_SUCCESS); // <---- ended with VK_ERROR_DEVICE_LOST
}

On my local machine, the swapchain creates 3 entries and so has 3 semaphore pairs (readSemaphore and writtenSemaphore). A simplified version of my main loop is shown below:

// Main loop
while (!glfwWindowShouldClose(...)) {
    // ...
    prepareFrame();
    // create command buffer and record commands ...
    vkBeginCommandBuffer(cmdBuf, &beginInfo);
    // -- ray tracing...
    // -- post processing...
    vkEndCommandBuffer(cmdBuf);
    submitFrame();
}

On the 2nd (and also the 3rd) iteration of the loop, vkQueuePresentKHR(), called from present() inside submitFrame(), triggers a Vulkan validation layer error:

--> Validation Error: [ VUID-vkQueuePresentKHR-pWaitSemaphores-03268 ]
Object 0: handle = 0x14e37396870, name = queueGCT, type = VK_OBJECT_TYPE_QUEUE;
Object 1: handle = 0x2b424a0000000034, name = swapchainWrittenSemaphore:1, type = VK_OBJECT_TYPE_SEMAPHORE; |
MessageID = 0x251f8f7a |
vkQueuePresentKHR: Queue VkQueue 0x14e37396870[queueGCT] is waiting on
pWaitSemaphores[0] (VkSemaphore 0x2b424a0000000034[swapchainWrittenSemaphore:1]) that has no way to be signaled.

The Vulkan spec states:

All elements of the pWaitSemaphores member of pPresentInfo must reference a semaphore signal operation that has been submitted for execution and any semaphore signal operations on which it depends (if any) must have also been submitted for execution

The error is most likely caused by a semaphore that will never be signaled. However, this is weird. I debugged the code and found that the waitSemaphore in vkQueuePresentKHR() should have been signaled by vkQueueSubmit() in submitFrame().

vkQueueSubmit:

  VkSemaphore semaphoreRead  = m_swapChain.getActiveReadSemaphore();
  VkSemaphore semaphoreWrite = m_swapChain.getActiveWrittenSemaphore();
  // which returns m_entries[(m_currentSemaphore % m_imageCount)].writtenSemaphore;

  // Pipeline stage at which the queue submission will wait (via pWaitSemaphores)
  const VkPipelineStageFlags waitStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
  // The submit info structure specifies a command buffer queue submission batch
  VkSubmitInfo submitInfo{VK_STRUCTURE_TYPE_SUBMIT_INFO};
  submitInfo.pWaitDstStageMask    = &waitStageMask;   // Pointer to the list of pipeline stages that the semaphore waits will occur at
  submitInfo.pWaitSemaphores      = &semaphoreRead;   // Semaphore(s) to wait upon before the submitted command buffer starts executing
  submitInfo.waitSemaphoreCount   = 1;                // One wait semaphore
  submitInfo.pSignalSemaphores    = &semaphoreWrite;  // Semaphore(s) to be signaled when command buffers have completed
  submitInfo.signalSemaphoreCount = 1;                // One signal semaphore
  submitInfo.pCommandBuffers = &m_commandBuffers[imageIndex];  // Command buffers(s) to execute in this batch (submission)
  submitInfo.commandBufferCount = 1;                           // One command buffer
  submitInfo.pNext              = &deviceGroupSubmitInfo;

  // Submit to the graphics queue passing a wait fence
  vkQueueSubmit(m_queue, 1, &submitInfo, m_waitFences[imageIndex]);

Setting up presentInfo for vkQueuePresentKHR:

  VkSemaphore& written = m_entries[(m_currentSemaphore % m_imageCount)].writtenSemaphore;

  presentInfo                    = {VK_STRUCTURE_TYPE_PRESENT_INFO_KHR};
  presentInfo.swapchainCount     = 1;
  presentInfo.waitSemaphoreCount = 1;
  presentInfo.pWaitSemaphores    = &written;
  presentInfo.pSwapchains        = &m_swapchain;
  presentInfo.pImageIndices      = &m_currentImage;

  m_currentSemaphore++;

presentInfo.pWaitSemaphores and submitInfo.pSignalSemaphores actually refer to the same semaphore (screenshot: QQ Screenshot 20220409201742).
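To make the index arithmetic concrete, here is a minimal sketch of why the two lookups agree. It assumes, as the snippets above suggest, that both use `m_currentSemaphore % m_imageCount` and that the counter is only incremented in present(). `SemaphoreCycle` is a hypothetical stand-in that tracks indices only; it is not the nvpro_core class and holds no real VkSemaphore handles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the swapchain's semaphore bookkeeping.
// The real code stores VkSemaphore handles; this only tracks which entry is used.
struct SemaphoreCycle {
  uint32_t imageCount;        // number of swapchain entries (3 in this report)
  uint32_t currentSemaphore;  // mirrors m_currentSemaphore

  explicit SemaphoreCycle(uint32_t count)
      : imageCount(count), currentSemaphore(0) {}

  // Entry whose writtenSemaphore the submit signals
  // (what getActiveWrittenSemaphore() is described as returning).
  uint32_t activeIndex() const {
    assert(imageCount > 0);
    return currentSemaphore % imageCount;
  }

  // Mirrors present(): wait on the active entry, then advance the counter.
  uint32_t present() {
    const uint32_t waited = activeIndex();
    ++currentSemaphore;
    return waited;
  }
};
```

With three entries, the index signaled before present() equals the index present() waits on, and the sequence cycles 0, 1, 2, 0. Under these assumptions the bookkeeping itself is consistent, which points to a cause outside the semaphore logic.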

The strangest thing is that this bug seems to be related to my shader code. When I simply output UV coordinates in the ray generation shader, the program runs fine. When I try to generate a distance image in the closest-hit shader, say:

// raytrace.intersection_test.rchit
// ...
void main() {
    payload.radiance = vec3(gl_HitTEXT/100.0);
}

or

// raytrace.intersection_test.rchit
// ...
void main() {
    debugPrintfEXT("My float is %f", gl_HitTEXT);
    payload.radiance = vec3(gl_HitTEXT);
}

this error will occur. And if I make a small adjustment:

// raytrace.intersection_test.rchit
// ...
void main() {
    payload.radiance = vec3(gl_HitTEXT);
}

the bug disappears.

Could you please help me? Thanks in advance.

f1shel commented 2 years ago

Some code here: https://gist.github.com/f1shel/3707292d6b6683c5eec3a4a12d6f4781

f1shel commented 2 years ago

Solved. I finally found that only the ray generation shader had been added to the shader groups when creating the shader binding table.
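A cheap guard against this kind of mistake is to check the list of shader groups before building the shader binding table. The sketch below is only an illustration: `GroupKind`, `SbtCoverage`, `checkGroups`, and `requireCompleteSbt` are hypothetical names, not part of nvpro_core or Vulkan, and it assumes the pipeline needs at least one ray generation, one miss, and one hit group.

```cpp
#include <cassert>
#include <vector>

// Hypothetical categories for the shader groups handed to the SBT builder.
enum class GroupKind { RayGen, Miss, Hit };

// Records which kinds of group are present.
struct SbtCoverage {
  bool hasRayGen = false;
  bool hasMiss   = false;
  bool hasHit    = false;

  bool complete() const { return hasRayGen && hasMiss && hasHit; }
};

// Scan the groups and note which kinds appear at least once.
inline SbtCoverage checkGroups(const std::vector<GroupKind>& groups) {
  SbtCoverage coverage;
  for (GroupKind kind : groups) {
    switch (kind) {
      case GroupKind::RayGen: coverage.hasRayGen = true; break;
      case GroupKind::Miss:   coverage.hasMiss   = true; break;
      case GroupKind::Hit:    coverage.hasHit    = true; break;
    }
  }
  return coverage;
}

// Debug-build guard: fail on the CPU before the GPU can fault on a missing group.
inline void requireCompleteSbt(const std::vector<GroupKind>& groups) {
  assert(checkGroups(groups).complete() && "SBT is missing a shader group kind");
}
```

A list containing only `GroupKind::RayGen`, as in the bug described here, reports `complete() == false`, so the problem would show up as a CPU-side assertion rather than as a lost device.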

NBickford-NV commented 2 years ago

I just ran into this myself, and it looks like the fix here was correct - it looks like if the device segfaults (e.g. when trying to access a shader group not in the shader binding table), the current Vulkan Validation Layer first emits this semaphore error before we catch VK_ERROR_DEVICE_LOST.