GPU: SDL_MapGPUTransferBuffer() return garbage data after memory defrag

meyraud705 commented 1 week ago

With the Vulkan backend, if VULKAN_INTERNAL_DefragmentMemory() is called, the data returned by SDL_MapGPUTransferBuffer() may contain garbage even if you wait on the fence of the previous command buffer.

You can reproduce with the modified CopyAndReadback example provided below:

put the the read back in a loop
trigger a defrag by creating and releasing buffers at random

You also need to modify the Vulkan implementation to do a defrag without presenting by commenting this line: https://github.com/libsdl-org/SDL/blob/main/src/gpu/vulkan/SDL_gpu_vulkan.c#L10291

CopyAndReadback.c

```c #include // gcc -g -Og -fPIE -fno-strict-aliasing -Wall -Wextra -lSDL3 ./CopyAndReadback.c -o test_readback int main(void) { if (!SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS)) { SDL_Log("SDL initialisation failed: %s", SDL_GetError()); return -1; } SDL_GPUDevice* device = SDL_CreateGPUDevice(SDL_GPU_SHADERFORMAT_SPIRV, true, NULL); if (!device) { SDL_Log("GPUCreateDevice failed"); return -1; } Uint32 bufferData[8] = { 2, 4, 8, 16, 32, 64, 128 }; SDL_GPUBuffer* OriginalBuffer = SDL_CreateGPUBuffer( device, &(SDL_GPUBufferCreateInfo) { .usage = SDL_GPU_BUFFERUSAGE_COMPUTE_STORAGE_READ | SDL_GPU_BUFFERUSAGE_COMPUTE_STORAGE_WRITE, .size = sizeof(bufferData) } ); SDL_GPUTransferBuffer* downloadTransferBuffer = SDL_CreateGPUTransferBuffer( device, &(SDL_GPUTransferBufferCreateInfo) { .usage = SDL_GPU_TRANSFERBUFFERUSAGE_DOWNLOAD, .size = sizeof(bufferData) } ); // Create and release buffer randomly to trigger a defrag SDL_GPUTransferBuffer* tbuffers[32] = {NULL}; for (int i = 0; i < 32; ++i) { tbuffers[i] = SDL_CreateGPUTransferBuffer( device, &(SDL_GPUTransferBufferCreateInfo) { .usage = SDL_GPU_TRANSFERBUFFERUSAGE_DOWNLOAD, .size = 128 } ); } SDL_GPUTransferBuffer* uploadTransferBuffer = SDL_CreateGPUTransferBuffer( device, &(SDL_GPUTransferBufferCreateInfo) { .usage = SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD, .size = sizeof(bufferData) } ); Uint8* uploadTransferPtr = SDL_MapGPUTransferBuffer( device, uploadTransferBuffer, false ); SDL_memcpy(uploadTransferPtr, bufferData, sizeof(bufferData)); SDL_UnmapGPUTransferBuffer(device, uploadTransferBuffer); SDL_GPUCommandBuffer* cmdbuf = SDL_AcquireGPUCommandBuffer(device); SDL_GPUCopyPass* copyPass = SDL_BeginGPUCopyPass(cmdbuf); // Upload original buffer SDL_UploadToGPUBuffer( copyPass, &(SDL_GPUTransferBufferLocation) { .transfer_buffer = uploadTransferBuffer, .offset = 0, }, &(SDL_GPUBufferRegion) { .buffer = OriginalBuffer, .offset = 0, .size = sizeof(bufferData) }, false ); SDL_EndGPUCopyPass(copyPass); SDL_GPUFence* fence = SDL_SubmitGPUCommandBufferAndAcquireFence(cmdbuf); SDL_WaitForGPUFences(device, true, &fence, 1); SDL_ReleaseGPUFence(device, fence); bool quit = false; Uint32 i_frame = 0; while (!quit) { SDL_Event evt; while (SDL_PollEvent(&evt)) { if (evt.type == SDL_EVENT_QUIT) { quit = true; } else if (evt.type == SDL_EVENT_KEY_DOWN && evt.key.scancode == SDL_SCANCODE_ESCAPE) { quit = true; } } ++i_frame; cmdbuf = SDL_AcquireGPUCommandBuffer(device); // Download the original bytes from the copy copyPass = SDL_BeginGPUCopyPass(cmdbuf); SDL_DownloadFromGPUBuffer( copyPass, &(SDL_GPUBufferRegion) { .buffer = OriginalBuffer, .offset = 0, .size = sizeof(bufferData) }, &(SDL_GPUTransferBufferLocation) { .transfer_buffer = downloadTransferBuffer, .offset = 0 } ); SDL_EndGPUCopyPass(copyPass); // Create and release buffer randomly to trigger a defrag int r = SDL_rand(32*2); if (r <= 32) { SDL_Log("%d: creating %d buffers", i_frame, r); for (int i = 0; i < r; ++i) { SDL_ReleaseGPUTransferBuffer(device, tbuffers[i]); tbuffers[i] = SDL_CreateGPUTransferBuffer( device, &(SDL_GPUTransferBufferCreateInfo) { .usage = SDL_GPU_TRANSFERBUFFERUSAGE_DOWNLOAD, .size = (1024*i_frame*r)%(33554432) } ); } } SDL_GPUFence* fence = SDL_SubmitGPUCommandBufferAndAcquireFence(cmdbuf); SDL_Delay(16); SDL_WaitForGPUFences(device, true, &fence, 1); SDL_ReleaseGPUFence(device, fence); // Compare the original bytes to the copied bytes Uint8 *downloadedData = SDL_MapGPUTransferBuffer( device, downloadTransferBuffer, false ); if (SDL_memcmp(downloadedData, bufferData, sizeof(bufferData)) == 0) { // SDL_Log("SUCCESS! Original buffer bytes and the downloaded bytes match!"); } else { SDL_Log("%d: FAILURE! Original buffer bytes do not match downloaded bytes!", i_frame); } SDL_UnmapGPUTransferBuffer(device, downloadTransferBuffer); } // Cleanup SDL_ReleaseGPUTransferBuffer(device, downloadTransferBuffer); SDL_ReleaseGPUTransferBuffer(device, uploadTransferBuffer); SDL_ReleaseGPUBuffer(device, OriginalBuffer); for (int i = 0; i < 32; ++i) { SDL_ReleaseGPUTransferBuffer(device, tbuffers[i]); } SDL_DestroyGPUDevice(device); SDL_Quit(); return 0; } ```

thatcosmonaut commented 1 week ago

This happens because the fence that is acquired doesn't apply to the defrag command buffer, which is a separate submission. The best possible fix is probably to insert the defrag commands into the initial command buffer submission instead of a separate submission. Initially I structured it this way so the render + present work could begin right away while the defrag commands were constructed but it probably won't make a huge difference and the results for downloads will actually be correct.

meyraud705 commented 1 week ago

The best possible fix is probably to insert the defrag commands into the initial command buffer submission instead of a separate submission.

I don't think that's going to work for continuous read back: After a read back, the fence is signalled after several frames. If a frame that come after submits a defrag, any call to map between the signal of the fence that did the read back and the actual execution of the defrag will return garbage.

thatcosmonaut commented 1 week ago

No, because the defrag process issues GPU-timelined copy commands to preserve data. The reason it's an issue now is because the defrag process creates a new internal resource and repoints the client handle to the new resource, so we need to make sure that the fence includes the copy commands or the new resource won't necessarily have the right data in it yet. Once this happens there won't be any way for the defrag process to be destructive.

meyraud705 commented 1 week ago

I also thought that would work at first but It does not. Here some pseudo code to make my previous message clearer:

SDL_GPUBuffer* b;
SDL_GPUTransferBuffer* t;

// Frame 0:
cmdbuf0 = SDL_AcquireGPUCommandBuffer();
SDL_DownloadFromGPUBuffer(b, t);
fence0 = SDL_SubmitGPUCommandBufferAndAcquireFence(cmdbuf0);

 // Frame1:
cmdbuf1 = SDL_AcquireGPUCommandBuffer();
/* Do some rendering */
fence1 = SDL_SubmitGPUCommandBufferAndAcquireFence(cmdbuf1); // Defrag happens here

SDL_WaitGPUFence(fence0); // download is in cmdbuf0 so we wait on fence0
void* m = SDL_MapGPUTransferBuffer(t); // Defrag did not happen yet on GPU but transfer buffer points to the new buffer with garbage data

libsdl-org / SDL

GPU: SDL_MapGPUTransferBuffer() return garbage data after memory defrag #11107