thatcosmonaut / SDL

Simple Directmedia Layer
https://libsdl.org
zlib License

Timing Queries Feature Request #80

Open Aeva opened 1 week ago

Aeva commented 1 week ago

Hello!

I would like to request that support for timing queries eventually be added to SDL3 GPU. I imagine this is a feature that is primarily of interest to advanced users, but I am one such user.

In my own projects I regularly use timing queries to profile different parts of the frame on the GPU timeline without using advanced instrumentation. This is especially useful when developing on Linux. I've used this for simple frame stats, and I've also used it to create heat map visualizations that help identify expensive draw calls.

I am also interested in using timing queries for dynamic workload scaling. In my case I would be using it to scale the number of points drawn by my point cloud renderer, but conventional renderers sometimes use timing queries to guide dynamic resolution scaling.
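As a rough illustration of the dynamic workload scaling idea, here is a minimal sketch in C. The function name `scale_point_count`, its parameters, and the 0.5x to 1.25x per-step clamp are all hypothetical choices made for this example; it assumes the application already has a measured GPU time in milliseconds from some timing query.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: choose the next frame's point count from the most
 * recently measured GPU time. Nothing here is part of any SDL API. */
size_t scale_point_count(size_t current, double gpu_ms, double budget_ms,
                         size_t min_points, size_t max_points)
{
    assert(budget_ms > 0.0 && min_points <= max_points);

    /* No valid measurement yet (for example, queries still pending). */
    if (gpu_ms <= 0.0) {
        return current;
    }

    /* Scale proportionally to how far the measurement is from the budget. */
    double ratio = budget_ms / gpu_ms;

    /* Clamp the per-step change to damp oscillation (assumed limits). */
    if (ratio > 1.25) ratio = 1.25;
    if (ratio < 0.5)  ratio = 0.5;

    double next = (double)current * ratio;
    if (next < (double)min_points) return min_points;
    if (next > (double)max_points) return max_points;
    return (size_t)next;
}
```

Because query results arrive a few frames late, a controller like this reacts to stale data, which is one reason to limit how much the count can change per step.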

I think it would be fine for the use cases I've outlined if this ends up having weak(ish) guarantees on the exact timing characteristics as different graphics API backends may have different limitations.

I think that if this feature were implemented, it is plausible that these uses and others would become more common in intermediate-complexity rendering projects.

I do not have an API proposal in hand right now, but I see this as essentially a form of async readback, so I imagine the API would resemble that of an async readback system. Also, most idiomatic usage of timing queries involves some degree of N-buffering, since multiple frames' worth of queries may be pending at once. It should be relatively easy for the user to implement such a buffering scheme, assuming it doesn't make sense for the API to abstract that.
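To make the N-buffering point concrete, here is a minimal user-side sketch in C. Everything in it (`QueryRing`, `query_ring_acquire`, `query_ring_release`, and the choice of three frames in flight) is hypothetical and only tracks bookkeeping; the actual query recording and fence waiting would go wherever the comments indicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAMES_IN_FLIGHT 3 /* assumed depth; tune per application */

typedef struct {
    bool pending;   /* queries recorded, result not yet read back */
    uint64_t frame; /* frame number that recorded into this slot */
} QuerySlot;

typedef struct {
    QuerySlot slots[FRAMES_IN_FLIGHT];
} QueryRing;

/* Pick the slot for this frame. Returns -1 if the slot's previous result
 * has not been consumed yet, in which case the caller can skip timing
 * this frame rather than stall. */
int query_ring_acquire(QueryRing *ring, uint64_t frame)
{
    int i = (int)(frame % FRAMES_IN_FLIGHT);
    if (ring->slots[i].pending) {
        return -1;
    }
    ring->slots[i].pending = true;
    ring->slots[i].frame = frame;
    return i; /* record this frame's timestamp queries into slot i */
}

/* Call once the fence for the slot's command buffer has signaled and the
 * timing data has been read. Reports which frame the result belongs to. */
bool query_ring_release(QueryRing *ring, int slot, uint64_t *out_frame)
{
    assert(slot >= 0 && slot < FRAMES_IN_FLIGHT);
    if (!ring->slots[slot].pending) {
        return false;
    }
    ring->slots[slot].pending = false;
    if (out_frame) {
        *out_frame = ring->slots[slot].frame;
    }
    return true;
}
```

Skipping a measurement when the ring is full keeps the CPU from blocking on the GPU, at the cost of occasionally missing a frame's stats.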

flibitijibibo commented 1 week ago

Unlike occlusion queries this wouldn't have the same baggage of the old begin/end systems, so this might be more realistic to do. Just depends on how differently Metal/D3D12/Vulkan do it!

thatcosmonaut commented 1 week ago

Did a brief investigation into this...

Vulkan has vkCmdWriteTimestamp which marks a timestamp in the command buffer. vkGetQueryPoolResults can be called after the command buffer is done executing. The difference between the timestamp values gives you the timing result. You can also use vkCmdCopyQueryPoolResults inside the command buffer to copy the results to a VkBuffer object.
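One detail worth spelling out: the raw Vulkan timestamp values are in device ticks, so the difference has to be scaled by `VkPhysicalDeviceLimits::timestampPeriod` (nanoseconds per tick), and only the low `timestampValidBits` bits of each value (reported per queue family) are meaningful. The sketch below shows that arithmetic in plain C without the Vulkan headers; the function name and parameters are hypothetical, and it assumes at most one counter wraparound between the two timestamps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert two raw timestamp values into elapsed
 * nanoseconds. valid_bits stands in for timestampValidBits and period_ns
 * for timestampPeriod; both would be queried from the device. */
double timestamp_delta_ns(uint64_t begin, uint64_t end,
                          uint32_t valid_bits, float period_ns)
{
    assert(valid_bits >= 1 && valid_bits <= 64);

    /* Mask covering only the bits the queue family actually writes. */
    uint64_t mask = (valid_bits == 64)
        ? UINT64_MAX
        : ((UINT64_C(1) << valid_bits) - 1);

    /* Unsigned subtraction followed by the mask yields the correct tick
     * count even if the counter wrapped once between begin and end. */
    uint64_t ticks = (end - begin) & mask;

    return (double)ticks * (double)period_ns;
}
```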

D3D12 has BeginQuery, EndQuery, and ResolveQueryData. Seems to be similar to Vulkan, but you have to call ResolveQueryData in the command buffer.

ResolveQueryData performs a batched operation that writes query data into a destination buffer.

This might mean that our Vulkan implementation will have to use vkCmdCopyQueryPoolResults.
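For the D3D12 path, the resolved buffer holds one 64-bit tick value per timestamp query, and `ID3D12CommandQueue::GetTimestampFrequency` reports the tick rate in Hz. A minimal sketch of turning resolved data into milliseconds, in plain C without the D3D12 headers, might look like the following. The function name is hypothetical, and the layout (each timed region stored as an adjacent begin/end pair) is an assumption about how the queries were recorded, not something D3D12 requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ticks holds 2 * pair_count resolved timestamp
 * values laid out as [begin0, end0, begin1, end1, ...]. frequency_hz
 * stands in for the value returned by GetTimestampFrequency. */
void resolved_pairs_to_ms(const uint64_t *ticks, size_t pair_count,
                          uint64_t frequency_hz, double *out_ms)
{
    assert(frequency_hz > 0);

    for (size_t i = 0; i < pair_count; i++) {
        uint64_t begin = ticks[2 * i];
        uint64_t end = ticks[2 * i + 1];

        /* ticks / (ticks per second) = seconds; scale to milliseconds. */
        out_ms[i] = (double)(end - begin) * 1000.0 / (double)frequency_hz;
    }
}
```

The same conversion would apply to a Vulkan buffer filled by vkCmdCopyQueryPoolResults, except that Vulkan reports a period (nanoseconds per tick) rather than a frequency.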

D3D11 has ID3D11DeviceContext::Begin, ID3D11DeviceContext::End, and ID3D11DeviceContext::GetData with the D3D11_QUERY_TIMESTAMP enum. We actually already use these functions for our fence implementation, so there's a reasonable expectation that this will work.

Metal is the most confusing... There's a gpuEndTime and gpuStartTime on the command buffer, but that only gives you the execution time of the entire command buffer and is not useful here. MoltenVK does seem to support timestamps so we should probably look at how they implemented it.

In all cases we have to wait for the command buffer to finish, so as mentioned in the request this is going to be very similar to async readback where the user has to request a fence and wait for it to be signaled before reading the data.