vulkano-rs / vulkano

Safe and rich Rust wrapper around the Vulkan API
Apache License 2.0

Optimize sequential draws of the same pipeline #2539

Open Firestar99 opened 1 month ago

Firestar99 commented 1 month ago
  1. [x] Update documentation to reflect any user-facing changes - in this repository.

  2. [ ] Make sure that the changes are covered by unit-tests.

  3. [x] Run cargo clippy on the changes.

  4. [x] Run cargo +nightly fmt on the changes.

  5. [x] Please put changelog entries in the description of this Pull Request if knowledge of this change could be valuable to users. No need to put the entries to the changelog directly, they will be transferred to the changelog file by maintainers right after the Pull Request merge.

    Please remove any items from the template below that are not applicable.

  6. [x] Describe in common words what is the purpose of this change, related Github Issues, and highlight important implementation aspects.

Profiling is based on my meshlet demo on Sponza. Before, at ~40 fps and fully CPU-limited: [image]

With this PR I get ~550 fps and am most likely limited by CPU-GPU sync, due to not having a working frames-in-flight system. [image]

And the best part is: It's absolutely free and still safe! (if I didn't mess up)

Current master evaluates all accessed resources (buffers, images) immediately whenever you call any draw or dispatch. My change defers that evaluation, specifically for the resources of descriptor sets, to the end of cmd buffer recording. This lets me deduplicate draws and dispatches that use the same pipeline by merging their resources and descriptor sets together, and then deduplicate the descriptor sets again before evaluating each unique one for its actual resources. It may even be possible to merge more than that, but I'd rather stay on the conservative side when pipelines differ.
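To make the idea above concrete, here is a minimal sketch of the deferral and deduplication, not the actual code in this PR. Everything in it is hypothetical: `Command`, `Batch`, `batch_commands`, `evaluate_deferred` and the integer ID types are stand-ins for illustration and are not vulkano types or functions. It assumes descriptor sets can be compared by a cheap ID and that a caller-supplied lookup returns the resources of a set.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for illustration; none of these are vulkano types.
type PipelineId = u32;
type DescriptorSetId = u32;
type ResourceId = u32;

/// One recorded draw or dispatch: its pipeline and the descriptor sets bound for it.
struct Command {
    pipeline: PipelineId,
    descriptor_sets: Vec<DescriptorSetId>,
}

/// A run of back-to-back commands sharing one pipeline.
/// The `HashSet` deduplicates descriptor sets as they are merged in.
struct Batch {
    pipeline: PipelineId,
    descriptor_sets: HashSet<DescriptorSetId>,
}

/// Merge consecutive commands that use the same pipeline.
/// Conservative: any pipeline change starts a new batch.
fn batch_commands(commands: &[Command]) -> Vec<Batch> {
    let mut batches: Vec<Batch> = Vec::new();
    for cmd in commands {
        match batches.last_mut() {
            Some(last) if last.pipeline == cmd.pipeline => {
                last.descriptor_sets
                    .extend(cmd.descriptor_sets.iter().copied());
            }
            _ => batches.push(Batch {
                pipeline: cmd.pipeline,
                descriptor_sets: cmd.descriptor_sets.iter().copied().collect(),
            }),
        }
    }
    batches
}

/// Deferred evaluation, run once at the end of recording: look up the
/// resources of each unique descriptor set once per batch and collect
/// the deduplicated resources that batch accesses.
fn evaluate_deferred<F>(batches: &[Batch], mut resources_of: F) -> Vec<HashSet<ResourceId>>
where
    F: FnMut(DescriptorSetId) -> Vec<ResourceId>,
{
    batches
        .iter()
        .map(|batch| {
            let mut accessed = HashSet::new();
            for &set in &batch.descriptor_sets {
                accessed.extend(resources_of(set));
            }
            accessed
        })
        .collect()
}
```

In this sketch, thousands of draws that share a pipeline and a handful of descriptor sets collapse into one batch, so the expensive per-set lookup runs a handful of times instead of once per draw.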

Changelog:

### Additions
- Optimized performance of back to back draws/dispatches using the same pipeline significantly

Firestar99 commented 1 month ago

I'm seeing some weird performance behavior in Bistro that I would like to investigate first before officially submitting this.

Resolved: RenderDoc, when attached, hates super large cmd buffers of 3000+ draws.

Firestar99 commented 1 month ago

I'm uncertain how secondary cmd buffers are handled. I could imagine a use case where they contain many draws with constantly changing pipelines (which is generally frowned upon anyway) and very few buffers used, and that could maybe be slower with this PR. But I dunno, that may need some testing.

Result: secondary cmd buffers have also improved significantly, from 40 fps to 120 fps (meshlet demo on Sponza), but it is still significantly better to just record everything into the primary cmd buffer, where I could reach 550 fps.