vulkano-rs / vulkano

Safe and rich Rust wrapper around the Vulkan API

Task graph [1/10]: resource synchronization state tracking #2540

Closed · marc0246 closed this 1 month ago

marc0246 commented 1 month ago

This is the first patch on the road to a task graph, the replacement for the current synchronization. It includes the resource state tracker and the definition of a task. Some form of resource state tracker is at the heart of every task graph implementation I've seen (except Bevy's, as wgpu does synchronization internally) and is what everything else should be built upon. Similarly, a task graph can't function without inversion of control: a task must have a callback that records commands into a command buffer and/or performs host accesses. This is what the `Task` trait defines.
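As a rough sketch of that inversion-of-control shape (illustrative only; `RecordContext` and `TaskError` are hypothetical stand-ins, not the items added by this patch):

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for whatever the executor hands a task while it
/// runs: a command buffer being recorded, access to host-mapped memory, ...
pub struct RecordContext<'a> {
    _marker: PhantomData<&'a mut ()>,
}

/// Hypothetical error type.
#[derive(Debug)]
pub struct TaskError;

/// Inversion of control: the task graph calls into the task when its turn
/// comes up; the task records commands and/or performs host accesses
/// through the context instead of submitting work itself.
pub trait Task: Send + Sync {
    fn execute(&self, ctx: &mut RecordContext<'_>) -> Result<(), TaskError>;
}
```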

Documentation and testing are sparse at the moment because it's possible that things will change drastically over the coming days.

## Design goals

Going forward, I would like to establish some design goals for the following work on the task graph. In order of importance:

## The problems of the current synchronization

First of all, it should be noted that vulkano's synchronization dates back to when Vulkan was in its infancy and no one knew how best to abstract the enormous amount of detail that comes with an API as low-level as this, so it's only natural that the current system has its problems. It's also, I believe, the last remaining piece of tech debt, at least in a public API, which is why a rewrite is in order.

The current synchronization falls short on all of the above design goals. In my opinion, the single biggest factor in all its problems is that the synchronization is immediate-mode and just-in-time. To quote Hans-Kristian Arntzen in his "Render graphs and Vulkan — a deep dive" article:

> Essentially, if what we want is just-in-time automatic sync, we basically want OpenGL/D3D11 again. Drivers have already been optimized to death for this, so why do we want to reimplement it in a half-assed way?

That sums it up very nicely. The current system is very hard to use correctly, and many common use cases can't be expressed at all: because everything is just-in-time, `GpuFuture`s must be chained in exactly the right way, and your usage of the API is validated after the fact rather than incorrect usage being ruled out by design. The error messages are a constant source of frustration and very hard to debug. Even when there are no errors, there are many instances of the synchronization working incorrectly and causing validation errors, or plain data races, because `GpuFuture` was never safe either. And while it would be possible to fix some of these issues, the result would still be a subpar system, as the quote above sums up. There are also many glaring performance issues on both the host and the device, for instance the resource tracking that each command buffer and descriptor set does on every recording, all the clones and allocations that go along with that, all the locking of resources, and so on.
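For reference, this is roughly what the just-in-time chaining looks like with the current API (a sketch based on recent vulkano releases; device, queue, and command buffer setup is omitted):

```rust
use std::sync::Arc;

use vulkano::command_buffer::PrimaryAutoCommandBuffer;
use vulkano::device::{Device, Queue};
use vulkano::sync::{self, GpuFuture};

// Every submission is a just-in-time future chain; chain it wrong and you
// find out via runtime errors rather than having the mistake ruled out by
// design.
fn submit_and_wait(
    device: Arc<Device>,
    queue: Arc<Queue>,
    command_buffer: Arc<PrimaryAutoCommandBuffer>,
) {
    let future = sync::now(device)
        .then_execute(queue, command_buffer)
        .unwrap()
        .then_signal_fence_and_flush()
        .unwrap();

    // Block the host until the device has finished the work.
    future.wait(None).unwrap();
}
```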

## Enter the task graph

As mentioned, the fundamental building blocks of any task graph are a global resource state tracker and tasks that record their commands through a callback. This means that:
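To make the state-tracker half of that concrete, here is a deliberately minimal sketch of the idea (hypothetical types, not the tracker added by this patch): remember the last access to each resource and report whether the next access needs a barrier first.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ResourceId(u64);

#[derive(Clone, Copy, PartialEq, Eq)]
enum Access {
    Read,
    Write,
}

#[derive(Default)]
struct StateTracker {
    last_access: HashMap<ResourceId, Access>,
}

impl StateTracker {
    /// Records an access and returns whether a barrier is needed before it.
    /// Read-after-read is safe; any hazard involving a write
    /// (read-after-write, write-after-read, write-after-write) needs one.
    fn access(&mut self, resource: ResourceId, access: Access) -> bool {
        match self.last_access.insert(resource, access) {
            Some(prev) => prev == Access::Write || access == Access::Write,
            None => false,
        }
    }
}
```

A real tracker additionally has to deal with pipeline stages, access masks, image layouts, and queue family ownership, and merge consecutive reads into a combined state; the point is only that every access flows through one global place that knows the previous state.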

## Prior art

Changelog:

### Additions
- Added `memory::allocator::{align_down, align_up}`.
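
For context, the semantics of these helpers are the standard power-of-two alignment arithmetic, roughly as follows (a sketch of the behavior, not the exact vulkano signatures):

```rust
/// Rounds `val` down to the nearest multiple of `alignment`,
/// which must be a power of two.
fn align_down(val: u64, alignment: u64) -> u64 {
    debug_assert!(alignment.is_power_of_two());
    val & !(alignment - 1)
}

/// Rounds `val` up to the nearest multiple of `alignment`,
/// which must be a power of two.
fn align_up(val: u64, alignment: u64) -> u64 {
    debug_assert!(alignment.is_power_of_two());
    align_down(val + alignment - 1, alignment)
}

fn main() {
    assert_eq!(align_down(13, 8), 8);
    assert_eq!(align_up(13, 8), 16);
    assert_eq!(align_up(16, 8), 16);
}
```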
Rua commented 1 month ago

\o/