RFC: Plan for handling semaphores and synchronization between command buffers

tomaka commented 7 years ago

This RFC is just a base and is not actually a complete solution. I think it lays a good ground for proper synchronization handling, but the actual mechanisms haven't been designed yet.

On the command buffer side

Whenever a command is added to a command buffer and that command contains a buffer or an image, the variable that represents the buffer or the image is passed by value.

For example, in order to add a fill buffer command you'd call .fill_buffer(my_buffer, 0). Notice that we pass my_buffer by value, and not &my_buffer.

But that doesn't mean that we actually give up ownership of our buffer or image. What I call a buffer or an image can also be an Arc<MyBuf> for example. This means that you could call .fill_buffer(my_buffer.clone(), 0) and then continue to use my_buffer afterwards.
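A minimal sketch of what passing by value looks like when the "buffer" is really an Arc handle. MyBuf, Builder and record_and_keep are hypothetical stand-ins for illustration, not vulkano types:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a real buffer type.
struct MyBuf {
    len: usize,
}

// Hypothetical builder that remembers which buffers its commands use.
struct Builder {
    recorded: Vec<Arc<MyBuf>>,
}

impl Builder {
    fn new() -> Self {
        Builder { recorded: Vec::new() }
    }

    // Takes the buffer by value. Here the value is an Arc handle,
    // so the caller can keep its own clone.
    fn fill_buffer(mut self, buffer: Arc<MyBuf>, _value: u32) -> Self {
        self.recorded.push(buffer);
        self
    }
}

// Records one command with a clone of the handle and returns both,
// showing that the caller still owns a usable handle afterwards.
fn record_and_keep() -> (Builder, Arc<MyBuf>) {
    let my_buffer = Arc::new(MyBuf { len: 64 });
    let builder = Builder::new().fill_buffer(my_buffer.clone(), 0);
    (builder, my_buffer)
}
```

The builder and the caller each hold one strong reference to the same allocation, so nothing is copied besides the handle.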

The same applies for descriptor sets and framebuffers.

At submission

Vulkano would have these traits:

unsafe trait SubmitPreparation {
    type Out: SubmitPreparationCommit;
    fn prepare(self, ...) -> Result<Self::Out, SomeErr>;     // actual parameters remain to be determined
}

unsafe trait SubmitPreparationCommit {
    fn commit(self);
}

In order to submit a command buffer that contains buffers or images, all the buffers or images must be of types whose mutable references implement SubmitPreparation. For example if the command buffer contains a single command that fills a buffer of type B, then in order to submit it the type &mut B must implement SubmitPreparation.

Before submission, the method prepare is called on all the mutable references to all the buffers and images. The parameters will most likely include the queue on which the command buffer is submitted. The prepare method is the last chance for a buffer or an image to perform verifications. The object that implements SubmitPreparationCommit is expected to keep the buffer and image locked if needed.

Once the submission has succeeded, the method commit is called in order to unlock everything. If the lock is destroyed without commit having been called, the changes are reverted instead (in practice most of the time commit will apply the changes and the destructor won't do anything). This allows us to recover from failed submissions (cc #351).
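The prepare/commit/revert flow could look roughly like the sketch below. Everything here is an assumption made for illustration: TrackedBuffer, PendingWrite and the u32 submission id are hypothetical, and prepare is an inherent method with simplified parameters rather than the trait method. This variant applies the change tentatively in prepare and reverts it in the destructor, which is one of the two strategies mentioned above:

```rust
use std::sync::{Mutex, MutexGuard};

// Hypothetical per-buffer state: the id of the submission that last wrote to it.
struct TrackedBuffer {
    last_writer: Mutex<Option<u32>>,
}

// Lock returned by `prepare`. It keeps the mutex locked until it is dropped.
struct PendingWrite<'a> {
    guard: MutexGuard<'a, Option<u32>>,
    previous: Option<u32>,
    committed: bool,
}

impl TrackedBuffer {
    fn new() -> Self {
        TrackedBuffer { last_writer: Mutex::new(None) }
    }

    // Tentatively records `submission_id` as the last writer.
    fn prepare(&self, submission_id: u32) -> PendingWrite<'_> {
        let mut guard = self.last_writer.lock().unwrap();
        let previous = *guard;
        *guard = Some(submission_id);
        PendingWrite { guard, previous, committed: false }
    }

    fn last_writer(&self) -> Option<u32> {
        *self.last_writer.lock().unwrap()
    }
}

impl<'a> PendingWrite<'a> {
    // Called once the submission has succeeded: keeps the new state.
    fn commit(mut self) {
        self.committed = true;
    }
}

impl<'a> Drop for PendingWrite<'a> {
    // Dropped without `commit` (e.g. the submission failed): restore the old state.
    fn drop(&mut self) {
        if !self.committed {
            *self.guard = self.previous;
        }
    }
}
```

A failed submission then only needs to drop its PendingWrite values, for example by returning early with `?`.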

Alternatively command buffers that are known to be only submitted once can require buffers and images to directly implement SubmitPreparation, instead of their mutable references. It would be nice to be able to add impl SubmitPreparation for &mut T where T: SubmitPreparation to vulkano, so that buffers/images would only need to implement SubmitPreparation on themselves, but I don't think it's possible.

The SubmitPreparation trait will also be implemented on &mut Arc<T> where &T: SubmitPreparation. Therefore buffers and images are also encouraged to implement this trait on their shared references if possible.
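A sketch of how the forwarding impl on &mut Arc<T> might be written. The trait is deliberately simplified (no parameters, no Result, no bound on Out), and MyBuf with its id field is a hypothetical buffer type:

```rust
use std::sync::Arc;

// Simplified version of the trait from above.
unsafe trait SubmitPreparation {
    type Out;
    fn prepare(self) -> Self::Out;
}

// Hypothetical buffer type.
struct MyBuf {
    id: u32,
}

// The buffer implements the trait on its shared reference.
unsafe impl<'a> SubmitPreparation for &'a MyBuf {
    type Out = u32;
    fn prepare(self) -> u32 {
        self.id
    }
}

// Forwarding impl: a mutable reference to an Arc<T> delegates to &T.
unsafe impl<'a, T> SubmitPreparation for &'a mut Arc<T>
where
    &'a T: SubmitPreparation,
{
    type Out = <&'a T as SubmitPreparation>::Out;
    fn prepare(self) -> Self::Out {
        // Downgrade the exclusive borrow of the Arc to a shared borrow of its content.
        let shared: &'a T = &**self;
        <&'a T as SubmitPreparation>::prepare(shared)
    }
}
```

With this in place, `(&mut handle).prepare()` works for any `handle: Arc<MyBuf>` without MyBuf having to know about Arc.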

On the buffer/image side

But how do buffers/images actually implement SubmitPreparation?

For the moment just like they already do, by keeping in memory information about last time they were used. Since this information is behind a mutex, the Out associated type will need to contain a MutexGuard that keeps the mutex locked and prevents other threads from simultaneously locking the same variable.

There are two ways this could deadlock:

If the buffer/image is used twice in the same command buffer. In theory this shouldn't happen because we have methods to detect duplicates.
If we try to submit command buffer A that locks buffers X and Y, and at the same time try to submit command buffer B that locks buffers Y and X. The thread of A could have X locked and the thread of B could have Y locked, each waiting for the other. To be honest I don't know how to solve that. Maybe only allow one submission at a time, or maybe return an error instead of blocking if a lock couldn't be acquired.
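The "return an error instead of blocking" idea can be sketched with Mutex::try_lock from the standard library. LockError and lock_all are hypothetical names, and the u32 payload stands in for the per-resource tracking data:

```rust
use std::sync::{Mutex, MutexGuard};

#[derive(Debug, PartialEq)]
enum LockError {
    // The resource at this index could not be locked right now.
    Busy(usize),
}

// Tries to lock every resource without ever blocking.
// If one lock fails, the guards acquired so far are dropped on return,
// so a failed attempt never leaves resources locked.
// For brevity a poisoned mutex is reported the same way as a busy one.
fn lock_all<'a>(
    resources: &[&'a Mutex<u32>],
) -> Result<Vec<MutexGuard<'a, u32>>, LockError> {
    let mut guards = Vec::with_capacity(resources.len());
    for (index, resource) in resources.iter().copied().enumerate() {
        match resource.try_lock() {
            Ok(guard) => guards.push(guard),
            Err(_) => return Err(LockError::Busy(index)),
        }
    }
    Ok(guards)
}
```

Since no thread ever waits while holding a lock, the X/Y versus Y/X scenario cannot deadlock; the cost is that the caller has to handle the error and retry.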

In the future it may be a good idea to add types similar to RefCell that allow custom synchronization strategies.

So how do you actually handle semaphores/barriers?

Whatever system we decide to plug into this design, it can be integrated in one of the following ways:

By extending the return type of SubmitPreparation::prepare. In addition to the lock, it may also return a list of pipeline barriers to apply or semaphores to wait upon.
By adding parameters to prepare that describe what we're doing.
By adding extension traits to SubmitPreparation. As an example we could add a method command_buffer.submit_after(&previous_submission) that would require all the buffers and images to implement SubmitPreparationAfter in order to turn themselves into SubmitPreparation.

tomaka commented 7 years ago

Usually when you talk about a "buffer" or an "image", people imagine long-lived objects that you keep alive and reuse between frames.

That is technically true, but not necessarily true in your code. For example you could create a pool of buffers, and at the start of each frame you ask the pool to give you a buffer. If none is available a new one is created. At the end of the frame the buffer is returned to the pool. While technically buffers of the pool are long-lived, in your code it looks like each buffer only lasts for one frame and is recreated every time.

By using this kind of design, we can make ownership tracking easier.

For example if you do this:

let pool = BuffersPool::new(&device);

let cb1 = AutoCommandBuffer::new().fill_buffer(pool.alloc(), 0).build().submit();
let cb2 = AutoCommandBuffer::new().fill_buffer(pool.alloc(), 0).build().submit();
let cb3 = AutoCommandBuffer::new().fill_buffer(pool.alloc(), 0).build().submit();

Even though it looks like your code allocates three buffers, in reality it could end up being the same buffer used three times if your pool is able to determine that they can be used concurrently.

The choice should be left to the user whether to manage buffers and images like this or by passing the same buffer every time manually. But the system should allow both.
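A minimal sketch of such a pool, under heavy simplifying assumptions: a "buffer" is just a Vec<u8>, and it goes back to the pool when its handle is dropped rather than when the GPU has finished with it. BuffersPool::new takes no device here, and PooledBuffer and created are hypothetical:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

type FreeList = Arc<Mutex<Vec<Vec<u8>>>>;

// Hypothetical pool that hands out buffers and takes them back.
struct BuffersPool {
    free: FreeList,
    created: AtomicUsize,
}

// Handle to a pooled buffer. Dropping it returns the buffer to the pool.
struct PooledBuffer {
    data: Option<Vec<u8>>,
    home: FreeList,
}

impl BuffersPool {
    fn new() -> Self {
        BuffersPool {
            free: Arc::new(Mutex::new(Vec::new())),
            created: AtomicUsize::new(0),
        }
    }

    // Reuses a free buffer if there is one, otherwise creates a new one.
    fn alloc(&self) -> PooledBuffer {
        let reused = self.free.lock().unwrap().pop();
        let data = reused.unwrap_or_else(|| {
            self.created.fetch_add(1, Ordering::SeqCst);
            vec![0; 16]
        });
        PooledBuffer { data: Some(data), home: Arc::clone(&self.free) }
    }

    // Number of buffers that were really created so far.
    fn created(&self) -> usize {
        self.created.load(Ordering::SeqCst)
    }
}

impl Drop for PooledBuffer {
    fn drop(&mut self) {
        if let Some(data) = self.data.take() {
            self.home.lock().unwrap().push(data);
        }
    }
}
```

Three alloc calls whose handles never overlap in time end up creating a single underlying buffer, while two overlapping handles force a second one.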

tomaka commented 7 years ago

So far I have talked about the "administrative" side of things: how the API should be designed. But before committing to a design, it should be decided what vulkano actually does at runtime at the lowest level.

What can we guarantee exactly?

If we submit a command buffer A that writes to a resource, then submit a command buffer B to the same queue family that reads that same resource, and add a semaphore or a pipeline barrier between the two, then we know that we are safe without having to track anything at runtime.
Knowing if we are using the same queue family or the same buffer is probably not possible at compile-time though, and needs to be checked at runtime.
Same if we submit two command buffers that write to a resource, or two command buffers that read from a resource.
It's not true for writes-after-reads though, because maybe some other queues are still reading the resource. In this case we must have runtime tracking.
The same rules apply if we wait for a fence after command buffer A before submitting command buffer B.
We can do the same across queue families, but resources in exclusive sharing mode need a pipeline barrier for their queue family switch (resources in shared sharing mode are expected to be checked during the construction of the CB).
The biggest challenge is that it is also legal to have a chain of dependencies. For example if command buffers A and C write a resource, it is legal to have C wait for B and B wait for A (since C will indirectly wait for A).
Special case: swapchain images must have a pipeline barrier before and after they are used.
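The chain-of-dependencies rule amounts to a reachability check. A sketch of what a runtime check could look like, assuming submissions are identified by a u32 and that deps maps each submission to the ones it directly waits for (both are assumptions, as is the name waits_for):

```rust
use std::collections::{HashMap, HashSet};

// Returns true if `later` waits for `earlier`, either directly or through
// a chain of intermediate submissions.
fn waits_for(deps: &HashMap<u32, Vec<u32>>, later: u32, earlier: u32) -> bool {
    let mut stack = vec![later];
    let mut seen = HashSet::new();
    while let Some(current) = stack.pop() {
        // Skip submissions that were already visited.
        if !seen.insert(current) {
            continue;
        }
        if let Some(direct) = deps.get(&current) {
            for &dep in direct {
                if dep == earlier {
                    return true;
                }
                stack.push(dep);
            }
        }
    }
    false
}
```

With C waiting for B and B waiting for A, waits_for reports that C waits for A even though there is no direct edge between them.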

Guaranteeing some things at compile-time looks possible, but you will always need runtime checks for some usages.

Next to this, we also have two possibilities to handle synchronization:

Either the user has to make semaphores and pipeline barriers explicit.
Or vulkano automatically builds them.

I think the right way to go is to build things automatically, except when the CPU or GPU overhead would be too large. For example nobody wants to explicitly write out pipeline barriers for the swapchain images at the end of a frame, but you also don't want vulkano to potentially block your queue because it thinks that this is the only way to guarantee safety.

The biggest problem is semaphores. Whenever you submit a command buffer you can signal semaphores. Later if you want to depend on the command buffer that you submitted, you have to wait on that semaphore. This means that when you submit a command buffer you have to know in advance how many command buffers are later going to depend on it. This is not something that vulkano can know, unless we build in some sort of graph system.

The other problem is resources in exclusive mode. If you use a resource in exclusive mode in queue family A, then later in queue family B, you have to put a pipeline barrier in both queue families A and B. This means that at the moment when we know that we need this pipeline barrier, it is likely that we have already submitted other stuff to queue family A and thus it is really suboptimal to append a command at the end of it just because vulkano wasn't capable of determining it.

I think these two aspects should be explicitly performed by the user, while other aspects (for example pipeline barriers between two command buffers of the same queue) should be done automatically by vulkano.

tomaka commented 7 years ago

We have essentially three aspects to explore:

Have a graph system that buffers command buffers and automatically builds semaphores between command buffers.
Have a way to access resources accessed by command buffer A by explicitly telling vulkano to wait on a semaphore signaled by A.
Track things at runtime.

I think option B (explicitly waiting on a semaphore signaled by A) could be done by having "marker objects" that you can build from a Submission object. For example:

let my_buffer: MyBuffer = new_buffer();
let cb = AutoCommandBuffer::new().fill_buffer(my_buffer /* takes ownership of the buffer */, 0).build().submit();

let my_buffer2: Marker<MyBuffer> = SubmitAfter(cb).0.buffer();    /* buffer of the command 0, totally experimental syntax */
let cb2 = AutoCommandBuffer::new().fill_buffer(my_buffer2, 0).build().submit();

This wouldn't need any runtime tracking in my_buffer because we could determine that this usage is always valid. When submitting cb2 the Marker object would indicate to vulkano that a semaphore is needed with cb.
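To make the idea slightly more concrete, here is a type-level sketch of how a marker could carry the dependency. All names (Submission, Marker, into_marker, submit_fresh, submit_marked) are hypothetical, submissions are identified by a plain u32, and the returned Vec<u32> stands in for the semaphores to wait on:

```rust
// Hypothetical result of submitting a command buffer that owns buffer `B`.
struct Submission<B> {
    id: u32,
    buffer: B,
}

// Wraps the buffer together with the submission that has to finish first.
struct Marker<B> {
    buffer: B,
    wait_for: u32,
}

impl<B> Submission<B> {
    // Gives the buffer back, tagged with the submission it came from.
    fn into_marker(self) -> Marker<B> {
        Marker { buffer: self.buffer, wait_for: self.id }
    }
}

// Submitting a fresh buffer needs no synchronization.
fn submit_fresh<B>(id: u32, buffer: B) -> Submission<B> {
    Submission { id, buffer }
}

// Submitting a marked buffer: the marker tells us which earlier submission
// must be waited on, without any state stored inside the buffer itself.
fn submit_marked<B>(id: u32, marker: Marker<B>) -> (Submission<B>, Vec<u32>) {
    let waits = vec![marker.wait_for];
    (Submission { id, buffer: marker.buffer }, waits)
}
```

Because the buffer is moved into the submission and only comes back out wrapped in a Marker, there is no way to use it in between without the dependency being known.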

Keep in mind that this is just an idea and I have probably overlooked tons of things.

tomaka commented 7 years ago

Option A (the graph system) is exactly what professional game engines are doing.

Basically you'd put command buffers in a graph (a struct), and submit the whole graph at once.
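A sketch of what such a graph struct could compute before submitting, assuming command buffers are identified by their insertion index. SubmissionGraph, add_node and signal_counts are hypothetical names. Knowing the whole graph up front answers the question of how many later command buffers will depend on each one:

```rust
// Hypothetical graph of command buffers, identified by insertion index.
struct SubmissionGraph {
    // waits_on[i] lists the nodes that node i must wait for.
    waits_on: Vec<Vec<usize>>,
}

impl SubmissionGraph {
    fn new() -> Self {
        SubmissionGraph { waits_on: Vec::new() }
    }

    // Adds a node that depends on previously added nodes and returns its index.
    // Only allowing earlier nodes as dependencies keeps the graph acyclic and
    // makes insertion order a valid submission order.
    fn add_node(&mut self, dependencies: &[usize]) -> usize {
        let index = self.waits_on.len();
        assert!(dependencies.iter().all(|&dep| dep < index));
        self.waits_on.push(dependencies.to_vec());
        index
    }

    // For each node, the number of semaphores it has to signal:
    // one per node that waits on it.
    fn signal_counts(&self) -> Vec<usize> {
        let mut counts = vec![0; self.waits_on.len()];
        for dependencies in &self.waits_on {
            for &dep in dependencies {
                counts[dep] += 1;
            }
        }
        counts
    }
}
```

Submitting the whole graph at once would then walk the nodes in order, creating the counted semaphores for each node as it goes.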

I see two challenges with this:

They need to somehow integrate with non-graph-using code. Even if you put everything in a graph, you still need some way to join two graphs.
Can they be implemented transparently with the above API?

tomaka commented 7 years ago

I tried implementing the scheme of the opening post. Unfortunately for some technical reasons we need to be able to clone the resources passed to commands. Therefore implementing the trait on &mut T doesn't make sense if T is cheaply clonable.

I think the trait should simply be implemented on &T.

tomaka commented 7 years ago

Better proposal: https://github.com/tomaka/vulkano/issues/385
