vulkano-rs / vulkano

Safe and rich Rust wrapper around the Vulkan API
Apache License 2.0
4.53k stars 436 forks source link

supports_transfers always returns true #1174

Closed Julusian closed 5 years ago

Julusian commented 5 years ago

Template

Issue

The supports_transfers check is ignoring the flags and always returning true. If https://www.reddit.com/r/vulkan/comments/4x0lbu/nvidias_lone_transfer_queue/ is accurate (I am struggling to find anything else arguing either way) then being able to identify the DMA transfer queue will be crucial to decent performance when lots of transfers are needed.

Introduced by https://github.com/vulkano-rs/vulkano/pull/536

My simple test program is producing:

#0: g:true c:true, t:true
#1: g:false c:false, t:true
#2: g:false c:true, t:true

for the GPU: https://vulkan.gpuinfo.org/displayreport.php?id=5364#queuefamilies

knappador commented 5 years ago

Recommend closing or doing a PR on the docs to reference the original source of the assumption.

https://vulkan-tutorial.com/Vertex_buffers/Staging_buffer#page_Transfer_queue

The good news is that any queue family with VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT capabilities already implicitly support VK_QUEUE_TRANSFER_BIT operations. The implementation is not required to explicitly list it in queueFlags in those cases.

Vulkano docs related to the assumption that graphics or compute implies transfer: https://docs.rs/vulkano/0.11.1/vulkano/instance/struct.QueueFamily.html#method.supports_transfers

Note: Queues that support graphics or compute operations also always support transfer operations. As of writing this, this function will always return true. The purpose of this function is to be future-proofed in case queues that don't support transfer operations are ever added to Vulkan.

Related source https://github.com/vulkano-rs/vulkano/blob/d8fbef1a20465b67d53b8f329f0828eadb18fb2f/vulkano/src/instance/instance.rs#L1081-L1085

Blame doesn't reference the docs that back up the assumption. https://github.com/vulkano-rs/vulkano/commit/8574583409cddc201cfb995d4f48f995060482d4

Reproduced on GeForce GTX 1060: https://vulkan.gpuinfo.org/displayreport.php?id=5227#queuefamilies

Test code

    #[test]
    fn test_queue_bits() {
        let instance = Instance::new(None, &vulkano_win::required_extensions(), None).unwrap();
        let physical = PhysicalDevice::enumerate(&instance).next().unwrap();

        for (i, q) in physical.queue_families().enumerate() {
            println!(
                "#{}: count:{}, g:{}, c:{}, t:{}",
                i,
                q.queues_count(),
                q.supports_graphics(),
                q.supports_compute(),
                q.supports_transfers(),
            );
        }
    }
cargo test test_queue_bits -- --nocapture
running 1 test
#0: count:16, g:true, c:true, t:true
#1: count:1, g:false, c:false, t:true
#2: count:8, g:false, c:true, t:true
test enwin::tests::test_queue_bits ... ok

I was actually going to look up how to create a transfer operation and try to fail it on the compute queue but immediately ran into the Vulkan docs saying that the capability was implied by compute / graphics.

Julusian commented 5 years ago

I did some more digging, and found the following. https://www.khronos.org/assets/uploads/developers/library/2018-vulkanised/03-Steven-Tovey-VulkanMemoryManagement_Vulkanised2018.pdf slide 34-37 https://gpuopen.com/concurrent-execution-asynchronous-queues/ Copy Queue(DirectX 12) / Transfer Queue (Vulkan): DMA transfers of data over the PCIe bus

These all state that the queue with the transfer bit is what you should be using for and transfers between host and device. It does also say that the queue used should really depend on a number of factors, so this is a decision that should be left up to the user to decide on a per transfer basis.

As it stands supports_transfers() is useless in any decision making on this. As a workaround I am having to do supports_transfers() && !supports_compute() && !supports_graphics() to figure out the true transfer queue, and then handling if that doesn't match anything (Intel GPU's only have a single queue family).

It could be that the solution for this is not to change the behaviour of supports_transfers(), but to instead add another method supports_dma_transfers() which returns the true value of the flags?

knappador commented 5 years ago

I read similar indications. It does seem true on Nvidia devices that the transfer-only queue family has a special relationship with the DMA module.

In short, I support making all three functions report only their correct bit and changing the docs so that supports_graphics and supports_compute repeat the claims in the Vulkan docs instead of the current situation, hacking the supports_transfers function.

As for your individual problem, you probably still need to filter the families even with the change because Nvidia devices' graphics queues do appear universal and any other universal or multi-purpose queues will likely not be the one with tighter relationship to the DMA.

I've made my own GpuPicker to find a discrete graphics device and avoid any built-in graphics. Finding queues and devices is pretty common busy work. A higher level module could be useful. If you open up a PR for fixing the transfer queue / docs, I'll put out a proposal API.

The several buckets of trivia to uncover:

  1. Are there queue families that don't report transfer capability but actually have it and would be precluded from performing transfers if we don't assume it in code? I believe the answer is no because I don't find the supports_transfers or the vk::QUEUE_TRANSFER_BIT being used internally anywhere to gate any commands. I don't think we would incorrectly lock out devices from handling certain commands.
  2. Would true graphics-only and compute-only queues be harmed by the removal? No because they should be failing the operations in the current configuration, and the Vulkan docs say we should assume the capability regardless of the reporting.
  3. Would users who trusted the current documentation be given an unwelcome surprise when supports_graphics no longer is coerced to imply transfers? No because of 2 and the only way they would be denied functionality is if they are incorrectly using supports_transfers to find graphics queue families. Such a hack would be very unlikely to have developed.
  4. Do we need to touch sparse queues at all? No. They report only the sparse bit.
knappador commented 5 years ago

@Julusian I suppose since you're wanting the DMA dedicated queue, you're aware of either sharing or transferring ownership of resources across queue families.

I ran into the issue last night and am sorting out both what Vulkano already supports and how to do it in Vulkan to see if our API is missing anything.

vkSharingMode is something that attracted my attention. Other documents are suggesting transferring ownership but I haven't found the right API vocabulary for it yet to see if it needs to be implemented in Vulkano or not. https://docs.rs/vulkano/0.11.1/vulkano/sync/enum.SharingMode.html

knappador commented 5 years ago

VkBufferMemoryBarrier

QueueFamily arguments pull off the ownership transfer, and it looks like we support them. https://docs.rs/vulkano/0.11.1/vulkano/command_buffer/sys/struct.UnsafeCommandBufferBuilderPipelineBarrier.html#method.add_image_memory_barrier

So.. looks like the only thing still needing work is to change all the supports_* calls to tell the truth

Julusian commented 5 years ago

Whoops, I have had very little time to play with vulkan in rust so completely forgot to respond to this. Thanks for doing a PR.

As for your individual problem, you probably still need to filter the families even with the change because Nvidia devices' graphics queues do appear universal and any other universal or multi-purpose queues will likely not be the one with tighter relationship to the DMA.

That is true, but now my check can be more accurate in finding the DMA queue, rather than assuming it to be the one without graphics or compute capabilities.

I suppose since you're wanting the DMA dedicated queue, you're aware of either sharing or transferring ownership of resources across queue families.

I am vaguely aware of it but only enough to set everything to be owned by all the queues I use. Optimising performance in my vulkan code is not something I have though about much at all yet.