Closed Julusian closed 5 years ago
Recommend closing or doing a PR on the docs to reference the original source of the assumption.
https://vulkan-tutorial.com/Vertex_buffers/Staging_buffer#page_Transfer_queue
The good news is that any queue family with VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT capabilities already implicitly support VK_QUEUE_TRANSFER_BIT operations. The implementation is not required to explicitly list it in queueFlags in those cases.
Vulkano docs related to the assumption that graphics or compute implies transfer: https://docs.rs/vulkano/0.11.1/vulkano/instance/struct.QueueFamily.html#method.supports_transfers
Note: Queues that support graphics or compute operations also always support transfer operations. As of writing this, this function will always return true. The purpose of this function is to be future-proofed in case queues that don't support transfer operations are ever added to Vulkan.
Related source https://github.com/vulkano-rs/vulkano/blob/d8fbef1a20465b67d53b8f329f0828eadb18fb2f/vulkano/src/instance/instance.rs#L1081-L1085
Blame doesn't reference the docs that back up the assumption. https://github.com/vulkano-rs/vulkano/commit/8574583409cddc201cfb995d4f48f995060482d4
Reproduced on GeForce GTX 1060: https://vulkan.gpuinfo.org/displayreport.php?id=5227#queuefamilies
Test code
#[test]
fn test_queue_bits() {
let instance = Instance::new(None, &vulkano_win::required_extensions(), None).unwrap();
let physical = PhysicalDevice::enumerate(&instance).next().unwrap();
for (i, q) in physical.queue_families().enumerate() {
println!(
"#{}: count:{}, g:{}, c:{}, t:{}",
i,
q.queues_count(),
q.supports_graphics(),
q.supports_compute(),
q.supports_transfers(),
);
}
}
cargo test test_queue_bits -- --nocapture
running 1 test
#0: count:16, g:true, c:true, t:true
#1: count:1, g:false, c:false, t:true
#2: count:8, g:false, c:true, t:true
test enwin::tests::test_queue_bits ... ok
I was actually going to look up how to create a transfer operation and try to fail it on the compute queue but immediately ran into the Vulkan docs saying that the capability was implied by compute / graphics.
I did some more digging, and found the following.
https://www.khronos.org/assets/uploads/developers/library/2018-vulkanised/03-Steven-Tovey-VulkanMemoryManagement_Vulkanised2018.pdf slide 34-37
https://gpuopen.com/concurrent-execution-asynchronous-queues/ Copy Queue(DirectX 12) / Transfer Queue (Vulkan): DMA transfers of data over the PCIe bus
These all state that the queue with the transfer bit is what you should be using for and transfers between host and device. It does also say that the queue used should really depend on a number of factors, so this is a decision that should be left up to the user to decide on a per transfer basis.
As it stands supports_transfers()
is useless in any decision making on this. As a workaround I am having to do supports_transfers() && !supports_compute() && !supports_graphics()
to figure out the true transfer queue, and then handling if that doesn't match anything (Intel GPU's only have a single queue family).
It could be that the solution for this is not to change the behaviour of supports_transfers()
, but to instead add another method supports_dma_transfers()
which returns the true value of the flags?
I read similar indications. It does seem true on Nvidia devices that the transfer-only queue family has a special relationship with the DMA module.
In short, I support making all three functions report only their correct bit and changing the docs so that supports_graphics
and supports_compute
repeat the claims in the Vulkan docs instead of the current situation, hacking the supports_transfers
function.
As for your individual problem, you probably still need to filter the families even with the change because Nvidia devices' graphics queues do appear universal and any other universal or multi-purpose queues will likely not be the one with tighter relationship to the DMA.
I've made my own GpuPicker
to find a discrete graphics device and avoid any built-in graphics. Finding queues and devices is pretty common busy work. A higher level module could be useful. If you open up a PR for fixing the transfer queue / docs, I'll put out a proposal API.
The several buckets of trivia to uncover:
supports_transfers
or the vk::QUEUE_TRANSFER_BIT
being used internally anywhere to gate any commands. I don't think we would incorrectly lock out devices from handling certain commands.supports_graphics
no longer is coerced to imply transfers? No because of 2 and the only way they would be denied functionality is if they are incorrectly using supports_transfers
to find graphics queue families. Such a hack would be very unlikely to have developed. @Julusian I suppose since you're wanting the DMA dedicated queue, you're aware of either sharing or transferring ownership of resources across queue families.
I ran into the issue last night and am sorting out both what Vulkano already supports and how to do it in Vulkan to see if our API is missing anything.
vkSharingMode
is something that attracted my attention. Other documents are suggesting transferring ownership but I haven't found the right API vocabulary for it yet to see if it needs to be implemented in Vulkano or not.
https://docs.rs/vulkano/0.11.1/vulkano/sync/enum.SharingMode.html
QueueFamily arguments pull off the ownership transfer, and it looks like we support them. https://docs.rs/vulkano/0.11.1/vulkano/command_buffer/sys/struct.UnsafeCommandBufferBuilderPipelineBarrier.html#method.add_image_memory_barrier
So.. looks like the only thing still needing work is to change all the supports_*
calls to tell the truth
Whoops, I have had very little time to play with vulkan in rust so completely forgot to respond to this. Thanks for doing a PR.
As for your individual problem, you probably still need to filter the families even with the change because Nvidia devices' graphics queues do appear universal and any other universal or multi-purpose queues will likely not be the one with tighter relationship to the DMA.
That is true, but now my check can be more accurate in finding the DMA queue, rather than assuming it to be the one without graphics or compute capabilities.
I suppose since you're wanting the DMA dedicated queue, you're aware of either sharing or transferring ownership of resources across queue families.
I am vaguely aware of it but only enough to set everything to be owned by all the queues I use. Optimising performance in my vulkan code is not something I have though about much at all yet.
Template
main.rs
: main.zipIssue
The supports_transfers check is ignoring the flags and always returning true. If https://www.reddit.com/r/vulkan/comments/4x0lbu/nvidias_lone_transfer_queue/ is accurate (I am struggling to find anything else arguing either way) then being able to identify the DMA transfer queue will be crucial to decent performance when lots of transfers are needed.
Introduced by https://github.com/vulkano-rs/vulkano/pull/536
My simple test program is producing:
for the GPU: https://vulkan.gpuinfo.org/displayreport.php?id=5364#queuefamilies