Closed: fengyun527 closed this issue 1 year ago.
Do you have any other hardware and OS combinations that you can run the same tests on?
I suspect the problem is down to the attempt to create a separate transfer queue. This works well when I test it on my Kubuntu 22.04 / Geforce 2060 system, and on my non-NVidia hardware combinations no transfer queue is created, so that also works fine. Others are using Windows and Geforce successfully, so it may be a driver/hardware combination issue.
Could you run vsgviewer with the debug layer enabled?
vsgviewer models/lz.vsgt -d
The output might give us some further clues.
Also, the output of vulkaninfoSDK.exe from the VulkanSDK would be helpful to see exactly what queues are supported.
queueProperties[0]:
-------------------
minImageTransferGranularity = (1,1,1)
queueCount = 16
queueFlags = QUEUE_GRAPHICS_BIT | QUEUE_COMPUTE_BIT | QUEUE_TRANSFER_BIT | QUEUE_SPARSE_BINDING_BIT
timestampValidBits = 64
present support = true
queueProperties[1]:
-------------------
minImageTransferGranularity = (1,1,1)
queueCount = 1
queueFlags = QUEUE_TRANSFER_BIT
timestampValidBits = 64
present support = false
queueProperties[2]:
-------------------
minImageTransferGranularity = (1,1,1)
queueCount = 8
queueFlags = QUEUE_COMPUTE_BIT
timestampValidBits = 64
present support = true
Thanks for the info.
I am on a trip so can't look into the details till next week. When I am back at my dev system I'll swap in my Geforce 2060 and Geforce 1650 cards and see what they report. I expect them to work, as I used them when I wrote the transfer queue code.
My hunch at this point is that the transfer queue is being selected even though it doesn't have support for graphics operations, and such operations are being done by the compile traversal.
@fengyun527 could you run an unmodified version of the VSG with the debug layer enabled, to see if this helps determine the exact cause of the crash? This is important as I haven't seen the crash on my NVidia cards with the Linux drivers, so the more info I have, the better I can figure out the best solution.
A good test would be:
vsgviewer models/lz.vsgt -d
where models/lz.vsgt is from the vsgExamples/data directory.
I've run out of day today to investigate further, but tomorrow I'll install my Geforce 2060 and check which queues are being reported and whether any debug errors are reported. My aim is to resolve the issue and get the fix rolled into the upcoming VulkanSceneGraph-1.0.8 release that I'll make in the next day or so.
The command line window outputs:
Warning: Device::getQueue(1, 0) failed to find any suitable Queue.
Warning: Device::getQueue(1, 0) failed to find any suitable Queue.
When vsgviewer crashes, an exception is thrown: read access violation. this was nullptr.
The call stack shows the following lines:
vsgviewer.exe!vsg::Queue::queueFamilyIndex() Line 33
vsgviewer.exe!vsg::TransferTask::transferDynamicData() Line 599
vsgviewer.exe!vsg::RecordAndSubmitTask::finish(std::vector<vsg::ref_ptr
In vsg::TransferTask::transferDynamicData(), at line 599, transferQueue is null.
Thanks for the details.
To help with debugging I have created a QueueSelection branch that adds some debug output to Viewer::assignRecordAndSubmitTaskAndPresentation(..). On my Kubuntu 22.04 / AMD 5700G system it reports:
$ vsgviewer models/lz.vsgt
info: transferQueueFamily = 0
info: VkQueueFamilyProperties[0] queueFlags = GRAPHICS | COMPUTE | TRANSFER | PARSE_BINDING, queueCount = 1, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
info: VkQueueFamilyProperties[1] queueFlags = COMPUTE | TRANSFER | PARSE_BINDING, queueCount = 4, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
I will now drop in my Geforce 2060 and see what it reports and how it behaves.
I have just installed my Geforce 2060 and updated my NVidia driver to 535.54.03 and now I see the queue families reported as:
$ vsgviewer models/lz.vsgt
info: transferQueueFamily = 0
info: VkQueueFamilyProperties[0] queueFlags = GRAPHICS | COMPUTE | TRANSFER | PARSE_BINDING, queueCount = 16, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
info: VkQueueFamilyProperties[1] queueFlags = TRANSFER | PARSE_BINDING, queueCount = 2, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
info: VkQueueFamilyProperties[2] queueFlags = COMPUTE | TRANSFER | PARSE_BINDING, queueCount = 8, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
info: VkQueueFamilyProperties[3] queueFlags = TRANSFER | PARSE_BINDING | VIDEO_DECODE, queueCount = 1, timestampValidBits = 32, minImageTransferGranularity = {1, 1, 1}
Neat to see that VIDEO_DECODE is now supported with these latest drivers :-)
The transferQueueFamily of 0, selected by device->getPhysicalDevice()->getQueueFamily(VK_QUEUE_TRANSFER_BIT), needs investigating, as I think it should be 1.
I have figured out why, on my system, queue family 0 is chosen by uint32_t transferQueueFamily = device->getPhysicalDevice()->getQueueFamily(VK_QUEUE_TRANSFER_BIT); rather than family 1, which @fengyun527's system selects.
On my system the dedicated transfer queue family also reports SPARSE_BINDING (printed as PARSE_BINDING in the debug output), which prevents the perfect match. If I force the selection of the transfer queue family, I get the following:
$ vsgviewer models/openstreetmap.vsgt -d
info: transferQueueFamily = 1
info: VkQueueFamilyProperties[0] queueFlags = GRAPHICS | COMPUTE | TRANSFER | PARSE_BINDING, queueCount = 16, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
info: VkQueueFamilyProperties[1] queueFlags = TRANSFER | PARSE_BINDING, queueCount = 2, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
info: VkQueueFamilyProperties[2] queueFlags = COMPUTE | TRANSFER | PARSE_BINDING, queueCount = 8, timestampValidBits = 64, minImageTransferGranularity = {1, 1, 1}
info: VkQueueFamilyProperties[3] queueFlags = TRANSFER | PARSE_BINDING | VIDEO_DECODE, queueCount = 1, timestampValidBits = 32, minImageTransferGranularity = {1, 1, 1}
Warning: Device::getQueue(1, 0) failed to find any suitable Queue.
Warning: Device::getQueue(1, 0) failed to find any suitable Queue.
Segmentation fault (core dumped)
And a stack trace:
Thread 1 "vsgviewer" received signal SIGSEGV, Segmentation fault.
0x000055555570fb60 in vsg::Queue::queueFamilyIndex (this=0x0) at /home/robert/Dev/VulkanSceneGraph/include/vsg/vk/Queue.h:33
33 uint32_t queueFamilyIndex() const { return _queueFamilyIndex; }
(gdb) where
#0 0x000055555570fb60 in vsg::Queue::queueFamilyIndex (this=0x0) at /home/robert/Dev/VulkanSceneGraph/include/vsg/vk/Queue.h:33
#1 0x000055555570f5bb in vsg::TransferTask::transferDynamicData (this=0x7fffec156b78) at /home/robert/Dev/VulkanSceneGraph/src/vsg/app/TransferTask.cpp:599
#2 0x00005555557079a0 in vsg::RecordAndSubmitTask::finish (this=0x7fffec156780, recordedCommandBuffers=std::vector of length 1, capacity 1 = {...})
at /home/robert/Dev/VulkanSceneGraph/src/vsg/app/RecordAndSubmitTask.cpp:123
#3 0x00005555557076a0 in vsg::RecordAndSubmitTask::submit (this=0x7fffec156780, frameStamp=...) at /home/robert/Dev/VulkanSceneGraph/src/vsg/app/RecordAndSubmitTask.cpp:90
#4 0x00005555556c2151 in vsg::Viewer::recordAndSubmit (this=0x7fffec1539e0) at /home/robert/Dev/VulkanSceneGraph/src/vsg/app/Viewer.cpp:711
#5 0x0000555555653bce in main (argc=2, argv=0x7fffffffd328) at /home/robert/Dev/vsgExamples/examples/app/vsgviewer/vsgviewer.cpp:259
This looks to be the same crash as @fengyun527 has reported. I don't have a fix yet. I could check in a workaround that uses the graphics queue, but I'll look at fixing the two issues:
I feel like it's the same crush. I think this crush may occur when a GPU has a queue family only for TRANSFER, which will be the best match.
On Wed, 26 Jul 2023 at 15:48, fengyun527 wrote:
I feel like It's the same crush.
I think you mean "crash" :-)
Crush means "compress or squeeze forcefully so as to break, damage, or distort in shape."
I think this crush may occur when a GPU has a queue family only for TRANSFER, which will be the best match.
The problem is that the Viewer code checks whether the PhysicalDevice has a transfer queue and, on finding one, attempts to use it. However, the logical vk/vsg::Device, which was already created when the Window was created, doesn't have this queue created for it, so it can't return it.
In Vulkan you can only set up the queues you require at the point you create the VkDevice; you can't add them afterwards. It turns out the code in Viewer.cpp is wrong and only worked by luck, and you happened to have a hardware/driver combination that revealed the error.
I'm currently rewriting the problematic Viewer.cpp code so it checks the vk/vsg::Device for the queues that have actually been allocated, rather than checking the PhysicalDevice capabilities, which are a superset of what has been allocated.
I should complete the fix today.
I have refactored the way the Viewer sets up the queue used with the TransferDataTask, so it uses the queues that have been allocated rather than the ones reported as supported by the vk/vsg::PhysicalDevice:
https://github.com/vsg-dev/VulkanSceneGraph/tree/QueueSelectionFixes
I am about to merge this branch with master and the VulkanSceneGraph-1.0 branch, so it'll be part of the v1.0.8 release that I'm about to make. First I need to wait for reports of the build passing on Windows and macOS.
I have merged the fixes with VSG master:
Describe the bug
I'm studying VSG with my Nvidia Quadro P1000 GPU, but I can't run any example; it crashes.
To Reproduce
Steps to reproduce the behavior:
Desktop (please complete the following information):
Additional context
I've found the reason for the bug. The Nvidia Quadro P1000 has three queue families, whose queueFlags are 0xf, 0x4 and 0x2. The Device can't get the "TransferQueue", so the program crashes.
In Viewer::assignRecordAndSubmitTaskAndPresentation(), at line 415 of Viewer.cpp, transferQueueFamily gets 1 because of the best match. At line 438 of Viewer.cpp, queueFamilyIndex = 1 and queueIndex = 0 (the default value). However, at this point there are only two queues in the device's _queues, with (queueFamilyIndex, queueIndex) of (0, 0) and (0, 1). As a result the Device can't get the "TransferQueue".
I changed transferQueueFamily to 0 and that fixed the crash, but I don't think that is the right way to go.