taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

SPIR-V OpUMulExtended does not function correctly with NVIDIA driver 470 and Vulkan version 1.2.175 #6303

Open lin-hitonami opened 2 years ago

lin-hitonami commented 2 years ago

Describe the bug

Related PR: #6279

The SPIR-V OpUMulExtended instruction does not function correctly with NVIDIA driver 470 and Vulkan version 1.2.175 on CI.

tests/python/test_overflow.py::test_mul_overflow[arch=vulkan-ty7-4294967296-4294967296] calculates ti.u64(4294967297) * ti.u64(4294967297) and tests whether it overflows. Currently I implement the check with the SPIR-V OpUMulExtended instruction and test whether the high bits of the result are nonzero. With NVIDIA driver 510 and Vulkan version 1.3.194, the high bits are equal to one, so it outputs the overflow warning, which is correct. However, with NVIDIA driver 470 and Vulkan version 1.2.175, the high bits are zero, so it doesn't output the overflow warning. See the log for more information.
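For reference, the expected result of the overflow check described above can be reproduced on the CPU with Python's arbitrary-precision integers. This is only a minimal sketch of the arithmetic, not Taichi's SPIR-V codegen; the helper names `umul_extended` and `mul_overflows` are made up for illustration.

```python
def umul_extended(a, b, bits=64):
    """Emulate an extended unsigned multiply: return (low, high) halves
    of the full 2*bits-wide product of two bits-wide operands."""
    mask = (1 << bits) - 1
    product = (a & mask) * (b & mask)  # Python ints never overflow
    return product & mask, product >> bits


def mul_overflows(a, b, bits=64):
    """The product does not fit in `bits` bits iff the high half is nonzero."""
    _, high = umul_extended(a, b, bits)
    return high != 0


low, high = umul_extended(4294967297, 4294967297)
print(low, high, mul_overflows(4294967297, 4294967297))
# -> 8589934593 1 True
```

Since 4294967297 = 2^32 + 1, its square is 2^64 + 2^33 + 1, so the low half is 8589934593 and the high half is 1. That matches the values captured on driver 510; the `high: 0` captured on driver 470 is the incorrect one.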

To Reproduce

python3 tests/run_tests.py -vr2 -t8 -k "mul_overflow" -a vulkan -s

Log/Screenshots

On 470:

tests/python/test_overflow.py::test_mul_overflow[arch=vulkan-ty7-4294967296-4294967296] [Taichi] Starting on arch=vulkan
[W 10/12/22 08:40:24.365 3101] [program.cpp:Program@146] Out-of-bound access checking is not supported on arch=vulkan
[W 10/12/22 08:40:24.365 3101] [vulkan_program.cpp:materialize_runtime@134] Enabling vulkan validation layer in debug mode
[W 10/12/22 08:40:24.373 3101] [vulkan_device_creator.cpp:vk_debug_callback@49] validation layer: 2, Validation Warning: [ VUID_Undefined ] Object 0: VK_NULL_HANDLE, type = VK_OBJECT_TYPE_INSTANCE; | MessageID = 0x79de34d4 | Unrecognized CreateInstance->pCreateInfo->pApplicationInfo.apiVersion number (0x00403000). Assuming VK_API_VERSION_1_2.
[I 10/12/22 08:40:24.390 3101] [vulkan_device_creator.cpp:pick_physical_device@394] Found Vulkan Device 0 (NVIDIA GeForce RTX 2060)
[I 10/12/22 08:40:24.390 3101] [vulkan_device_creator.cpp:find_queue_families@148] Async compute queue 2, graphics queue 0
[I 10/12/22 08:40:24.390 3101] [vulkan_device_creator.cpp:find_queue_families@148] Async compute queue 2, graphics queue 0
[I 10/12/22 08:40:24.390 3101] [vulkan_device_creator.cpp:create_logical_device@462] Vulkan Device "NVIDIA GeForce RTX 2060" supports Vulkan 0 version 1.2.175
captured:  UMul: a: 4294967297 b: 4294967297 low: 8589934593 high: 0overflow: 0

On 510:

tests/python/test_overflow.py::test_mul_overflow[arch=vulkan-ty7-4294967296-4294967296] [Taichi] Starting on arch=vulkan
[W 10/12/22 07:41:46.667 3167] [program.cpp:Program@146] Out-of-bound access checking is not supported on arch=vulkan
[W 10/12/22 07:41:46.667 3167] [vulkan_program.cpp:materialize_runtime@134] Enabling vulkan validation layer in debug mode
[W 10/12/22 07:41:46.675 3167] [vulkan_device_creator.cpp:vk_debug_callback@49] validation layer: 2, Validation Warning: [ VUID_Undefined ] Object 0: VK_NULL_HANDLE, type = VK_OBJECT_TYPE_INSTANCE; | MessageID = 0x79de34d4 | Unrecognized CreateInstance->pCreateInfo->pApplicationInfo.apiVersion number (0x00403000). Assuming VK_API_VERSION_1_2.
[I 10/12/22 07:41:46.677 3167] [vulkan_device_creator.cpp:pick_physical_device@394] Found Vulkan Device 0 (NVIDIA GeForce RTX 2060)
[I 10/12/22 07:41:46.677 3167] [vulkan_device_creator.cpp:pick_physical_device@394] Found Vulkan Device 1 (NVIDIA GeForce RTX 2060)
[I 10/12/22 07:41:46.677 3167] [vulkan_device_creator.cpp:find_queue_families@148] Async compute queue 2, graphics queue 0
[I 10/12/22 07:41:46.677 3167] [vulkan_device_creator.cpp:find_queue_families@148] Async compute queue 2, graphics queue 0
[I 10/12/22 07:41:46.677 3167] [vulkan_device_creator.cpp:create_logical_device@462] Vulkan Device "NVIDIA GeForce RTX 2060" supports Vulkan 0 version 1.3.194
captured:  UMul: a: 4294967297 b: 4294967297 low: 8589934593 high: 1overflow: 1Multiplication overflow detected in File "/home/dev/taichi/tests/python/test_overflow.py", line 201, in foo:
        return a * b
               ^^^^^
lin-hitonami commented 2 years ago

It seems that the 470 driver only supports Vulkan API version 1.2.175, so it uses 1.2.175 even if Vulkan 1.3 is installed.
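As a side note on the version numbers in the logs, a packed Vulkan API version can be decoded by hand. The sketch below assumes the bit layout of the `VK_MAKE_API_VERSION` macro from the Vulkan specification (variant in bits 29-31, major in 22-28, minor in 12-21, patch in 0-11); the Python function names are illustrative only.

```python
def decode_vk_api_version(version):
    """Split a packed Vulkan API version into (variant, major, minor, patch)."""
    variant = version >> 29
    major = (version >> 22) & 0x7F
    minor = (version >> 12) & 0x3FF
    patch = version & 0xFFF
    return variant, major, minor, patch


def encode_vk_api_version(variant, major, minor, patch):
    """Inverse of decode_vk_api_version, mirroring VK_MAKE_API_VERSION."""
    return (variant << 29) | (major << 22) | (minor << 12) | patch


# apiVersion value from the validation-layer warning in the logs
print(decode_vk_api_version(0x00403000))
# -> (0, 1, 3, 0)
```

Under that layout, the `0x00403000` in the validation warning is a request for Vulkan 1.3.0, while the device reports 1.2.175 with driver 470 and 1.3.194 with driver 510.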

bobcao3 commented 2 years ago

Driver bugs cannot easily be fixed from our side. Maybe we should add a blacklist system based on, or in addition to, the DeviceCapability system.
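One possible shape for such a blacklist, purely as a hypothetical sketch: none of the names below (`DriverRule`, `RULES`, `disabled_features`, the `"umul_extended"` feature string) exist in Taichi, and the single rule only encodes what this issue reports, namely that driver major version 470 is affected and 510 is not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverRule:
    """Hypothetical rule: disable one feature on specific driver versions."""
    vendor: str
    affected_majors: frozenset
    disabled_feature: str


# Assumption: only driver 470 is listed, because it is the only version
# this issue reports as broken. Other versions are untested.
RULES = [
    DriverRule("nvidia", frozenset({470}), "umul_extended"),
]


def disabled_features(vendor, driver_major):
    """Return the set of features to turn off for this vendor and driver."""
    return {
        rule.disabled_feature
        for rule in RULES
        if rule.vendor == vendor.lower() and driver_major in rule.affected_majors
    }


print(disabled_features("NVIDIA", 470))  # -> {'umul_extended'}
print(disabled_features("NVIDIA", 510))  # -> set()
```

A backend could consult such a table after querying device capabilities and fall back to a different overflow check when a feature is listed as disabled.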