taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

RhiResult(-3) error when running NGP example on Windows with Vulkan #7674

Open arrrmin opened 1 year ago

arrrmin commented 1 year ago

Describe the bug

I get a Dispatch error : RhiResult(-3) when trying to run taichi_ngp.py on Windows using Vulkan.

To Reproduce

I tried running the script taichi_ngp.py on Windows 11 with Python 3.10.9 and Taichi 1.5.0.

CPU: Ryzen 9 7900X
GPU: AMD 7900XT

Log/Screenshots

(taichi) C:\Users\armin\projects\external\taichi>C:/Users/armin/Programs/miniconda3/envs/taichi/python.exe c:/Users/armin/projects/external/taichi/python/taichi/examples/rendering/taichi_ngp.py
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[W 03/28/23 14:37:12.283 28600] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
[Taichi] Starting on arch=vulkan
Loading model from ./npy_models/lego.npy
[E 03/28/23 14:37:12.541 28600] [runtime.cpp:taichi::lang::gfx::GfxRuntime::launch_kernel@576] Dispatch error : RhiResult(-3)

Traceback (most recent call last):
  File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 1111, in <module>
    main(cmd_args)
  File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 1081, in main
    ngp.load_model(npy_file)
  File "c:\Users\armin\projects\external\taichi\python\taichi\examples\rendering\taichi_ngp.py", line 332, in load_model
    self.hash_embedding.from_numpy(
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\util.py", line 311, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\field.py", line 357, in from_numpy
    self._from_external_arr(arr)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\util.py", line 311, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\field.py", line 347, in _from_external_arr
    ext_arr_to_tensor(arr, self)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 1023, in wrapped
    return primal(*args, **kwargs)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 950, in __call__
    return self.runtime.compiled_functions[key](*args)
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 853, in func__
    raise e from None
  File "C:\Users\armin\Programs\miniconda3\envs\taichi\lib\site-packages\taichi\lang\kernel_impl.py", line 850, in func__
    t_kernel(launch_ctx)
RuntimeError: [runtime.cpp:taichi::lang::gfx::GfxRuntime::launch_kernel@576] Dispatch error : RhiResult(-3)

Additional comments

(taichi) C:\Users\armin\projects\external\taichi>ti diagnose
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

Taichi system diagnose:

python: 3.10.9 | packaged by conda-forge | (main, Jan 11 2023, 15:15:40) [MSC v.1916 64 bit (AMD64)]
system: win32
executable: C:\Users\armin\Programs\miniconda3\envs\taichi\python.exe
platform: Windows-10-10.0.22621-SP0
architecture: 64bit WindowsPE
uname: uname_result(system='Windows', node='Armin-PC', release='10', version='10.0.22621', machine='AMD64')
locale: en_AT.cp1252
PATH: C:\Users\armin\Programs\miniconda3\envs\taichi;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\mingw-w64\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\usr\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Library\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Scripts;C:\Users\armin\Programs\miniconda3\envs\taichi\bin;C:\Users\armin\Programs\miniconda3\condabin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files\Git\cmd;C:\Users\armin\AppData\Local\Microsoft\WindowsApps;C:\Users\armin\AppData\Local\gitkraken\bin;C:\Users\armin\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\armin\Programs\miniconda3;C:\Users\armin\Programs\miniconda3\Scripts;C:\Users\armin\Programs\miniconda3\Library\bin;.;C:\Users\armin\AppData\Local\Programs\Julia-1.8.5\bin;C:\Users\armin\Programs\miniconda3\envs\taichi\Lib\site-packages\taichi\_lib
PYTHONPATH: ['C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\Scripts\\ti.exe', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\python310.zip', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\DLLs', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi', 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib\\site-packages']

`lsb_release` not available: [WinError 2] The system cannot find the file specified

import: <module 'taichi' from 'C:\\Users\\armin\\Programs\\miniconda3\\envs\\taichi\\lib\\site-packages\\taichi\\__init__.py'>

cc: False
cpu: True
metal: False
opengl: True
[W 03/28/23 14:28:12.986 29280] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
cuda: False
vulkan: True

`glewinfo` not available: [WinError 2] The system cannot find the file specified

`nvidia-smi` not available: [WinError 2] The system cannot find the file specified
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=x64

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=opengl

[W 03/28/23 14:28:15.777 32208] [cuda_driver.cpp:taichi::lang::CUDADriverBase::load_lib@36] nvcuda.dll lib not found.
[W 03/28/23 14:28:15.778 32208] [misc.py:adaptive_arch_select@772] Arch=[<Arch.cuda: 5>] is not supported, falling back to CPU
[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9
[Taichi] Starting on arch=x64

[Taichi] version 1.5.0, llvm 15.0.1, commit 7b885c28, win, python 3.10.9

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum:  https://forum.taichi.graphics/

                                TAICHI EXAMPLES
+-----------------------------------------------------------------------------+
| 0: ad_gravity           | 25: laplace             | 50: physarum            |
| 1: circle_packing_image | 26: laplace_equation    | 51:                     |
|                         |                         | poisson_disk_sampling   |
| 2: comet                | 27: mandelbrot_zoom     | 52: print_offset        |
| 3: cornell_box          | 28: marching_squares    | 53: rasterizer          |
| 4: diff_sph             | 29: mass_spring_3d_ggui | 54: regression          |
| 5: euler                | 30: mass_spring_game    | 55: sdf_renderer        |
| 6: explicit_activation  | 31:                     | 56: simple_derivative   |
|                         | mass_spring_game_ggui   |                         |
| 7: export_mesh          | 32: mciso_advanced      | 57: simple_texture      |
| 8: export_ply           | 33: mgpcg               | 58: simple_uv           |
| 9: export_videos        | 34: mgpcg_advanced      | 59: snow_phaseField     |
| 10: fem128              | 35: minimal             | 60: stable_fluid        |
| 11: fem128_ggui         | 36: minimization        | 61: stable_fluid_ggui   |
| 12: fem99               | 37: mpm128              | 62: stable_fluid_graph  |
| 13: fractal             | 38: mpm128_ggui         | 63: taichi_bitmasked    |
| 14: fractal3d_ggui      | 39: mpm3d               | 64: taichi_dynamic      |
| 15: fullscreen          | 40: mpm3d_ggui          | 65: taichi_logo         |
| 16: game_of_life        | 41: mpm88               | 66: taichi_ngp          |
| 17: gui_image_io        | 42: mpm88_graph         | 67: taichi_sparse       |
| 18: gui_widgets         | 43: mpm99               | 68: texture_graph       |
| 19: implicit_fem        | 44:                     | 69: tutorial            |
|                         | mpm_lagrangian_forces   |                         |
| 20:                     | 45: nbody               | 70:                     |
| implicit_mass_spring    |                         | two_stream_instability  |
| 21:                     | 46: odop_solar          | 71: vortex_rings        |
| initial_value_problem   |                         |                         |
| 22: jacobian            | 47: oit_renderer        | 72: waterwave           |
| 23:                     | 48: patterns            |                         |
| karman_vortex_street    |                         |                         |
| 24: keyboard            | 49: pbf2d               |                         |
+-----------------------------------------------------------------------------+
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.22s

Consider attaching this log when maintainers ask about system information.
>>> Running time: 6.75s

bobcao3 commented 1 year ago

-3 means "not supported", so I think we accidentally used an unsupported or not-activated feature.

We will look into it.

bobcao3 commented 1 year ago

We have located the bug: it appears to be an issue with the maxComputeWorkGroupCount values on AMD GPUs. We should be able to fix this quickly.
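
To make the limit concrete, here is a rough sketch of the arithmetic behind a dispatch that overruns maxComputeWorkGroupCount. It is not Taichi code. It assumes the limit is 65535 in x, which is the minimum the Vulkan specification guarantees (a real device may report more), and it uses a block size of 128 purely for illustration. The function names are hypothetical.

```python
# Assumption: 65535 is the minimum value the Vulkan spec guarantees for
# maxComputeWorkGroupCount[0]; query the device for the real value.
ASSUMED_MAX_WORKGROUP_COUNT_X = 65535


def workgroups_needed(num_threads: int, block_dim: int) -> int:
    """Number of workgroups needed to cover num_threads (ceiling division)."""
    return (num_threads + block_dim - 1) // block_dim


def dispatch_fits(num_threads: int, block_dim: int,
                  max_count: int = ASSUMED_MAX_WORKGROUP_COUNT_X) -> bool:
    """True if a one-thread-per-element dispatch stays within the limit."""
    return workgroups_needed(num_threads, block_dim) <= max_count


# block_dim=128 is an illustrative workgroup size, not a value from this issue.
n = 100_000_000
print(workgroups_needed(n, 128))  # 781250
print(dispatch_fits(n, 128))      # False
```

Under these assumptions, a one-thread-per-element launch stops fitting once the element count exceeds 65535 * 128 = 8,388,480.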

chenzhekl commented 1 year ago

Any updates on this issue? It seems to affect Intel cards as well.

chenzhekl commented 1 year ago

Here is a minimal script to reproduce the issue:

import numpy as np
import taichi as ti

ti.init(arch=ti.gpu)

np_arr = np.ones((100000000,), dtype=np.float32)
ti_field = ti.field(dtype=ti.f32, shape=(np_arr.shape[0],))

@ti.kernel
def run(dst: ti.template(), src: ti.types.ndarray()):
    for I in dst:
        dst[I] = src[I]

run(ti_field, np_arr)

Once the length of np_arr exceeds a certain threshold, we get RhiResult(-3) on the Vulkan backend.

If this issue is low priority on your list, could you please advise me on how I could contribute a PR for it? @bobcao3

Thanks

dme49 commented 4 months ago

No PR, sorry, but if somebody wants to pick this up (e.g. as part of PR #7333), the following patch fixes this for me (Linux, Mesa RADV Vulkan, AMD). The problem is in the const-range case of the SPIR-V range-for codegen, which can currently ask for an unbounded number of workgroups. This patch just applies a fixed cap that matches the dynamic case. I'm assuming the performance implications don't matter, but you may know better.

diff --git a/taichi/codegen/spirv/spirv_codegen.cpp b/taichi/codegen/spirv/spirv_codegen.cpp
index e1e1124fd..b9167f6e0 100644
--- a/taichi/codegen/spirv/spirv_codegen.cpp
+++ b/taichi/codegen/spirv/spirv_codegen.cpp
@@ -2000,7 +2000,17 @@ class TaskCodegen : public IRVisitor {
           ir_->i32_type(), stmt->begin_value, false);  // Named Constant
       total_elems = ir_->int_immediate_number(ir_->i32_type(), num_elems,
                                               false);  // Named Constant
-      task_attribs_.advisory_total_num_threads = num_elems;
+      // To avoid exceeding device limits, we must cap total_num_threads so
+      // that the eventual num_workgroups = total_num_threads/block_dim is in
+      // range.  Use the same kMaxNumThreadsGridStrideLoop cap as the dynamic
+      // case; that's probably a bit conservative for typical (num_elems,
+      // block_dim) combinations, but there's little to be gained by adapting
+      // to the actual limit even if that were readily available.
+      task_attribs_.advisory_total_num_threads = std::min(
+          kMaxNumThreadsGridStrideLoop, num_elems);
+      TI_DEBUG("num_elems={} block_dim={} -> advisory_total_num_threads={}",
+               num_elems, stmt->block_dim,
+               task_attribs_.advisory_total_num_threads);
     } else {
       spirv::Value end_expr_value;
       if (stmt->end_stmt) {
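
The cap in the patch above only works if each launched thread can handle more than one element. The pure-Python simulation below sketches that idea as a grid-stride loop, in which every thread steps through the range with a stride equal to the number of launched threads. It is a model under stated assumptions, not the generated SPIR-V: the cap value 131072 is a placeholder for kMaxNumThreadsGridStrideLoop (its real value is not quoted in this thread), and `plan_launch` and `count_visits` are hypothetical helper names.

```python
# Placeholder cap for illustration only; the real
# kMaxNumThreadsGridStrideLoop value is defined in the Taichi source.
ASSUMED_MAX_THREADS = 131072


def plan_launch(num_elems: int, block_dim: int,
                max_threads: int = ASSUMED_MAX_THREADS):
    """Return (capped thread count, workgroup count) for a const-range loop."""
    total_threads = min(max_threads, num_elems)
    num_workgroups = (total_threads + block_dim - 1) // block_dim
    return total_threads, num_workgroups


def count_visits(num_elems: int, launched_threads: int):
    """Simulate a grid-stride loop and count how often each index is visited."""
    visits = [0] * num_elems
    for thread_id in range(launched_threads):
        # Each thread handles thread_id, thread_id + stride, thread_id + 2*stride, ...
        for i in range(thread_id, num_elems, launched_threads):
            visits[i] += 1
    return visits


# The size from the reproduction script now needs far fewer workgroups.
print(plan_launch(100_000_000, 128))  # (131072, 1024)

# Small simulated case: 1000 elements, capped at 64 threads, block_dim 16.
threads, groups = plan_launch(1000, 16, max_threads=64)
visits = count_visits(1000, groups * 16)
print(all(v == 1 for v in visits))  # True
```

In this model every index is visited exactly once regardless of the cap, so capping the thread count changes how the work is distributed across threads but not which elements are processed.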