arrrmin opened this issue 1 year ago
-3 means "error not supported", so I think we accidentally used an unsupported / not-activated feature...
We will look into it
We have located the bug; it appears to be an issue with the maxComputeWorkGroupCount
limits on AMD GPUs. We should be able to fix this quickly.
Any updates for this issue? This seems to have affected Intel cards as well.
Here is a minimal script to reproduce the issue:
import numpy as np
import taichi as ti

ti.init(arch=ti.gpu)

np_arr = np.ones((100000000,), dtype=np.float32)
ti_field = ti.field(dtype=ti.f32, shape=(np_arr.shape[0],))

@ti.kernel
def run(dst: ti.template(), src: ti.types.ndarray()):
    for I in dst:
        dst[I] = src[I]

run(ti_field, np_arr)
Once the length of np_arr exceeds a certain threshold, we get RhiResult(-3) on the Vulkan backend.
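A quick back-of-the-envelope check suggests why an array this large can overflow the workgroup-count limit. The block_dim of 128 and the 65,535 per-dimension cap below are assumptions (128 is a common default block size, and 65,535 is the minimum maxComputeWorkGroupCount the Vulkan spec guarantees), not values read from the failing device:

```python
# Hypothetical numbers: block_dim = 128 and the 65,535 limit are
# assumptions, not values queried from the actual GPU.
num_elems = 100_000_000          # length of np_arr in the repro script
block_dim = 128                  # assumed threads per workgroup
num_workgroups = (num_elems + block_dim - 1) // block_dim  # ceil division
print(num_workgroups)            # 781250
print(num_workgroups > 65_535)   # True: exceeds the guaranteed minimum
```

So launching one thread per element without a cap would need roughly 12x more workgroups in one dimension than a minimally conformant Vulkan device supports.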
If this issue is low priority on your list, could you please advise me on how to contribute a PR for it myself? @bobcao3
Thanks
No PR, sorry, but if somebody wants to pick this up (e.g. as part of PR #7333), the following patch fixes it for me (Linux, Mesa RADV Vulkan, AMD). The problem is in the const-range case of the SPIR-V range-for codegen, which can currently request an unbounded number of workgroups. This patch simply applies the same fixed cap used in the dynamic case. I'm assuming the performance implications don't matter, but you may know better.
diff --git a/taichi/codegen/spirv/spirv_codegen.cpp b/taichi/codegen/spirv/spirv_codegen.cpp
index e1e1124fd..b9167f6e0 100644
--- a/taichi/codegen/spirv/spirv_codegen.cpp
+++ b/taichi/codegen/spirv/spirv_codegen.cpp
@@ -2000,7 +2000,17 @@ class TaskCodegen : public IRVisitor {
ir_->i32_type(), stmt->begin_value, false); // Named Constant
total_elems = ir_->int_immediate_number(ir_->i32_type(), num_elems,
false); // Named Constant
- task_attribs_.advisory_total_num_threads = num_elems;
+ // To avoid exceeding device limits, we must cap total_num_threads so
+ // that the eventual num_workgroups = total_num_threads/block_dim is in
+ // range. Use the same kMaxNumThreadsGridStrideLoop cap as the dynamic
+ // case; that's probably a bit conservative for typical (num_elems,
+ // block_dim) combinations, but there's little to be gained by adapting
+ // to the actual limit even if that were readily available.
+ task_attribs_.advisory_total_num_threads = std::min(
+ kMaxNumThreadsGridStrideLoop, num_elems);
+ TI_DEBUG("num_elems={} block_dim={} -> advisory_total_num_threads={}",
+ num_elems, stmt->block_dim,
+ task_attribs_.advisory_total_num_threads);
} else {
spirv::Value end_expr_value;
if (stmt->end_stmt) {
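For context, the reason a fixed thread cap is safe here is the grid-stride loop: each launched thread handles every num_threads-th element, so correctness does not depend on launching one thread per element. A minimal Python sketch of the idea (hypothetical names, not Taichi's actual codegen):

```python
def grid_stride_cover(num_elems, max_threads):
    # Cap the launched thread count, as the patch does, so that
    # num_threads / block_dim (the workgroup count) stays bounded.
    num_threads = min(max_threads, num_elems)
    # In a grid-stride loop, thread tid processes elements
    # tid, tid + num_threads, tid + 2 * num_threads, ...
    visited = sorted(i for tid in range(num_threads)
                     for i in range(tid, num_elems, num_threads))
    # Every element is still covered exactly once when capped.
    assert visited == list(range(num_elems))
    return num_threads

print(grid_stride_cover(10, 4))  # 4 threads cover all 10 elements
```

This is why capping advisory_total_num_threads at kMaxNumThreadsGridStrideLoop only changes how much work each thread does, not which elements get written.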
Describe the bug
I get a Dispatch error : RhiResult(-3) when trying to run taichi_ngp.py on Windows using Vulkan.

To Reproduce
Tried running the script taichi_ngp.py on Windows 11 with Python 3.10.9 and Taichi 1.5.0.
CPU: Ryzen 9 7900X
GPU: AMD 7900XT