Open kurquhar opened 1 month ago
Contributions are welcome
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
This line assumes that there are at least 256 thread per thread block: if (tid < 256) H[tid] = 0; https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/math/topk_impl.cuh#L275
This may be true today, but may not be in the future.
Something like this would be more future proof: for (int x_i = tid; x_i < 256; x_i += blockDim.x) { H[x_i] = 0; }
To reproduce
code inspection
Urgency
Not urgent. In fact, this bug may never surface; depends on nvidia hw architecture changes going fwd.
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
15cb2f5a8aa1bdbbb71de9560dc8835a943264d5
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response