microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.62k stars 2.92k forks source link

topk assumes GridDim::maxThreadsPerBlock >= 256 #22079

Open kurquhar opened 1 month ago

kurquhar commented 1 month ago

Describe the issue

This line assumes that there are at least 256 thread per thread block: if (tid < 256) H[tid] = 0; https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/math/topk_impl.cuh#L275

This may be true today, but may not be in the future.

Something like this would be more future proof: for (int x_i = tid; x_i < 256; x_i += blockDim.x) { H[x_i] = 0; }

To reproduce

code inspection

Urgency

Not urgent. In fact, this bug may never surface; depends on nvidia hw architecture changes going fwd.

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

15cb2f5a8aa1bdbbb71de9560dc8835a943264d5

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

yuslepukhin commented 1 month ago

Contributions are welcome

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.