Closed zwyao closed 1 month ago
I find the bug is still not fixed in the latest version, 1.19.1.
@zwyao, the thread-safety issue in the self-attention FusedMHARunnerFP16v2 was fixed in https://github.com/microsoft/onnxruntime/pull/21420. There was another fix for cross-attention. The bug was resolved in the 1.19.0 release. Please try 1.19.2.
emmm, thanks
Describe the issue
In my BERT model, when I use head-size == 32, the attention CUDA kernel causes an ORT core dump; the error message says "CUDA illegal memory access was encountered". I found the reason is that FusedMHARunnerFP16v2 does not support concurrent execution.
To reproduce
attention_bug_fix.txt
This is my fix code.
Urgency
No response
Platform
Linux
OS Version
1.18.0
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18.0 master
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response