migraphx-benchmark / AMDMIGraphX

AMD's graph optimization engine.
https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/
MIT License

com.microsoft.MultiHeadAttention is unsupported #196

Open music-dino opened 1 month ago

music-dino commented 1 month ago

https://github.com/ROCm/AMDMIGraphX/pull/3425

marko-fabo-htec commented 3 weeks ago

The computation behind Multi-Head Attention is described in the paper "Attention Is All You Need": https://arxiv.org/abs/1706.03762
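
For reference, here is a minimal NumPy sketch of the attention computation from that paper: split the hidden dimension into heads, apply scaled dot-product attention per head, then merge the heads back. It is only an illustration under stated assumptions, not the ONNX Runtime or MIGraphX implementation. It assumes query, key, and value are already projected, share the same hidden size, and have shape (batch, sequence, hidden); bias, masking, and past key/value state are left out. The function name `multi_head_attention` and its signature are hypothetical.

```python
import numpy as np


def multi_head_attention(query, key, value, num_heads):
    """Scaled dot-product attention over num_heads heads (illustrative sketch).

    Assumptions (not taken from the operator spec): inputs are already
    projected, have shape (batch, seq_len, hidden), and no bias or mask
    is applied.
    """
    batch, q_len, hidden = query.shape
    kv_len = key.shape[1]
    if hidden % num_heads != 0:
        raise ValueError("hidden size must be divisible by num_heads")
    head_dim = hidden // num_heads

    def split_heads(x, seq_len):
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        return x.reshape(batch, seq_len, num_heads, head_dim).transpose(0, 2, 1, 3)

    q = split_heads(query, q_len)
    k = split_heads(key, kv_len)
    v = split_heads(value, kv_len)

    # Attention scores: Q K^T / sqrt(d_k), shape (batch, heads, q_len, kv_len)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)

    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, then merge heads back to (batch, q_len, hidden)
    context = weights @ v
    return context.transpose(0, 2, 1, 3).reshape(batch, q_len, hidden)
```

A quick sanity check: with an all-zero query every score is zero, so the softmax weights are uniform and each output row should equal the mean of the value rows.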

marko-fabo-htec commented 2 weeks ago

An example of how to implement the behavior of the MultiHeadAttention operator: https://github.com/microsoft/onnxruntime/issues/19924

Useful articles about Transformers and Attention: https://towardsdatascience.com/transformers-explained-visually-part-1-overview-of-functionality-95a6dd460452

marko-fabo-htec commented 1 week ago

Description of the operator's inputs: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/contrib_ops/cpu/bert/multihead_attention_helper.h#L267