In your paper you mention that "It is known that self-attention is not easily parallelizable on GPUs" (Window Size for Local Self-Attention in Ablation Experiments). I tried looking for works/sources that would mention this as an issue as well, but I couldn't find any. Could you maybe explain or provide sources for why self-attention is not easily parallelizable?
The text here was intended to refer to the difficulty of implementing self-attention efficiently on GPUs: the computation is mostly memory-IO bound rather than compute bound. See these papers for some discussion: [link1][link2].
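To illustrate the memory-IO issue, here is a minimal NumPy sketch (my own, not from the paper) of the standard attention computation. The point is that the naive formulation materializes the full N x N score matrix as an intermediate, so for long sequences the kernel's time is dominated by reading and writing that matrix to GPU memory rather than by arithmetic:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Naive self-attention: the full (N, N) score matrix is
    # materialized in memory. On a GPU, moving this intermediate
    # to and from memory dominates runtime for large N, which is
    # what "memory-IO bound" refers to.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (N, N) intermediate
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # (N, d) output

# The (N, N) intermediate grows quadratically with sequence length N,
# while the inputs/outputs only grow linearly.
N, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

Approaches like fused/tiled attention kernels avoid materializing this intermediate, which is the subject of the linked papers.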