justheuristic opened this issue 2 years ago
@krunt found a fused monolithic implementation of multi-head attention in NVIDIA Apex: https://github.com/NVIDIA/apex/tree/master/apex/contrib/csrc/fmha — the main pieces are ./apex/contrib/fmha/fmha.py and ./apex/contrib/csrc/fmha.
Let's compare how it performs vs. vanilla PyTorch MHA with sequence length 512, hidden size 4096, and 64 heads; a timing sketch is below.
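A minimal micro-benchmark sketch for the vanilla PyTorch side, using the configuration stated above (seq len 512, hidden size 4096, 64 heads). The batch size, fp16 dtype, and the timing helper are assumptions, not from the issue; the Apex FMHA path would be timed the same way once its exact entry point in `apex/contrib/fmha/fmha.py` is confirmed.

```python
import torch

def bench(fn, warmup=10, iters=50):
    # Simple CUDA-event timing helper (assumes a CUDA device is available).
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per forward pass

seq_len, hidden, heads, batch = 512, 4096, 64, 8  # batch size is an assumption

x = torch.randn(batch, seq_len, hidden, device="cuda", dtype=torch.float16)
mha = torch.nn.MultiheadAttention(hidden, heads, batch_first=True,
                                  device="cuda", dtype=torch.float16)

ms = bench(lambda: mha(x, x, x, need_weights=False))
print(f"vanilla nn.MultiheadAttention: {ms:.2f} ms / forward")

# TODO: time the Apex FMHA kernel here with the same inputs for a
# like-for-like comparison (its Python wrapper lives in apex/contrib/fmha).
```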