Open RonanKMcGovern opened 6 months ago
We'd love to have this. Our first priority is quantization but when we have the bandwidth we can look into adding Flash attention. (Note PRs are welcome)
I messaged the maintainer of this project a few days ago because he seems dedicated to it, and I saw he wanted to implement it in one or two other projects. But in case you can't get ahold of him, you have the link ¯\_(ツ)_/¯
This would be amazing! Then we could also get integration with axolotl!
Are there plans to add flash attention, and also flash decoding, to improve performance for long contexts?
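For anyone curious why this helps at long context: FlashAttention's memory win comes from computing softmax attention over blocks of keys/values with a running (online) maximum and denominator, so the full L×L score matrix is never materialized. Here is a minimal NumPy sketch of that online-softmax idea — just an illustration of the technique, not this project's (or FlashAttention's) actual implementation:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full (L, L) score matrix: O(L^2) memory.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_like_attention(Q, K, V, block=16):
    # Processes K/V in blocks with a running softmax: only an
    # (L, block) tile of scores exists at any time.
    L, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    m = np.full(L, -np.inf)   # running per-row max of the scores
    l = np.zeros(L)           # running softmax denominator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T * scale                     # (L, block) tile
        m_new = np.maximum(m, S.max(axis=-1))
        P = np.exp(S - m_new[:, None])
        corr = np.exp(m - m_new)                 # rescale old partials
        l = l * corr + P.sum(axis=-1)
        out = out * corr[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((64, 8)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), flash_like_attention(Q, K, V))
```

The real kernels do this tiling in GPU SRAM (and flash decoding additionally parallelizes over the KV length at inference time), but the numerics are the same: both paths produce identical outputs while the blocked version avoids the quadratic-memory score matrix.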