turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

ROCm Flash-Attention 2 #397

Open nktice opened 2 months ago

nktice commented 2 months ago

I have been informed that while Flash Attention is installed, it's not actually being used: https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-2031180332. That post links to a change that has helped some people, so I'll link it here as well: https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-1889069311. Essentially they were adjusting version checks to get it to work. I've tried the same change and cannot get it working, so I thought I'd write and raise the issue here, in hopes that it may help others with the same problem.

AMD's version of Flash Attention 2 is 2.0.4 - do you have any insights into what needs to happen to get it to work?
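For anyone following along, a first sanity check is confirming which flash-attn build Python actually sees, since the ROCm builds lag behind the CUDA releases. A minimal sketch, assuming the upstream package layout (module name `flash_attn`); nothing here is exllamav2-specific:

```python
# Print the flash-attn version visible in the current environment.
try:
    import flash_attn
    ver = getattr(flash_attn, "__version__", None)
    if ver is None:
        # Fall back to package metadata if the module doesn't expose __version__.
        from importlib.metadata import version
        ver = version("flash-attn")
    print("flash-attn version:", ver)
except ImportError:
    print("flash-attn is not importable in this environment")
```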

turboderp commented 2 months ago

flash-attn introduced a crucial change in 2.1.0, without which it's really kind of useless for generating text. Before that, it only worked with k_len == q_len or q_len == 1, which rules out features like cache reuse, speculative decoding and chunked prefill. ExLlama used to have some workarounds, but they were problematic and mostly just ended up disabling flash-attn anyway.
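To illustrate that constraint (a sketch of the shape rule described above, not flash-attn's actual internals): pre-2.1.0 kernels effectively only covered the two cases below, which is exactly what cache reuse, speculative decoding and chunked prefill break.

```python
def supported_pre_2_1_0(q_len: int, k_len: int) -> bool:
    # Pre-2.1.0 flash-attn only handled full self-attention (q_len == k_len)
    # or single-token decoding against an existing cache (q_len == 1).
    return q_len == k_len or q_len == 1

print(supported_pre_2_1_0(512, 512))  # True  - plain prefill, no cache
print(supported_pre_2_1_0(1, 513))    # True  - one-token-at-a-time decoding
print(supported_pre_2_1_0(256, 768))  # False - chunked prefill over an existing cache
print(supported_pre_2_1_0(4, 516))    # False - speculative decoding (several draft tokens at once)
```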

So I would say supporting 2.0.4 is hard. 2.1.0 should be possible (although the check currently requires version 2.2.1).
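To make that last point concrete, the gate is just a version comparison. A sketch of what relaxing the floor would look like (the tuple parsing and the `MIN_FLASH_ATTN` name are illustrative, not exllamav2's actual code):

```python
import flash_attn

# Illustrative only: exllamav2's real check lives in its own source and may be
# structured differently. Lowering the floor from (2, 2, 1) to (2, 1, 0) is the
# kind of change being discussed above.
MIN_FLASH_ATTN = (2, 1, 0)

installed = tuple(int(p) for p in flash_attn.__version__.split(".")[:3])
has_flash_attn = installed >= MIN_FLASH_ATTN
print("flash-attn usable:", has_flash_attn)
```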