ikawrakow / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance
MIT License

Fix Zen4 Flash Attention #35

Closed: ikawrakow closed this pull request 2 months ago

ikawrakow commented 2 months ago

Closes #34

Funnily enough, the bug was not in the FA implementation itself, but in the way I was calling iqk_flash_attn_noalibi from ggml.