ikawrakow / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance
MIT License

AVX2 Flash Attention 2 #50

Closed by ikawrakow 1 month ago

ikawrakow commented 1 month ago

This PR adds the ability to use Q4_0, Q4_1 and Q8_0 for the kv-cache.
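As a rough illustration of how a quantized KV cache would be requested from application code, here is a minimal sketch using the upstream llama.cpp C API, which this fork inherits; the model path is a placeholder, and the exact parameter names (`flash_attn`, `type_k`, `type_v`) are assumed to match upstream at the time of this PR.

```cpp
// Minimal sketch: request a quantized KV cache via llama_context_params.
// Assumes the upstream llama.cpp C API carried over to this fork unchanged.
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams); // placeholder path
    if (!model) return 1;

    llama_context_params cparams = llama_context_default_params();
    cparams.flash_attn = true;            // use the fused flash-attention path
    cparams.type_k     = GGML_TYPE_Q8_0;  // K cache quantization
    cparams.type_v     = GGML_TYPE_Q8_0;  // V cache quantization (Q4_0 / Q4_1 are the other types this PR enables)

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (!ctx) return 1;

    // ... run inference as usual ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

From the command-line tools, the equivalent in upstream llama.cpp is typically selected with the `-ctk`/`-ctv` (cache type) and `-fa` (flash attention) flags; the same flags are assumed to apply here.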