ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
MIT License
iq2_k: slightly better bpw - accuracy compromise #20
Closed · ikawrakow closed this 2 months ago
ikawrakow commented 2 months ago
For LLaMA-3.1 models:
- It is better to quantize all of attn_v with iq3_k than to quantize half of attn_v with iq4_k.
- Quantizing attn_output with iq3_k results in a larger PPL decrease than the added bpw would suggest.

Both choices amount to per-tensor overrides in the quantization mix; a sketch follows below.
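For illustration, here is a minimal sketch of how such a per-tensor override could look in a `llama_tensor_get_type`-style mix function. This is an assumption, not the actual ik_llama.cpp code: the helper name `pick_iq2_k_mix_type`, the standalone enum, and the substring matching are illustrative; only the tensor names (`attn_v.weight`, `attn_output.weight`) and the quant choices follow the observations above.

```cpp
// Illustrative sketch only -- not the actual ik_llama.cpp implementation.
#include <cstdio>
#include <cstring>

// Stand-in for the real ggml_type enum; only the types used here.
enum ggml_type { GGML_TYPE_IQ2_K, GGML_TYPE_IQ3_K, GGML_TYPE_IQ4_K };

// Pick the quant type for one tensor of a hypothetical IQ2_K mix.
static ggml_type pick_iq2_k_mix_type(const char * name) {
    if (strstr(name, "attn_v.weight")) {
        // All of attn_v at iq3_k beats half of attn_v at iq4_k (per the issue).
        return GGML_TYPE_IQ3_K;
    }
    if (strstr(name, "attn_output.weight")) {
        // iq3_k here buys more PPL than the extra bpw would suggest.
        return GGML_TYPE_IQ3_K;
    }
    return GGML_TYPE_IQ2_K; // everything else stays at the base type
}

int main() {
    // Tensor names follow the GGUF "blk.<layer>.<name>" convention.
    const char * names[] = {
        "blk.0.attn_v.weight", "blk.0.attn_output.weight", "blk.0.ffn_down.weight",
    };
    const char * labels[] = { "iq2_k", "iq3_k", "iq4_k" };
    for (const char * n : names) {
        printf("%-26s -> %s\n", n, labels[pick_iq2_k_mix_type(n)]);
    }
}
```

Run as-is, the sketch routes both attention tensors to iq3_k and leaves everything else at the iq2_k base type, which is the trade-off the two bullets describe.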