ikawrakow / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance

Add support for bf16 to iqk_mul_mat #39

Closed · ikawrakow closed this 2 months ago

ikawrakow commented 2 months ago

bf16 is handled by iqk_mul_mat only when the CPU supports it natively (e.g., Zen4); otherwise the multiplication is left to ggml to handle.

For LLaMA-3.1-8B we get PP512 = 205 t/s (prompt processing, 512-token prompt) vs 74 t/s in llama.cpp on my Ryzen-7950X CPU.

I get 204 t/s with llamafile, so I guess Justine Tunney has not contributed the more recent tinyBLAS improvements to llama.cpp.