Closed ikawrakow closed 2 months ago
Only when natively supported (e.g., Zen4), else left to `ggml` to handle.

For LLaMA-3.1-8B we get PP512 = 205 t/s vs 74 t/s in `llama.cpp` on my Ryzen-7950X CPU. I get 204 t/s with llamafile, so I guess Justine Tunney has not contributed the more recent `tinyBLAS` improvements to `llama.cpp`.