ikawrakow / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance
MIT License

iqk_mul_mat(ARM_NEON): adding bf16 support #41

Closed: ikawrakow closed this 1 month ago

ikawrakow commented 2 months ago

It looks like the ARMv8 ISA has support for bf16, but my M2 Max does not have it, so I resort to bf16 -> f32 conversion and do the computation in f32. This is 2X slower than f16, but 8X faster than what I get if I try to run a bf16 model on the M2 as-is (NEON and Metal).
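
For context, a minimal sketch of the kind of bf16 -> f32 widening this relies on, using plain NEON intrinsics for CPUs without the ARMv8 BF16 extension. The helper name and scaffolding are hypothetical, not the actual iqk_mul_mat code. Since bf16 is just the upper 16 bits of an IEEE f32, zero-extending each value and shifting it left by 16 bits reconstructs the float:

```cpp
// Hypothetical sketch: widen 8 bf16 values (stored as uint16_t) to f32 with
// plain NEON, for CPUs lacking the ARMv8 BF16 extension.
#include <arm_neon.h>
#include <cstdint>

static inline void bf16_to_f32_x8(const uint16_t * src, float * dst) {
    uint16x8_t v = vld1q_u16(src);
    // Widen each half to 32 bits and shift left by 16: the bf16 bits land in
    // the sign/exponent/upper-mantissa position of an f32.
    uint32x4_t lo = vshll_n_u16(vget_low_u16(v),  16);
    uint32x4_t hi = vshll_n_u16(vget_high_u16(v), 16);
    vst1q_f32(dst + 0, vreinterpretq_f32_u32(lo));
    vst1q_f32(dst + 4, vreinterpretq_f32_u32(hi));
}
```

On CPUs that do advertise the BF16 extension, the conversion step could be skipped in favor of the native bfdot instructions, which is presumably why the f32 fallback only matters on chips like the M2.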