ikawrakow / ik_llama.cpp

llama.cpp clone with additional SOTA quants and improved CPU performance
MIT License

iq4_k: speedup quantization by a factor of ~2 #10

Closed. ikawrakow closed this 1 month ago

ikawrakow commented 1 month ago

Interestingly, on a simple benchmark that measures the speed of the best_index_iq4n function (the bottleneck during IQ4_K quantization), clang produces code that is ~6X faster than what GCC produces. But when the function is used in practice in quantize_row_iq4_k_impl_bs16, the clang executable is actually slower than the GCC executable. Either way, both compilers need a hand, and this PR gives it to them. The result is a ~2X speedup of IQ4_K quantization.
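For readers unfamiliar with this kind of lookup, here is a minimal sketch of what "giving the compiler a hand" can look like for a nearest-value search in a small sorted table. It is not the code from this PR: the function names, the table name, and the table values are assumptions made for illustration. The first function is a branchy binary search. The second counts how many midpoints between adjacent levels lie at or below x, which has a fixed trip count and no data-dependent branches, a shape that compilers tend to unroll or vectorize more reliably.

```c
#include <stdint.h>

/* Hypothetical sorted table of 16 non-uniform quantization levels.
 * The values are illustrative assumptions, not taken from the PR. */
static const int8_t levels[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113
};

/* Reference version: binary search for the level nearest to x.
 * Ties between two neighbouring levels resolve to the upper one. */
static int nearest_index_bsearch(const int8_t *val, float x) {
    if (x <= val[0])  return 0;
    if (x >= val[15]) return 15;
    int lo = 0, hi = 15;
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (x < val[mid]) hi = mid; else lo = mid;
    }
    /* Here val[lo] <= x < val[hi]; pick the closer of the two. */
    return (x - val[lo] < val[hi] - x) ? lo : hi;
}

/* Branch-free version: the nearest index equals the number of
 * midpoints between adjacent levels that are <= x. The loop has a
 * fixed trip count of 15 and no data-dependent control flow. */
static int nearest_index_count(const int8_t *val, float x) {
    int idx = 0;
    for (int i = 0; i < 15; ++i) {
        float mid = 0.5f * (float)(val[i] + val[i + 1]);
        idx += (x >= mid);
    }
    return idx;
}
```

Both functions return an index in the range 0..15 into the table. Whether the counting form is actually faster depends on the compiler, the flags, and the surrounding loop, which is the same caveat the benchmark observation above illustrates.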