I completely forgot to update the IQ2_K Metal implementation after changing the IQ2_K block scales in the last PR. This PR fixes that. It also improves the performance of the IQ2_K Metal dot product: TG-128 for LLaMA-3.1-8B goes from 42.6 t/s to 46.2 t/s.