Fix and optimize iq2k Metal implementation

ikawrakow / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance

MIT License

Fix and optimize iq2k Metal implementation #86

Closed ikawrakow closed 1 month ago

ikawrakow commented 1 month ago

I completely forgot to update the IQ2_K Metal implementation after changing the IQ2_K block scales in the last PR. This PR fixes that. It also improves the performance of the IQ2_K Metal dot product: TG-128 for LLaMA-3.1-8B goes from 42.6 t/s to 46.2 t/s.