ikawrakow / ik_llama.cpp

llama.cpp clone with additional SOTA quants and improved CPU performance
MIT License

IQ1_TN Metal implementation #46

Closed by ikawrakow 1 week ago

ikawrakow commented 1 week ago

IQ1_TN stores a scale at the beginning of each row, followed by the IQ1_BN packing of the ternary quants. The existing Metal implementation does not allow for this kind of row layout, so some changes were necessary, in addition to the required additions in ggml-metal.m.
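To illustrate the row layout described above, here is a minimal Python sketch of a row that starts with a single scale and continues with packed ternary values. It is not the actual IQ1_BN packing or the Metal kernel: the half-precision scale, the 5-trits-per-byte base-3 encoding, and the names pack_row, unpack_row and row_dot are all assumptions made for illustration only.

```python
import struct

# Assumption: 5 ternary values per byte (3**5 = 243 <= 256).
# This is a stand-in encoding, NOT the real IQ1_BN bit layout.
TRITS_PER_BYTE = 5
SCALE_BYTES = 2  # assumption: the row scale is stored as a 16-bit float


def pack_row(scale, trits):
    """Return one row: the scale first, then the packed ternary values."""
    if any(t not in (-1, 0, 1) for t in trits):
        raise ValueError("ternary values must be -1, 0 or 1")
    out = bytearray(struct.pack("<e", scale))  # row header: one fp16 scale
    for i in range(0, len(trits), TRITS_PER_BYTE):
        byte = 0
        # Encode so that the first value of the chunk is the lowest base-3 digit.
        for t in reversed(trits[i:i + TRITS_PER_BYTE]):
            byte = byte * 3 + (t + 1)
        out.append(byte)
    return bytes(out)


def unpack_row(row, n):
    """Read the scale at the start of the row, then decode n ternary values."""
    (scale,) = struct.unpack_from("<e", row, 0)
    trits = []
    for byte in row[SCALE_BYTES:]:  # packed data starts after the scale
        for _ in range(TRITS_PER_BYTE):
            trits.append(byte % 3 - 1)
            byte //= 3
    return scale, trits[:n]  # drop padding digits from the last byte


def row_dot(row, n, activations):
    """Dot product of one quantized row with a vector of activations."""
    scale, trits = unpack_row(row, n)
    # The scale is applied once per row, not once per block.
    return scale * sum(t * a for t, a in zip(trits, activations))


if __name__ == "__main__":
    row = pack_row(0.5, [1, -1, 0, 1, 1, -1, 0])
    print(len(row), unpack_row(row, 7), row_dot(row, 7, [1.0] * 7))
```

The point of the sketch is that a reader of such a row has to consume the scale before it reaches the packed data, and has to apply that scale once to the whole row result. A kernel written for self-contained blocks has no place for either step.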

With this, the IQ1_TN implementation is complete for all supported platforms (Zen4, AVX2, ARM_NEON, CUDA, Metal).