Closed: keyonzeng closed this issue 3 months ago
Yes, we support all dequant types on CPU, but q5_k_m's GPU dequant is still on the way; only q4_k_m and q8_0 GPU dequant are supported right now. If you really need q5_k, I can enable CPU dequant for you, but loading your model will be roughly 10x slower.
Sorry to keep you waiting so long. We now support q5_k dequant on GPU. The other dequant types (q2_k and q3_k) are supported on GPU as well. You can find more details in ktransformers/utils/custom_gguf.py and ktransformers/ktransformers_ext/cuda/custom_gguf/dequant.cu.
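For anyone curious what these dequant routines do, here is a minimal NumPy sketch of CPU-side dequant for the simplest type, q8_0 (a GGUF q8_0 block is 34 bytes: a float16 scale followed by 32 signed int8 quants, and each weight is scale * quant). The function name mirrors the naming convention in custom_gguf.py, but this is an illustrative sketch, not the repo's actual code:

```python
import numpy as np

def dequantize_q8_0(data: bytes) -> np.ndarray:
    """Dequantize a GGUF q8_0 tensor (illustrative sketch).

    Each 34-byte block holds a float16 scale `d` followed by
    32 int8 quants `qs`; the dequantized weight is d * qs[i].
    """
    block_size = 34
    n_blocks = len(data) // block_size
    blocks = np.frombuffer(data, dtype=np.uint8).reshape(n_blocks, block_size)
    # First 2 bytes of each block: the float16 scale
    d = blocks[:, :2].copy().view(np.float16).astype(np.float32)   # (n_blocks, 1)
    # Remaining 32 bytes: the int8 quants
    qs = blocks[:, 2:].copy().view(np.int8).astype(np.float32)     # (n_blocks, 32)
    return (d * qs).reshape(-1)
```

The k-quant types (q2_k through q6_k) are more involved because they pack sub-block scales and minimums at varying bit widths inside each 256-element super-block, which is why each one needs its own dequant kernel.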
I have checked custom_gguf.py and found that q5_k_m is still not supported: the function dequantize_q5_k_gpu does nothing.
So what is the problem with supporting q5_k_m models?