kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

q5_k_m is not supported? #16

Closed · keyonzeng closed this issue 3 months ago

keyonzeng commented 4 months ago

I checked custom_gguf.py and found that q5_k_m is not supported because the function dequantize_q5_k_gpu does nothing.

[Screenshot: the empty dequantize_q5_k_gpu stub in custom_gguf.py]

So what is the blocker for supporting q5_k_m models?
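For context, the stub in question would have looked roughly like this (a reconstruction from the description above, not the file's exact contents):

```python
# Illustrative reconstruction (an assumption, not the repo's exact code):
# the CPU dequantizers in custom_gguf.py were implemented, but the q5_k
# GPU variant was a no-op, so q5_k_m weights could not be dequantized on GPU.
def dequantize_q5_k_gpu(data):
    pass
```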

Azure-Tang commented 4 months ago

Yes, all of the CPU dequant paths are supported, but q5_k_m's GPU dequant is still on the way; only Q4_K_M and Q8_0 GPU dequant are available now. If you really need q5_k, I can enable the CPU dequant path for you, but loading your model will be about 10x slower.
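(For anyone following along: the pattern being described, a per-tensor registry that prefers a GPU kernel and falls back to the slower CPU path, can be sketched like this. The function and table names below are illustrative assumptions, not ktransformers' actual API; the real tables live in custom_gguf.py.)

```python
import numpy as np

def dequantize_q8_0_cpu(raw: np.ndarray) -> np.ndarray:
    """CPU reference for Q8_0: 34-byte blocks of one fp16 scale + 32 int8 quants."""
    blocks = raw.reshape(-1, 34)
    d  = blocks[:, :2].copy().view(np.float16).astype(np.float32)  # (n_blocks, 1)
    qs = blocks[:, 2:].view(np.int8).astype(np.float32)            # (n_blocks, 32)
    return (d * qs).reshape(-1)

# Hypothetical registry keyed by (quant_type, device). At the time of this
# comment only q4_k_m and q8_0 would have had "gpu" entries; q5_k_m did not.
DEQUANTIZERS = {
    ("q8_0", "cpu"): dequantize_q8_0_cpu,
}

def dequantize(quant_type: str, raw: np.ndarray, device: str = "gpu") -> np.ndarray:
    fn = DEQUANTIZERS.get((quant_type, device))
    if fn is None and device == "gpu":
        # No GPU kernel for this quant type: fall back to the CPU path,
        # which makes model loading roughly 10x slower.
        fn = DEQUANTIZERS.get((quant_type, "cpu"))
    if fn is None:
        raise NotImplementedError(f"no dequantizer for {quant_type}")
    return fn(raw)
```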

BITcyman commented 3 months ago

Sorry to keep you waiting so long. We now support q5_k dequant on the GPU, and the other dequant types (q2_k and q3_k) are supported on the GPU as well. You can find more details in ktransformers/utils/custom_gguf.py and ktransformers/ktransformers_ext/cuda/custom_gguf/dequant.cu.
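For anyone curious what the new kernel has to compute, here is a NumPy sketch of Q5_K super-block dequantization following the k-quants layout from the GGUF/llama.cpp reference: each 176-byte super-block holds 256 weights as two fp16 scales (d, dmin), 12 bytes of packed 6-bit scale/min pairs, 32 bytes of high bits, and 128 bytes of low nibbles. This mirrors the CPU reference math, not the actual CUDA code in dequant.cu:

```python
import numpy as np

QK_K = 256  # weights per super-block in k-quants

def get_scale_min_k4(j: int, scales: np.ndarray) -> tuple[int, int]:
    """Unpack the j-th 6-bit (scale, min) pair from the 12-byte scales field."""
    if j < 4:
        sc = scales[j] & 63
        mn = scales[j + 4] & 63
    else:
        sc = (scales[j + 4] & 0x0F) | ((scales[j - 4] >> 6) << 4)
        mn = (scales[j + 4] >> 4)   | ((scales[j]     >> 6) << 4)
    return int(sc), int(mn)

def dequantize_q5_k_block(block: np.ndarray) -> np.ndarray:
    """Dequantize one 176-byte Q5_K super-block into 256 float32 weights."""
    d    = block[0:2].view(np.float16).astype(np.float32)[0]  # super-block scale
    dmin = block[2:4].view(np.float16).astype(np.float32)[0]  # super-block min
    scales = block[4:16]    # 12 bytes: eight 6-bit scales + eight 6-bit mins
    qh     = block[16:48]   # 32 bytes: the 5th (high) bit of each weight
    qs     = block[48:176]  # 128 bytes: low 4 bits, two weights per byte

    out = np.empty(QK_K, dtype=np.float32)
    ql_off = 0
    for pair in range(4):  # four chunks of 64 weights
        u1, u2 = 1 << (2 * pair), 2 << (2 * pair)
        sc1, m1 = get_scale_min_k4(2 * pair,     scales)
        sc2, m2 = get_scale_min_k4(2 * pair + 1, scales)
        for l in range(32):
            q_lo = qs[ql_off + l]
            hi1 = 16 if qh[l] & u1 else 0
            hi2 = 16 if qh[l] & u2 else 0
            out[64 * pair + l]      = d * sc1 * ((q_lo & 0x0F) + hi1) - dmin * m1
            out[64 * pair + 32 + l] = d * sc2 * ((q_lo >> 4)   + hi2) - dmin * m2
        ql_off += 32
    return out
```

A whole tensor is then `raw.reshape(-1, 176)` with this applied per row; a GPU kernel would parallelize the same arithmetic across super-blocks.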