turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License
3.28k stars 243 forks

Difference between gemm_half_q_half_gptq_kernel and gemm_half_q_half_kernel #202

Closed: frankxyy closed this issue 7 months ago

frankxyy commented 7 months ago

It seems both `gemm_half_q_half_gptq_kernel` and `gemm_half_q_half_kernel` are quantized GEMM (q gemm) kernels. What is the difference between them?