Difference between gemm_half_q_half_gptq_kernel and gemm_half_q_half_kernel

turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

MIT License


Closed: frankxyy closed this issue 7 months ago

frankxyy commented 7 months ago

It seems both are quantized GEMM (q gemm) kernels. What is the difference between them?