turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Marlin from the GPTQ guys #295

Closed ghost closed 8 months ago

ghost commented 8 months ago

Hi TurboDerp,

Could Marlin be used in ExLlamaV2?

Best wishes, Jeduh

turboderp commented 8 months ago

It's a substantially different kernel, and it's heavily tailored to 4-bit weights. I haven't studied it in much detail, though, so I don't know which of its optimizations could be carried over.
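For context, a minimal sketch of what "tailored for 4-bit weights" means at the storage level: GPTQ-style kernels like Marlin operate on weights packed eight to a 32-bit word. This is illustrative Python, not ExLlamaV2 or Marlin code, and the function names are made up for the example.

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) into 32-bit words, 8 per word."""
    assert len(values) % 8 == 0
    words = []
    for i in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[i:i + 8]):
            assert 0 <= v <= 15
            word |= v << (4 * j)  # each value occupies one 4-bit nibble
        words.append(word)
    return words

def unpack_int4(words):
    """Inverse of pack_int4: recover the original 4-bit values."""
    values = []
    for word in words:
        for j in range(8):
            values.append((word >> (4 * j)) & 0xF)
    return values
```

A real kernel never unpacks to a list like this, of course; the point of Marlin's design is to dequantize these nibbles in registers while streaming them through tensor cores, which is why the layout (and the kernel built around it) is so specific to the 4-bit case.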

ghost commented 8 months ago

Thank you!