Not yet. But QLoRA, GPTQ, and 4-bit quantization are on the to-do list.
@peakji GPTQ would be fantastic. The 4-bit implementation in bitsandbytes has very slow inference speeds (roughly 8× slower).
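For context, this is the bitsandbytes 4-bit (NF4) loading path being discussed. A minimal sketch via transformers; the model name is just a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 is the 4-bit data type introduced by the QLoRA paper.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Placeholder checkpoint; any causal LM on the Hub should work.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
```

Memory drops a lot with this path, but generation throughput is where the slowdown shows up.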
For GPTQ integration, AutoGPTQ is ideal since it provides a higher-level abstraction than the low-level and constantly changing gptq-for-llama repo.
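To illustrate what that higher-level abstraction looks like, here is a sketch of loading a pre-quantized GPTQ checkpoint with AutoGPTQ; the repo name is an illustrative placeholder:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load an already-quantized GPTQ model in one call; no manual
# kernel setup or per-layer packing code as with gptq-for-llama.
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-7B-GPTQ",  # placeholder quantized checkpoint
    device="cuda:0",
    use_safetensors=True,
)
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
```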
It'd be interesting if this could support multiple LoRA adapters[0] that could be swapped via the otherwise unused `model` parameter; a sketch of the idea follows.
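Roughly what per-request adapter selection would imply under the hood, sketched with PEFT; the adapter repo names are hypothetical placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# One shared base model, multiple named LoRA adapters on top.
base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
model = PeftModel.from_pretrained(base, "user/lora-adapter-a", adapter_name="a")
model.load_adapter("user/lora-adapter-b", adapter_name="b")

# A request with model="a" would route through adapter "a"...
model.set_adapter("a")
# ...and model="b" would swap adapters without reloading the base weights.
model.set_adapter("b")
```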
4-bit quantization with QLoRA was added in https://github.com/hyperonym/basaran/pull/209.
Feel free to open another issue for GPTQ integration.
Does it support QLoRA?