johnsmith0031 / alpaca_lora_4bit


Consider using new QLoRA #107

Open juanps90 opened 1 year ago

juanps90 commented 1 year ago

Consider implementing the new 4-bit bitsandbytes support from Tim Dettmers, from this PR:

https://github.com/huggingface/transformers/pull/23479
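
For context, loading a base model through that PR's 4-bit path looks roughly like this (a minimal sketch; the checkpoint name is just a placeholder, and the `bnb_4bit_*` flags are the options the PR adds):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config from the linked PR.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the actual matmuls
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model_name = "decapoda-research/llama-7b-hf"  # placeholder; any Llama checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```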

johnsmith0031 commented 1 year ago

Thanks for the information

kuleshov commented 1 year ago

Hmmm, I think what Tim Dettmers calls QLoRA is the algorithm already implemented in this repo (except the implementation is a bit different, and they use simple rounding instead of GPTQ to quantize the base model).

Please correct me if I'm missing something.
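
To make the distinction concrete: "simple rounding" means round-to-nearest absmax quantization, whereas GPTQ chooses roundings column by column to minimize the layer's output error. A toy sketch of the former (illustrative only, not either library's actual code):

```python
import torch

def quantize_rtn_4bit(w: torch.Tensor):
    """Naive round-to-nearest absmax quantization to a symmetric 4-bit grid.

    GPTQ instead adjusts the rounding to minimize layer output error;
    this is the "simple rounding" baseline.
    """
    scale = w.abs().max() / 7.0  # map weights into the symmetric range [-7, 7]
    q = torch.clamp(torch.round(w / scale), -7, 7).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4, 4)
q, scale = quantize_rtn_4bit(w)
print((w - dequantize(q, scale)).abs().max())  # worst-case rounding error
```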

haydenhong commented 1 year ago

So far, my LoRA finetuning on various Llama model sizes shows this repo is much faster than QLoRA hands-down, close to 10x, even after adapting this repo to target all linear layers as QLoRA does and adapting QLoRA to use xformers. My setup is 8x T4 cards. But of course, Tim Dettmers is said to be working on speeding up the forward pass, so things can change.
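
For reference, "targeting all linear layers" means attaching LoRA adapters to every projection in each transformer block rather than only q/v. A sketch using the PEFT library, assuming Llama's standard module names (`model` is a quantized base model loaded as above; the hyperparameters are illustrative):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters on all linear layers, base weights frozen
```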