johnsmith0031 / alpaca_lora_4bit


Differences between QLoRA and this repo #113

Open qwopqwop200 opened 1 year ago

qwopqwop200 commented 1 year ago
  1. Normal Float + Double Quantization: QLoRA uses zero-shot quantization (NF4 with double quantization), which is different from GPTQ. Unlike GPTQ it requires no calibration data, but it incurs some performance loss. So I think the advantage of using GPTQ here, to train a better LoRA, still stands.

  2. Paged Optimizers: paged optimizers use NVIDIA unified memory to avoid the memory spikes from gradient checkpointing that occur when processing a mini-batch with a long sequence length.

  3. Apply LoRA to all linear layers: currently this repo only applies LoRA to the q and v projections (q_proj, v_proj). In QLoRA, LoRA is applied to all linear layers. This is very important for performance.

  4. Hyperparameters: the hyperparameters mentioned in the paper are: "We set LoRA r = 64, α = 16, and add LoRA modules on all linear layers of the base model. We also use Adam beta2 of 0.999, max grad norm of 0.3 and LoRA dropout of 0.1 for models up to 13B and 0.05 for 33B and 65B models." A combined sketch of points 1-4 follows this list.
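For reference, here is a minimal sketch of how the QLoRA recipe above (NF4 + double quantization, paged optimizer, LoRA on all linear layers, paper hyperparameters) looks with Hugging Face transformers, peft and bitsandbytes, rather than this repo's GPTQ path. The base model name and module names are illustrative and assume a LLaMA-style architecture; the Trainer and data wiring are omitted.

# Sketch of the QLoRA recipe with transformers + peft + bitsandbytes (not this
# repo's GPTQ path). Model name and module names are illustrative (LLaMA-style).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 1. NF4 ("normal float") with double quantization: zero-shot, no calibration data
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b",          # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# 3./4. LoRA on all linear layers with the paper's hyperparameters
#       (r=64, alpha=16, dropout=0.1 for models up to 13B)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# 2./4. Paged AdamW optimizer, max grad norm 0.3, Adam beta2 of 0.999
training_args = TrainingArguments(
    output_dir="qlora-out",
    optim="paged_adamw_32bit",
    max_grad_norm=0.3,
    adam_beta2=0.999,
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
)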


Additionally, 3-bit LoRA may be possible. According to the paper: "Since finetuning after quantization seems to recover most of the information that is lost during quantization this might enable much more aggressive quantization. For example, 3-bit GPTQ quantization of the basemodel with LoRA might also yield 16-bit full finetuning performance after finetuning."
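To experiment with that, a 3-bit GPTQ quantization of the base model could be produced with a library such as AutoGPTQ before attaching LoRA adapters. This is only a sketch under that assumption: the model name and calibration text are placeholders, and whether this repo's training path supports 3-bit kernels end to end is a separate question.

# Hypothetical sketch: produce a 3-bit GPTQ quantization of a base model with
# AutoGPTQ (one possible tool; not part of this repo). The model name and the
# calibration example are placeholders; real calibration data is needed.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base = "huggyllama/llama-13b"                      # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base, use_fast=True)
examples = [tokenizer("Placeholder calibration text; GPTQ needs real data here.")]

quantize_config = BaseQuantizeConfig(bits=3, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(base, quantize_config)
model.quantize(examples)                           # run GPTQ using the calibration examples
model.save_quantized("llama-13b-3bit-gptq")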

johnsmith0031 commented 1 year ago

Great! Maybe we can use larger models at the same performance level as fp16 in the future. Also, we can add more modules to finetune in LoRA training with peft by adjusting the config (a small helper for listing candidate module names follows the snippet):

from peft import LoraConfig

lora_config = LoraConfig(
    r=ft_config.lora_r,
    lora_alpha=ft_config.lora_alpha,
    target_modules=["q_proj", "v_proj"],  # current default: LoRA on the q and v projections only
    lora_dropout=ft_config.lora_dropout,
    bias="none",
    task_type="CAUSAL_LM",
)
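As a rough illustration of "adding more modules", one way to decide what to put in target_modules is to walk the model and collect the Linear layer name suffixes (peft matches target_modules against the end of module names). This is a sketch for a standard full-precision or bitsandbytes model; this repo replaces linear layers with its own quantized Linear class, so the isinstance check would need adjusting here.

import torch.nn as nn

def linear_module_names(model):
    """Collect the name suffixes of nn.Linear modules, e.g. for target_modules."""
    names = set()
    for full_name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            names.add(full_name.split(".")[-1])
    names.discard("lm_head")  # usually excluded from LoRA targets
    return sorted(names)

# On a LLaMA-style model this typically returns:
# ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']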
Ph0rk0z commented 1 year ago

QLoRA perf is terrible. It runs at only 1/2 to 1/3 of the speed of this repo.

gptzerozero commented 1 year ago

What is the advantage of using QLoRA?

I have been using QLoRA for finetuning 13B and 30B models, and I wonder if alpaca_lora_4bit would let me use a larger context and/or finetune faster.