unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
18.37k stars 1.28k forks

About alpha/rank in lora #1304

Open Vital1162 opened 4 days ago

Vital1162 commented 4 days ago

How does $\alpha$ in Lora affect performance in training? I usually see everyone set to $2r$. But why? About the rank, I always set it to 128-256 if the dataset quantity is good.

Erland366 commented 3 days ago

I think it's because in LoRA, alpha effectively scales the learning rate of the LoRA update, defined by

$$ LR_{LoRA} = \frac{\alpha}{\sqrt{r}} \times LR $$

But in finetuning, you might want to update the adapter aggressively, since your data is usually much smaller than the pretraining data.

My intuition is that as long as $\frac{\alpha}{\sqrt{r}}$ is greater than one, you're good to go.
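For concreteness, here is a small sketch (plain Python, no library assumed) comparing the original LoRA scaling $\frac{\alpha}{r}$ with the rank-stabilized (rsLoRA) scaling $\frac{\alpha}{\sqrt{r}}$ from the formula above, under the common $\alpha = 2r$ convention:

```python
import math

def lora_scaling(alpha: float, r: int, rslora: bool = False) -> float:
    """Multiplier applied to the LoRA update.

    Classic LoRA scales the update by alpha / r; rank-stabilized
    LoRA (rsLoRA) scales it by alpha / sqrt(r), as in the formula above.
    """
    return alpha / math.sqrt(r) if rslora else alpha / r

# With the common alpha = 2r convention:
for r in (16, 64, 128, 256):
    alpha = 2 * r
    print(r, lora_scaling(alpha, r), lora_scaling(alpha, r, rslora=True))
# Classic scaling stays at 2.0 regardless of r,
# while rsLoRA scaling grows as 2 * sqrt(r).
```

So under classic scaling, $\alpha = 2r$ simply fixes the multiplier at 2 for any rank, which may be why it became the default rule of thumb.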

Vital1162 commented 2 days ago

Thank you for your response @Erland366. But does dataset size affect how these parameters should be set?

Erland366 commented 2 days ago

I've heard in the Discord that if you have a smaller dataset, you should use a smaller rank and alpha, but I haven't tested this much myself.
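For reference, a hedged sketch of how rank and alpha are typically passed in unsloth (parameter names follow the project's README examples; the checkpoint name and values are illustrative, not a recommendation):

```python
from unsloth import FastLanguageModel

# Illustrative base model; any unsloth-supported checkpoint works.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Smaller dataset -> smaller r (and alpha = 2r), per the advice above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                 # lower rank for a small dataset
    lora_alpha=32,        # the common alpha = 2r convention
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    bias="none",
    use_rslora=True,      # scale by alpha / sqrt(r) instead of alpha / r
)
```

Setting `use_rslora=True` selects the $\frac{\alpha}{\sqrt{r}}$ scaling discussed above; leaving it `False` gives the classic $\frac{\alpha}{r}$.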