alceballosa opened this issue 5 months ago
Hi Alberto,
There's no specific reason to use the 8-bit optimizer from bnb; we didn't run tests comparing it with the vanilla optimizer from PyTorch.
My guess is that when fine-tuning very large models, like 70B or bigger, the 8-bit optimizer could save more GPU RAM, but I'm not sure since I haven't tried it due to limited GPU resources.
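For reference, a minimal sketch of how one might swap between the two optimizers (the model and hyperparameters below are placeholders, and bitsandbytes assumes a CUDA-capable setup):

```python
# Minimal sketch: vanilla PyTorch AdamW vs. bitsandbytes' 8-bit AdamW.
# The model and hyperparameters are illustrative stand-ins only.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for the fine-tuned model

use_8bit = True  # flip to compare GPU memory footprints

if use_8bit:
    # Stores optimizer states in 8 bits; mainly helps when the optimizer
    # state itself is a large share of GPU RAM (very large models).
    optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4, weight_decay=0.01)
else:
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
```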
Hi Michael!
Thanks for the reply. In the end I just used plain AdamW and didn't see much difference at the scale I'm training at.
Best,
Alberto
Hi!
Thanks for making this happen, it's a super useful resource!
I was wondering whether there is any reason to use bnb's 8-bit optimizers when doing QLoRA fine-tuning, or whether it's better to just use the vanilla optimizers from PyTorch.
Best,
Alberto