alceballosa opened this issue 5 months ago
Hi Alberto,
There's no specific reason to use the 8-bit optimizer from bnb; we didn't run tests comparing it with the vanilla optimizer from PyTorch.
My guess is that when fine-tuning very large models, like 70B or bigger, the 8-bit optimizer could save more GPU RAM, but I'm not sure since I haven't tried it due to limited GPU resources.
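For reference, a minimal sketch of how one might swap between the two optimizers (the model and hyperparameters below are placeholders, and bitsandbytes assumes a CUDA-capable setup):

```python
# Minimal sketch: vanilla PyTorch AdamW vs. bitsandbytes' 8-bit AdamW.
# The model and hyperparameters are illustrative stand-ins only.
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for the fine-tuned model

use_8bit = True  # flip to compare GPU memory footprints

if use_8bit:
    # Stores optimizer states in 8 bits; mainly helps when the optimizer
    # state itself is a large share of GPU RAM (very large models).
    optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4, weight_decay=0.01)
else:
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
```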
Hi Michael!
Thanks for the reply. In the end I just used plain AdamW and didn't see much difference at the scale I'm training at.
Best,
Alberto
Hi!
Thanks for making this happen, it's a super useful resource!
I was wondering whether there is any reason to use bnb's 8-bit optimizers when doing QLoRA fine-tuning, or whether it's better to just use the vanilla optimizers from PyTorch.
Best,
Alberto