Closed: kallewoof closed this 2 months ago
This looks good. Have you tested it yet? One thing to check is that the LoRA+ optimizer parameters get set correctly. `create_loraplus_optimizer()` takes the model instead of the parameters; presumably, internally, it still only grabs the parameters with requires_grad set? Need to make sure of that.
> `create_loraplus_optimizer()` takes the model instead of the parameters; presumably, internally, it still only grabs the parameters with requires_grad set? Need to make sure of that.
It does that, yes
and it also does the bnb module override registration when it is able to
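For reference, here is a minimal sketch of how I understand the call is meant to look. It assumes the `create_loraplus_optimizer()` signature from `peft.optimizers` (keyword names may differ in the standalone loraplus package), and the tiny GPT-2 model is just for illustration, not what this PR trains:

```python
# Sketch only: assumes the PEFT-style create_loraplus_optimizer() API.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_loraplus_optimizer

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in model
model = get_peft_model(base, LoraConfig(r=8, target_modules=["c_attn"]))

optimizer = create_loraplus_optimizer(
    model=model,                      # the whole model is passed in, not a parameter list
    optimizer_cls=torch.optim.AdamW,  # passing an 8-bit bnb optimizer here is what triggers the module override path
    lr=2e-4,
    loraplus_lr_ratio=16,             # LR multiplier applied to the LoRA "B" matrices
)

# The requires_grad concern from above: every parameter handed to the
# optimizer should be a trainable one.
trainable = {p for p in model.parameters() if p.requires_grad}
assert all(p in trainable for g in optimizer.param_groups for p in g["params"])
```

The assert at the end is just the requires_grad check discussed above.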
> Have you tested it yet?
I have used this exclusively since I made the pull request, a couple of fine-tunes so far, and I am working on a 70B fine-tune right now. The results look fine to me so far, but I will know more after spending some time with it.
Training broke down when I tried this at the recommended ratio of 16. I am testing other (lower) ratio values to see if they work better. At ratio 5 the training seemed to work, but with LoRA+ disabled the training seemed to give slightly better eval loss during the initial stages. I will experiment with this some more and do full training runs comparing eval loss with LoRA+ disabled and at different ratios.
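Reusing the sketch from a few comments up, one cheap sanity check before committing to long runs is to look at what the ratio does to the per-group learning rates; the ratio-16 group ends up with a much larger LR, which may be related to the breakdown I saw:

```python
# Continues the sketch above (same model / imports). Purely illustrative.
for ratio in (5, 16):
    opt = create_loraplus_optimizer(
        model=model, optimizer_cls=torch.optim.AdamW, lr=2e-4, loraplus_lr_ratio=ratio
    )
    lrs = sorted({g["lr"] for g in opt.param_groups})
    print(f"loraplus_lr_ratio={ratio}: param-group LRs = {lrs}")
    # e.g. base LR 2e-4 for the A matrices vs 2e-4 * ratio for the B matrices
```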
Closing this. Will look into it again when the next PEFT release is out, if I find the time.
Some issues that need addressing, so keeping as draft.