konstmish / prodigy

The Prodigy optimizer and its variants for training neural networks.
MIT License

Question #4

Closed Dentoty closed 1 year ago

Dentoty commented 1 year ago

Hello, I'm trying to use Prodigy while training an image LoRA.

Is it normal for the learning rate to always be either 1 or the value I set in the lr_scheduler? Could it be that something along the pipeline is displaying not the actual lr but just the string "2e-6"?

For example, with the following setup my lr is either 1 or 2e-6, nothing in between.

    optimizer = Prodigy(params_to_optimize)

    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer=optimizer,
        T_max=num_train_epochs,
        eta_min=1e-4,
    )
konstmish commented 1 year ago

Hi, sorry for the late reply! The reported behavior sounds strange to me; I don't see why it would happen. The lr values from cosine annealing should decrease almost continuously from 1 to 1e-4. And since you're using eta_min=1e-4, the lr shouldn't drop below 1e-4, unless you meant d * lr.

I made a Colab notebook with an example; you can check the produced values of d * lr at the bottom (the last figure): https://colab.research.google.com/drive/1TrhEfI3stJ-yNp7_ZxUAtfWjj-Qe_Hym?usp=sharing
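
For reference, here is a minimal sketch of how the effective step size could be monitored during training (assuming the prodigyopt package, where each param group carries the adapted scale under the key 'd'; the tiny model and data are placeholders):

    import torch
    from prodigyopt import Prodigy  # pip install prodigyopt

    model = torch.nn.Linear(4, 1)            # placeholder model
    optimizer = Prodigy(model.parameters())  # Prodigy's lr defaults to 1.0
    num_train_epochs = 20
    lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer=optimizer, T_max=num_train_epochs, eta_min=1e-4
    )

    x, y = torch.randn(64, 4), torch.randn(64, 1)  # placeholder data
    for epoch in range(num_train_epochs):
        optimizer.zero_grad()
        torch.nn.functional.mse_loss(model(x), y).backward()
        optimizer.step()
        lr_scheduler.step()
        g = optimizer.param_groups[0]
        # The effective step size is d * lr: the scheduler scales lr
        # (starting from 1.0), while Prodigy adapts d on its own.
        print(f"epoch {epoch:2d}: lr={g['lr']:.4f}  d*lr={g['d'] * g['lr']:.2e}")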

Dentoty commented 1 year ago

Replacing my config with your code works. The thing that actually makes the difference is T_max=n_epoch: when n_epoch is 1, as it was in my config, the LR just oscillates between 1 and 1e-4 for some reason. With n_epoch=20 it works as you described.
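
The oscillation follows from how PyTorch defines cosine annealing: the schedule tracks cos(pi * epoch / T_max), which is periodic with period 2 * T_max, so with T_max=1 the lr jumps between the base value and eta_min on every step. A minimal sketch reproducing both behaviors with plain torch (SGD stands in for Prodigy, with lr=1.0 to mirror Prodigy's default):

    import torch

    params = [torch.nn.Parameter(torch.zeros(1))]
    for t_max in (1, 20):
        opt = torch.optim.SGD(params, lr=1.0)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(
            opt, T_max=t_max, eta_min=1e-4
        )
        lrs = []
        for _ in range(5):
            lrs.append(opt.param_groups[0]["lr"])
            opt.step()   # no grads here, so this is a no-op step
            sched.step()
        print(f"T_max={t_max}:", [round(lr, 4) for lr in lrs])

    # T_max=1:  [1.0, 0.0001, 1.0, 0.0001, 1.0]        <- oscillates
    # T_max=20: [1.0, 0.9938, 0.9755, 0.9455, 0.9045]  <- decays smoothly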

konstmish commented 1 year ago

I see, yes, that explains it. Closing the issue as solved now.