neuronflow opened this issue 3 years ago
Well, have you tried it as shown below?
from ranger21 import Ranger21
optimizer = Ranger21(model.parameters(), lr=1e-2, num_epochs=epochs, num_batches_per_epoch=len(train_loader))
Hi @neuronflow, @saruarlive is correct - the issue is that we need to know how many epochs and how many iterations per epoch in order to auto-compute the lr schedule. Clearly our error handling should be improved to make the issue clear (I thought we were checking for this case), but the error listed above is basically saying that num_epochs=None and num_batches_per_epoch=None, and it can't do any math with those. I'll leave this open until I verify and add some better error handling, but the core issue is that you need to pass in the total epochs and num_iterations (and we need to document this better).
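For illustration, here is a rough sketch of the arithmetic the optimizer needs; the variable names are hypothetical and not Ranger21's actual internals:

# Hypothetical sketch (not Ranger21's real code): the schedule length
# can only be computed when both values are known.
num_epochs = 50
num_batches_per_epoch = 200  # e.g. len(train_loader)
total_steps = num_epochs * num_batches_per_epoch  # 10000 scheduled steps
# The warmup and warmdown phases are carved out of total_steps, so
# passing None for either argument leaves nothing to compute with.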
Thank you! With the above, the training seems to start, but then it crashes with this error:
File "/mnt/Drive3/florian/multi_patch_blob_loss/neuronflow/training/epoch/trainEpoch.py", line 77, in train_epoch
optimizer.step()
File "/home/florian/miniconda3/envs/msblob/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/mnt/Drive1/florian/msblob/Ranger21/ranger21/ranger21.py", line 570, in step
self.agc(p)
File "/mnt/Drive1/florian/msblob/Ranger21/ranger21/ranger21.py", line 398, in agc
p_norm = self.unit_norm(p).clamp_(self.agc_eps)
File "/mnt/Drive1/florian/msblob/Ranger21/ranger21/ranger21.py", line 382, in unit_norm
raise ValueError(
ValueError: unit_norm:: adaptive gclipping: unable to process len of 5 - currently must be <= 4
If I understand it correctly, Ranger21 contains an lr scheduler, so it does not make sense to combine it with cosine annealing and warm restarts?
Hi @neuronflow, the ValueError above comes from parameters with more than 4 dimensions, e.g. the 5D weights of 3D convolutions. If you pull the latest version that I posted last week, adaptive clipping will handle any number of dimensions, so that is resolved.
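For reference, you can confirm the dimensionality that triggers this, since a 3D convolution weight tensor has five dimensions:

import torch
# Conv3d weights have shape (out_channels, in_channels, depth, height, width)
w = torch.nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3).weight
print(w.ndim)  # 5 -> more than the 4 dims the old adaptive clipping allowed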
To your other point - by default Ranger21 will handle the lr scheduling internally for you, so you would not want to use it with cosine annealing or other lr scheduling. You can of course turn off the internal lr scheduling if you want to compare Ranger21's internal scheduling against your own scheduler... I wouldn't recommend it, since there's a lot of validation behind the schedule Ranger21 sets, but you can certainly test it out to see. You can turn off scheduling by disabling the warmup (use_warmup=False) and the warmdown (warmdown_active=False).
I can see that it might be simpler if there were a single use_lr_scheduling=True/False flag, so I think I'll add that soon... but for now, turning warmup and warmdown off will have R21 operate as a pure optimizer with no scheduling, and then you can drive the lr with your own schedule, as sketched below. Hope that helps!
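A minimal sketch of that setup, assuming model, epochs, and train_loader are defined as in the snippet above (whether the schedule-length arguments are still required once internal scheduling is off isn't confirmed here, so they are kept):

from ranger21 import Ranger21
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

optimizer = Ranger21(
    model.parameters(),
    lr=1e-2,
    num_epochs=epochs,
    num_batches_per_epoch=len(train_loader),
    use_warmup=False,       # disable internal warmup
    warmdown_active=False,  # disable internal warmdown
)
# drive the lr yourself, e.g. with cosine annealing + warm restarts
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10)

for epoch in range(epochs):
    for batch in train_loader:
        ...  # forward pass, loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    scheduler.step()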
Thank you once again for the fast and detailed response. With the latest update it seems to work! :)
One further question: I have a training setup where I use multiple training data loaders with different batch lengths. Is it possible to apply Ranger21 in this context?
I get the following error when starting my training:
initializing ranger with: