Closed zsgj-Xxx closed 3 years ago
In the first picture my device uses only one GPU while the second uses 7. Is this the reason?
Hi @zsgj-Xxx,
Thanks for posting this issue!
1 - Yes, the difference comes from running on a single GPU vs. multi-GPU.
Specifically, Ranger21 attempts to compute the warmup phase from beta2, which works out to roughly 2000 iterations.
However, it then compares that proposed warmup against your total iterations, and if it exceeds 30% of the total run, it recomputes the warmup from the total iteration count instead.
This handles cases where someone is doing very short runs, where the algorithmic calculation from beta2 makes no sense.
What is likely happening above is that on your single GPU, the beta2-based warmup fits, since all iterations run in one place. When the work is farmed out to 7 GPUs, the algorithmic amount exceeds the 30% threshold, so it falls back to a percentage of the iterations run on each GPU... thus the lr curve ends up different in single vs. multi-GPU.
Note that you can override any of the above by setting the lr warmup percentage directly via the Ranger21 params.
Hope that helps clarify, and thanks also for showing the lr curves, as that makes it much easier to confirm this is expected behaviour. (Longer term, I'm still working on an automated lr scheduler that would obviate the need for two different calculations in the first place, but it's not ready yet.)
Hi, thank you for your reply. I reset the optimizer parameters.
After training, I compared the result with that of Ranger and found an interesting phenomenon.
This is Ranger:
This is Ranger21:
The loss of Ranger21 fluctuates quite a lot. Is it because my parameters are not set correctly?
I also found that the learning rate curve didn't seem to reach the set value.
After a long warmup period, the learning rate never reached the set value. When the decay phase began, the learning rate suddenly jumped back to the set value and then started declining (at about 72, the learning rate suddenly goes up and then goes down).
Your work is very meaningful!!!
I think I found the problem. I observed that after warmup ends, self.current_lr still holds the value from the last warmup step; it is never updated with the current lr, so the lr curve being plotted is incorrect. Perhaps you should modify this:
```python
if step >= warmup:
    if not self.warmup_complete:
        self.warmup_complete = True
    self.current_lr = lr
    return lr
```
Hi @zsgj-Xxx,
Thanks very much for the screenshots and pointing out this issue.
I'm reviewing and testing now - it appears I had an incorrect `>=` instead of `>` in `if step >= warmup:`.
Thus, the lr stops one iteration short of the full lr, because the last warmup iteration satisfies the equality and skips the scaling step.
I've changed it to `if step > warmup:` so that last iteration is processed, which should bring the lr to the full set point.
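The off-by-one can be seen with a toy linear warmup (a sketch under assumed behavior, not the real Ranger21 scheduler): with the `>=` guard, the step that would scale lr by `warmup / warmup == 1.0` takes the early-exit branch instead, so the tracked `current_lr` stalls one step short of the set point.

```python
class WarmupTracker:
    """Toy linear warmup that records current_lr the way the issue
    describes (sketch only, not the actual Ranger21 implementation)."""

    def __init__(self, lr, warmup, guard_ge=True):
        self.lr = lr
        self.warmup = warmup
        self.guard_ge = guard_ge  # True -> buggy `>=`, False -> fixed `>`
        self.current_lr = 0.0

    def step_lr(self, step):
        past = step >= self.warmup if self.guard_ge else step > self.warmup
        if past:
            # early exit: current_lr is NOT updated on this path
            return self.lr
        # linear warmup scaling; this path is what records current_lr
        self.current_lr = self.lr * min(1.0, step / self.warmup)
        return self.current_lr

buggy = WarmupTracker(1e-3, 2000, guard_ge=True)
fixed = WarmupTracker(1e-3, 2000, guard_ge=False)
for s in range(1, 2001):
    buggy.step_lr(s)
    fixed.step_lr(s)
# buggy.current_lr stalls at lr * 1999/2000; fixed reaches the full lr
```

This also matches the plotted symptom: the recorded curve never quite reaches the set value during warmup, then jumps to it once the early-exit branch starts returning the full lr.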
I've also added a check to always verify this was achieved, and to print out a line when full warmup is processed.
I'm testing to make sure it's all working and then will check in.
Hi @zsgj-Xxx,
I've checked in a fix.
In addition, I've added a new show_schedule() to make it easier to see the lr curves, as well as the min/max/start values.
Mostly because having this kind of visualization helps ensure Ranger21 is doing the right thing in all scenarios.
Thanks again for spotting this issue!
Thanks for the quick reply. I'm not quite sure how to set some of the other parameters yet, so I'm looking forward to your demo.
I got different learning rate curves in two identical experiments; do you know the reason?
It looks like the first image is the desired result.