lessw2020 / Ranger21

Ranger deep learning optimizer rewrite to use newest components
Apache License 2.0
320 stars 44 forks

Changes in lr #9

Closed zsgj-Xxx closed 3 years ago

zsgj-Xxx commented 3 years ago

I got different learning-rate curves in two identical experiments. Do you know the reason? [screenshot: lr curve, run 1] [screenshot: lr curve, run 2] It looks like the first image is the desired result.

zsgj-Xxx commented 3 years ago

In the first picture my device uses only one GPU while the second uses 7. Is this the reason?

lessw2020 commented 3 years ago

Hi @zsgj-Xxx, thanks for posting this issue! 1 - Yes, the difference comes from running on a single GPU vs multi-GPU.
Specifically, Ranger21 computes a proposed warmup phase algorithmically from beta2, which works out to roughly 2000 iterations. It then compares that proposed warmup against your total iterations, and if it exceeds 30% of the total run, it recomputes the warmup from the total iterations instead. This handles the case where someone is doing very short runs, for which the algorithmic calculation from beta2 makes no sense.

What is likely happening above is that on your single GPU, the beta2-based warmup makes sense because all iterations happen in one place. When the work is farmed out to 7 GPUs, the algorithmic amount exceeds the 30% threshold of the per-GPU iteration count, so it falls back to a percentage of what is being run on each GPU... thus the lr curve differs between single- and multi-GPU.

Note that you can also control the lr warmup percentage directly via the Ranger21 params to override any of the above.
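The warmup-length logic described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual Ranger21 source; the function name and the `2 / (1 - beta2)` form are assumptions based on the ~2000-iteration figure quoted for the common beta2 = 0.999:

```python
import math

def compute_warmup_iterations(beta2, total_iterations, max_warmup_pct=0.30):
    """Pick a warmup length from beta2, capped at a fraction of the run."""
    # Algorithmic warmup from beta2: e.g. beta2 = 0.999 -> 2000 iterations.
    beta2_warmup = math.ceil(2 / (1 - beta2))
    # If that exceeds 30% of the total run (a short run, or fewer
    # per-GPU iterations after splitting across GPUs), fall back to a
    # percentage of the total iterations instead.
    if beta2_warmup > max_warmup_pct * total_iterations:
        return int(max_warmup_pct * total_iterations)
    return beta2_warmup
```

On a long single-GPU run (say 100,000 iterations) this returns the beta2-based 2000; if the per-GPU iteration count drops to, say, 1000 after splitting the work across 7 GPUs, the 30% cap kicks in and returns 300 instead, which is why the two curves differ.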

Hope that helps clarify, and thanks for sharing the lr curves; that makes it much easier to confirm this is expected behaviour. (Longer term I'm still working on an automated lr scheduler that would obviate the need for two different calculations in the first place, but it's not ready yet.)

zsgj-Xxx commented 3 years ago

Hi, thank you for your reply. I reset the optimizer parameters: [screenshot: optimizer settings]. After training, I compared the result with that of Ranger and found an interesting phenomenon. This is Ranger: [screenshot: Ranger loss curve]. This is Ranger21: [screenshot: Ranger21 loss curve]. The loss of Ranger21 fluctuates a little too much. Is it because my parameters are not set correctly?

And I found that the learning-rate curve didn't seem to reach the set value: [screenshot: lr curve]. After a long warmup period, the learning rate never reached the set value. Then, when the decline began, it suddenly jumped back up to the set value and started to decrease (at about 72, the learning rate suddenly goes up and then comes down).
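For reference, the overall curve shape being described (linear warmup, plateau at the set lr, then a decline starting around 72% of the run) matches a piecewise schedule along these lines. This is an illustrative sketch, not the Ranger21 implementation; the 0.72 warmdown start, the linear warmdown, and all names are assumptions:

```python
def warmup_plateau_warmdown_lr(step, total_steps, base_lr, warmup_steps,
                               warmdown_start_pct=0.72, min_lr=0.0):
    """Piecewise lr: linear warmup -> flat plateau -> linear warmdown."""
    warmdown_start = int(total_steps * warmdown_start_pct)
    if step <= warmup_steps:
        # linear ramp from ~0 up to base_lr
        return base_lr * step / warmup_steps
    if step <= warmdown_start:
        # plateau at the set value
        return base_lr
    # linear decay from base_lr down to min_lr over the final phase
    frac = (step - warmdown_start) / (total_steps - warmdown_start)
    return base_lr - frac * (base_lr - min_lr)
```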

Your work is very meaningful!!!

zsgj-Xxx commented 3 years ago

I think I found the problem. I observed that after the warmup stopped, `self.current_lr` keeps the value from the last warmup step instead of being updated with the current latest lr, so the lr curve drawn is not correct. You should modify it to something like:

```python
if step >= warmup:
    if not self.warmup_complete:
        self.warmup_complete = True
        self.current_lr = lr
    return lr
```

lessw2020 commented 3 years ago

Hi @zsgj-Xxx, thanks very much for the screenshots and for pointing out this issue. I'm reviewing and testing now - it appears I had an incorrect `>=` instead of just `>` in `if step >= warmup:`.
Thus the lr stops one iteration short of the full lr, because the last iteration hits the `=` case and is skipped. I've changed it to `if step > warmup` so that last iteration is processed, which should bring the lr to the full set point. I've also added a check to always verify this was achieved, and a printed line when full warmup has been processed. I'm testing to make sure it's all working and then will check it in.
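The off-by-one can be shown with a minimal sketch (not the actual Ranger21 code; the class and attribute names are illustrative). With the buggy `step >= warmup` exit, the final ramp step never runs, so the recorded `current_lr` plateaus one step short of the set point, matching the curves in the screenshots:

```python
class WarmupSketch:
    """Illustrative reconstruction of the off-by-one (not Ranger21 source).

    `current_lr` is the value recorded for plotting the lr curve."""

    def __init__(self, base_lr, warmup):
        self.base_lr = base_lr
        self.warmup = warmup
        self.current_lr = 0.0

    def step_lr(self, step, fixed=True):
        # buggy exit check: `step >= warmup`; fixed check: `step > warmup`
        done = step > self.warmup if fixed else step >= self.warmup
        if done:
            return self.base_lr  # warmup over; current_lr no longer updated
        # linear ramp; record it so the lr curve can be plotted
        self.current_lr = self.base_lr * step / self.warmup
        return self.current_lr
```

With the buggy check and a 10-step warmup, the recorded lr tops out at 0.9 of base_lr (step 10 takes the early return); with the fixed check, step 10 still runs the ramp and the recorded lr reaches the full set point.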

lessw2020 commented 3 years ago

Hi @zsgj-Xxx, I've checked in a fix. In addition, I've added a new show_schedule() method to make it easier to see the lr curves as well as the min/max/start values.
Mostly since having this kind of visualization helps to ensure Ranger21 is doing the right thing in all scenarios.
Thanks again for spotting this issue! [screenshot: show_schedule() output]

zsgj-Xxx commented 3 years ago

Thanks for the quick reply. I'm not quite sure how to set some of the other parameters yet; looking forward to your demo.