lessw2020 / Ranger-Deep-Learning-Optimizer

Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase
Apache License 2.0

flat + cosine anneal training curve #23

Closed shengyuan-tang closed 4 years ago

shengyuan-tang commented 4 years ago

I want to do a test of your Ranger. I only know cosine anneal training; can you tell me the meaning of "flat"? Thanks.

ioanvl commented 4 years ago

It just means keeping your initial learning rate unchanged. Flat+Anneal simply means starting with a constant lr and only beginning to anneal *after* a number of epochs (usually around 70-72% of the total).

There are some schedulers on GitHub that do this (I don't have a link right now; I'll add it later if I find it again). Alternatively, count the epochs yourself and start using your annealing scheduler only after a set number of epochs has passed.
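For reference, here is a minimal sketch of that idea using PyTorch's `LambdaLR`; the `flat_cos_schedule` helper name and the 72% breakpoint are illustrative assumptions, not code from this repo:

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def flat_cos_schedule(optimizer, total_epochs, flat_frac=0.72):
    """Hold the base lr flat for `flat_frac` of training, then cosine-anneal toward 0.

    `flat_frac` (~72% here) is only the rule of thumb mentioned above, not a fixed value.
    """
    flat_epochs = int(total_epochs * flat_frac)

    def lr_lambda(epoch):
        if epoch < flat_epochs:
            return 1.0  # flat phase: keep the initial lr unchanged
        # cosine phase: decay the lr multiplier from 1.0 down to 0 over the remaining epochs
        progress = (epoch - flat_epochs) / max(1, total_epochs - flat_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return LambdaLR(optimizer, lr_lambda)

# usage: step the scheduler once per epoch
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # swap in Ranger here
scheduler = flat_cos_schedule(optimizer, total_epochs=100)
for epoch in range(100):
    # ... train for one epoch ...
    scheduler.step()
```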

shengyuan-tang commented 4 years ago

OK, I get it. Thank you.

austinmw commented 4 years ago

Do you have a paper link for flat cosine annealing, or a comparison to cyclical annealing or the one-cycle policy?

lessw2020 commented 4 years ago

Hi @austinmw, we didn't write a paper on it. It was an invention by @grankin (on the fastai forums), because we noticed that running Ranger flat for a while was more effective than bouncing it up and down à la cyclical annealing.
Hope that helps!

wangg12 commented 3 years ago

Not sure if it is related, but I have written a scheduler (https://github.com/wangg12/flat_anneal_scheduler.pytorch) for the flat + cosine schedule (and many other common schedules) in just a function.