Closed: LifeIsStrange closed this issue 3 years ago
Hi @LifeIsStrange - thanks for the feedback.
1 - Have already done some unpublished testing, and Ranger21 outperforms Ranger. Thus I recommend upgrading to Ranger21 if Ranger already worked well for a given dataset.
2 - AdaBelief - have plans to test the AdaBelief mechanism (adapting the step size based on how well the gradient agrees with its predicted value) inside Ranger21 as an additional enhancement.
3 - SWA isn't an optimizer per se; it can be combined with any optimizer. But it would be interesting to test it on top of Ranger21 and see if it yields another way to improve training results.
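The "works with any optimizer" point is because SWA only maintains a running average of the weights visited late in training; PyTorch ships utilities for this in `torch.optim.swa_utils` (see the linked blog post). A minimal stdlib sketch of just the averaging rule, assuming weights are represented as a flat list of floats:

```python
def swa_update(avg_w, w, n):
    """Fold the current weights w into the running SWA average avg_w.

    avg_w: running average over the n snapshots seen so far.
    Uses the incremental-mean form avg + (w - avg) / (n + 1), so no
    history of past snapshots needs to be stored. Returns the new
    average and the new snapshot count.
    """
    new_avg = [a + (wi - a) / (n + 1) for a, wi in zip(avg_w, w)]
    return new_avg, n + 1
```

In practice you would call this (or `AveragedModel.update_parameters` in PyTorch) once per epoch after a warm-up period, regardless of which optimizer produced the weights.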
4 - Adas - looks interesting. Will work on testing it.
Thanks again for the ideas here. I'm going to move it over to discussion forum to track progress on it.
It's ported over to discussion now via this topic: https://github.com/lessw2020/Ranger21/discussions/7
Thanks again for the ideas! I'm closing this on the issue side and will track in discussions.
Those are apparently the most promising optimizers; it would be very useful to see how they compare to RAdam/madgrad!
Adabelief https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer/issues/44
Stochastic weight averaging https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/
Adas https://paperswithcode.com/paper/adas-adaptive-scheduling-of-stochastic