Closed: seanpmorgan closed this issue 5 years ago
That would be awesome indeed. For the record, here is a Keras implementation (not official): https://github.com/CyberZHG/keras-radam.
This is going to be a great addon! Anyone looking for another detailed comparison can read through https://medium.com/@lessw/new-state-of-the-art-ai-optimizer-rectified-adam-radam-5d854730807b
I vote for this as well. I've been looking for this!
Great addition. Can I try implementing this? @seanpmorgan
@SSaishruthi Sure! I know @sayoojbk has also expressed interest in helping with this, so if you could open a WIP PR as soon as you get started, that'd be great; we could have a few eyes on it and push it through.
Looking forward to that!
Seems that there is an unofficial implementation for TF/Keras. https://github.com/CyberZHG/keras-radam
I have found one for TF: https://github.com/taki0112/RAdam-Tensorflow
Thanks for the links. I have noted all of them and plan to kick off the implementation after this weekend; I just finished another priority change. Will keep you posted.
@SSaishruthi it looks like RAdam already has an improvement called 'RAdam + Lookahead'. One possible implementation of Lookahead: https://github.com/bojone/keras_lookahead (I've been testing this one myself).
It's called 'Ranger' (the combination of RAdam + Lookahead). A short article about it: https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
Paper: https://arxiv.org/abs/1907.08610v1 Lookahead PyTorch implementation: https://github.com/lonePatient/lookahead_pytorch/blob/master/optimizer.py
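For reference, a minimal sketch of how that combination can be expressed once both pieces are available in Addons (RAdam as the inner optimizer, wrapped by a `tfa.optimizers.Lookahead` wrapper); the values are illustrative defaults, not a recommendation:

```python
import tensorflow_addons as tfa

# Sketch of the "Ranger" combination: RAdam as the inner optimizer,
# wrapped by the Lookahead mechanism from the linked paper.
# Assumes the tfa.optimizers.Lookahead wrapper; values are illustrative.
radam = tfa.optimizers.RectifiedAdam(learning_rate=1e-3)
ranger = tfa.optimizers.Lookahead(radam, sync_period=6, slow_step_size=0.5)
```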
@luminoso However, it seems that repo does not support TensorFlow.
Hi @SSaishruthi, I know you're working on several different things, including the core migration of F1. Would you be okay with @AakashKumarNain taking a look at this one, as he has expressed interest? Would love for you to help review any implementation.
@seanpmorgan Sure. Will collaborate along so that we keep things going.
@seanpmorgan @SSaishruthi The Keras implementation pointed out in the comments LGTM. It is also written against the OptimizerV2 API. Take a look: https://github.com/CyberZHG/keras-radam/blob/master/keras_radam/optimizer_v2.py
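For anyone unfamiliar with what "written against the OptimizerV2 API" implies, here is a minimal skeleton of the hooks such an optimizer fills in (slot creation, dense/sparse apply, config) under the TF 2.x OptimizerV2 base class of that era. The update rule below is plain gradient descent just to keep the sketch runnable; the linked keras-radam code implements the actual rectified update inside these same methods.

```python
import tensorflow as tf

# Minimal OptimizerV2-style skeleton (illustrative, not the RAdam update).
class MinimalOptimizer(tf.keras.optimizers.Optimizer):
    def __init__(self, learning_rate=0.01, name="MinimalOptimizer", **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)

    def _create_slots(self, var_list):
        # A RAdam port creates "m" and "v" slots per variable, like Adam.
        for var in var_list:
            self.add_slot(var, "m")
            self.add_slot(var, "v")

    def _resource_apply_dense(self, grad, var, apply_state=None):
        lr = self._get_hyper("learning_rate", var.dtype.base_dtype)
        return var.assign_sub(lr * grad)

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
        lr = self._get_hyper("learning_rate", var.dtype.base_dtype)
        return self._resource_scatter_add(var, indices, -lr * grad)

    def get_config(self):
        config = super().get_config()
        config.update(
            {"learning_rate": self._serialize_hyperparameter("learning_rate")})
        return config
```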
Ping @CyberZHG. Would it be okay to use your implementation as part of Addons? The license you have on it looks like it'd be okay -- but wanted to get your permission/see if you'd like to contribute it yourself?
Yeah, it would be fair if @CyberZHG just adds it here. Most of the work is already done there.
I'll try to migrate the code and make a PR in the next few days.
Is an official implementation available now?
@Alessiobrini https://www.tensorflow.org/addons/api_docs/python/tfa/optimizers/RectifiedAdam
Thank you! However, if I set the warmup proportion and total steps, the optimizer doesn't seem to do the proper learning rate warmup. I don't know if I'm doing something wrong, but the docs say that specifying those parameters should be sufficient to get the warmup schedule.
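For reference, a minimal sketch of how the warmup-related arguments are passed according to the RectifiedAdam docs (the values are illustrative, and this is not a claim that it resolves the behaviour described above):

```python
import tensorflow_addons as tfa

# Per the docs, setting total_steps, warmup_proportion and min_lr should be
# enough to enable warmup: the learning rate ramps up over the first
# warmup_proportion * total_steps steps and then decays towards min_lr.
opt = tfa.optimizers.RectifiedAdam(
    learning_rate=1e-3,
    total_steps=10000,
    warmup_proportion=0.1,
    min_lr=1e-5,
)
```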
One question. I've been looking at how weight decay is implemented here, in the original RAdam paper, and in the paper that originally introduced weight decay for Adam (also known as AdamW, which the RAdam paper cites).
The implementations seem to be pretty much the same, but there's a subtle difference: in AdamW the weight decay can also be scheduled for warm restarts, and in fact they propose a normalized value for those cases. Looking at the TF addons AdamW implementation you can see that both learning rate and weight decay argument support a callable: that is, a LearningRateSchedule. However, the RectifiedAdam implementation only does so for the learning rate.
Looking at the code it looks like this should be relatively trivial to change: instead of using the weight decay as a constant hyperparameter, it would need to be handled in a similar way to how decayed_lr works here.
Was there any reason to not also support weight decay schedulers in this case? I imagine you can get the same result by passing weight decay as a tensor with the scheduling already applied, but by doing so the scheduler state might be lost when serializing and deserializing back.
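To make the asymmetry concrete, here is a sketch based on the callable-weight-decay pattern shown in the tfa.optimizers.AdamW docs; the schedules and values are illustrative:

```python
import tensorflow as tf
import tensorflow_addons as tfa

step = tf.Variable(0, trainable=False)
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [10000, 15000], [1e-3, 1e-4, 1e-5])
wd_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [10000, 15000], [1e-4, 1e-5, 1e-6])

# AdamW accepts a callable for weight_decay, so decay can follow a schedule
# in lockstep with the learning rate.
adamw = tfa.optimizers.AdamW(
    learning_rate=lambda: lr_schedule(step),
    weight_decay=lambda: wd_schedule(step))

# RectifiedAdam accepts a schedule for learning_rate, but weight_decay is a
# plain float hyperparameter, which is the limitation discussed above.
radam = tfa.optimizers.RectifiedAdam(
    learning_rate=lr_schedule, weight_decay=1e-4)
```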
@leandro-gracia-gil Thanks for your good question, Leandro. Would you mind filing an issue for it?
@facaiy Sure. Here it is: https://github.com/tensorflow/addons/issues/1908
Describe the feature and the current behavior/state. A new paper describes RAdam, which looks like a drop-in replacement for the Adam optimizer with better results.
https://arxiv.org/abs/1908.03265v1 https://github.com/LiyuanLucasLiu/RAdam
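Since the paper pitches RAdam as a drop-in replacement for Adam, usage with the Addons implementation that came out of this issue would look roughly like the sketch below (model and hyperparameters are illustrative):

```python
import tensorflow as tf
import tensorflow_addons as tfa

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Drop-in replacement: swap tf.keras.optimizers.Adam() for RectifiedAdam().
model.compile(
    optimizer=tfa.optimizers.RectifiedAdam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```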