lessw2020 / Ranger21

Ranger deep learning optimizer rewrite to use newest components
Apache License 2.0

decouple the lr scheduler and optimizer? #36

Open hiyyg opened 2 years ago

hiyyg commented 2 years ago

Hi @lessw2020, thanks for the very nice work! I noticed that in Ranger21 the optimizer is tightly coupled with the lr scheduler. Could you guide me on how to decouple them?

neuronflow commented 2 years ago

I would like to second this. A split into a Ranger optimizer and a Ranger scheduler would be really cool.

lessw2020 commented 2 years ago

Hi @hiyyg and @neuronflow, right now you can turn off the built-in lr scheduling by disabling both warmup and warmdown: `use_warmup=False`, `warmdown_active=False`. That should simply pass through the input lr and not touch it. Is that what you mean by decouple? Or do you mean having the scheduler separately programmable (i.e. cosine decay vs. the linear decay we currently use, etc.)?
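For reference, a minimal sketch of that pass-through setup paired with an external PyTorch scheduler. The `use_warmup` and `warmdown_active` flags come from the comment above; the import path and the other constructor arguments shown here are assumptions, so check them against your installed Ranger21 version:

```python
import torch
from ranger21 import Ranger21  # assumed import path

model = torch.nn.Linear(10, 2)

# Disable Ranger21's built-in warmup/warmdown so it passes the input lr through untouched.
optimizer = Ranger21(
    model.parameters(),
    lr=1e-3,
    num_epochs=10,               # assumed: Ranger21 takes epoch/batch info for its internal schedule
    num_batches_per_epoch=100,   # assumed, see above
    use_warmup=False,
    warmdown_active=False,
)

# Drive the lr externally with a standard PyTorch scheduler
# (assuming Ranger21 exposes the usual param_groups interface).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

for epoch in range(10):
    # ... training loop: optimizer.step() per batch ...
    scheduler.step()
```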

neuronflow commented 2 years ago

> Or do you mean having the scheduler separately programmable (i.e. cosine decay vs. the linear decay we currently use, etc.)?

This is what I initially had in mind. Maybe, just maybe, the Ranger optimizer should go hand in hand with a Ranger scheduler that follows the standard PyTorch conventions?
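For context, the standard PyTorch convention keeps the optimizer and the lr scheduler as separate objects that the training loop composes. A minimal sketch of what such a split could look like, approximating a warmup/flat/warmdown shape with a plain `LambdaLR`; the step counts and percentages here are illustrative assumptions, not Ranger21's actual defaults:

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

total_steps, warmup_steps, warmdown_start = 1000, 100, 700

def lr_factor(step):
    # Linear warmup, flat middle, linear warmdown (shape only; the numbers are assumptions).
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    if step < warmdown_start:
        return 1.0
    return max(0.0, (total_steps - step) / (total_steps - warmdown_start))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()  # the scheduler, not the optimizer, owns the lr schedule
```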

felipemello1 commented 2 years ago

Hi @lessw2020, it seems that in the current implementation there is no way to assign different learning rates to different parameter groups. Did I get that right?

If this were available, I would love to use it. Two use cases: 1) fine-tuning a network where layers closer to the head get a higher lr; 2) my case: I train a graph neural network and need the embeddings to have 100x the learning rate of the rest of the model, but with the current script I can't use the standard PyTorch way of doing it:

```python
model_params = [params for name, params in self.model.named_parameters() if not name.startswith('emb.')]
emb_params = [params for name, params in self.model.named_parameters() if name.startswith('emb.')]
optimizer_model = madgrad_wd(
    [{'params': emb_params, 'lr': self.model_config['emb_max_lr']},
     {'params': model_params, 'lr': self.model_config['model_max_lr']}],
    weight_decay=self.model_config['wd'],
)
```

lessw2020 commented 2 years ago

Hi @fmellomascarenhas, @neuronflow, and @hiyyg - I fully agree with all the points above (decoupled scheduler and parameter groups). This split between scheduler and optimizer will happen for Ranger22 (the 2022 edition lol).
Should have more info and updates shortly, as we just agreed last night to go ahead with the Ranger22 version.