Closed sean-adler closed 5 years ago
@sean-adler there was a discussion on this in the PyTorch issues, and there is a CosineAnnealing scheduler built into PyTorch. However, maybe it could make sense to modify LR and momentum...
Speaking about techniques like SGDR, maybe it could be generalized to something like LR scheduler with restarts, like I implemented here.
HTH
> Speaking about techniques like SGDR, maybe it could be generalized to something like LR scheduler with restarts, like I implemented here.
@vfdev-5 Nice! Yeah, that's pretty much exactly what I had in mind, except with batch-level adjustment too.
@sean-adler could you please explain why it is necessary to control adjustment at the batch level?
> @sean-adler could you please explain why it is necessary to control adjustment at the batch level?
If I'm understanding the SGDR paper correctly, that's how they implement their LR schedule - the warm restarts occur after some number of epochs, but the learning rate is updated after every batch.
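As a rough sketch of that idea (the function and parameter names here are mine, not from the paper or any library), the per-batch cosine schedule with restarts looks like:

```python
import math

def sgdr_lr(batch_idx, batches_per_epoch, lr_min=0.0, lr_max=0.1,
            restart_period_epochs=10):
    """Cosine-annealed LR with warm restarts, evaluated once per batch."""
    # Epochs elapsed since the last restart (fractional, so it moves every batch).
    t_cur = (batch_idx % (restart_period_epochs * batches_per_epoch)) / batches_per_epoch
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * t_cur / restart_period_epochs))
```

At batch 0 (and right after every restart) this returns `lr_max`; halfway through a restart period it returns the midpoint of `lr_min` and `lr_max`.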
Here's one section where they mention it in the paper:
@sean-adler thanks! Personally, I think a more useful lr schedule could be something like here:
"warmUp": "the learning rate starts at 0.1, then it increases linearly per batch for the first 4 epochs",
"schedule": [
{"learning_rate": 0.1, "epochs": 1},
{"learning_rate": 0.2, "epochs": 1},
{"learning_rate": 0.3, "epochs": 1},
{"learning_rate": 0.4, "epochs": 17},
{"learning_rate": 0.04, "epochs": 14},
{"learning_rate": 0.004, "epochs": 8},
{"learning_rate": 0.0004, "epochs": 3}
Having the possibility to set an LR value and keep it for a period...
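As a sketch (the helper name is mine, not an existing API), a schedule in that shape can be expanded into a per-epoch lookup, which a handler could then apply at the start of each epoch:

```python
def expand_schedule(schedule):
    """Expand [{'learning_rate': lr, 'epochs': n}, ...] into one LR per epoch."""
    lrs = []
    for phase in schedule:
        lrs.extend([phase["learning_rate"]] * phase["epochs"])
    return lrs

schedule = [
    {"learning_rate": 0.1, "epochs": 1},
    {"learning_rate": 0.2, "epochs": 1},
    {"learning_rate": 0.4, "epochs": 17},
]
per_epoch_lr = expand_schedule(schedule)  # 19 entries: 0.1, 0.2, then 0.4 x 17
```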
I think a schedule like that could be implemented using the same base handler I was testing out, and the SGDR cosine annealing schedule (and other schedules like CLR) could be implemented using the same thing too. It seems useful to have some base functionality like this, but let me know what you think.
@vfdev-5 I opened a work-in-progress PR to try and make what I'm talking about more concrete (in terms of base handlers and other stuff like that). Hopefully it helps the discussion here!
@sean-adler yes, just saw it. Code looks clean and shows the power of event handlers :) but I think it goes in parallel with torch's built-in _LRScheduler, and I wonder what more we could gain with this proposed approach. Today I use the PyTorch built-in schedulers and a handler that just calls step() on them, and it is OK...
Could you, please, also compare your cosine annealing schedule with the PyTorch built-in one?
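For reference, the "handler that just calls step()" pattern is tiny; a minimal sketch (no ignite imports here, the engine argument is just whatever the event passes in):

```python
class SchedulerStepper:
    """Handler meant for an end-of-epoch event: forwards it to scheduler.step()."""

    def __init__(self, scheduler):
        self.scheduler = scheduler

    def __call__(self, engine):
        # The engine argument is ignored; we only need the event timing.
        self.scheduler.step()
```

Attaching it would look like `engine.add_event_handler(Events.EPOCH_COMPLETED, SchedulerStepper(scheduler))`.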
> Today I use the PyTorch built-in schedulers and a handler that just calls step() on them, and it is OK...
True, that definitely works, but it doesn't let you adjust other parameters like momentum. That's pretty easy to do with these handlers though, since you can pass a parameter name, e.g. 'momentum', as an argument.
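A sketch of what passing the parameter name buys you (using a plain list of param-group dicts as a stand-in for `optimizer.param_groups`):

```python
def set_param(param_groups, param_name, value):
    """What a per-event handler would do: write the new value into every group."""
    for group in param_groups:
        group[param_name] = value

groups = [{"lr": 0.1, "momentum": 0.9}]
set_param(groups, "momentum", 0.95)  # the same call shape works for 'lr', etc.
```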
> Could you, please, also compare your cosine annealing schedule with the PyTorch built-in one?
Sure! The version in that PR implements warm restarts, and the pytorch built-in one does not. Other than that, the schedule is the same.
Okay, I see, thanks. Just for my curiosity, how is it useful to change momentum and other parameters during training?
What do you call warm restarts?
> Just for my curiosity, how is it useful to change momentum and other parameters during training?
I've only seen it done here. I have no idea if it will become more popular over time, but the results are interesting 😄
> What do you call warm restarts?
I think that's just how the SGDR authors describe the cyclicality of their LR schedule. In their paper, a "warm restart" is when the LR is reset to its initial value after decreasing for some number of epochs.
Just found the article you cited and wanted to read it :)
> In their paper, a "warm restart"
Before asking you I tried ctrl+f on the word "warm" in the article and it gave 0 occurrences :)
> Before asking you I tried ctrl+f on the word "warm" in the article and it gave 0 occurrences :)
Oh, sorry for the miscommunication, I meant that warm restarts come up in the SGDR paper.
So as the majority of this functionality can be handled by the pytorch lr_scheduler, I don't think there's a need for a specific handler in ignite. Unless I'm missing something?
In general, we want to keep the universal handlers in ignite and make anything niche easy to implement for the user
> So as the majority of this functionality can be handled by the pytorch lr_scheduler, I don't think there's a need for a specific handler in ignite. Unless I'm missing something?
My understanding is that the lr_scheduler module doesn't handle cycles or momentum. Anyone can implement cycles by subclassing _LRScheduler, but it'd require a fair amount of what's in #171 or something like it. And to my knowledge, scheduling momentum is not possible with _LRScheduler.
If this stuff isn't a common enough need, happy to close this out!
It was suggested here to introduce an ignite.contrib module, so maybe we could put this code in ignite.contrib.handlers.
@sean-adler is it difficult to add a supplementary option to existing classes to achieve "warm-up" effect (for several first epochs optimize only some top-layers and then finetune all the others) like this:
```python
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

# MODEL (a network with .features / .classifier / .final_classifiers) is assumed
# to be defined elsewhere.
OPTIM = Adam(
    params=[
        {"params": MODEL.features.parameters(), "lr": 0.001},
        {"params": MODEL.classifier.parameters(), "lr": 0.001},
        {"params": MODEL.final_classifiers.parameters(), "lr": 0.001},
    ],
)

def lambda_lr_features(epoch):
    if epoch < 5:
        return 0.001
    else:
        return 0.1 * (0.75 ** (epoch - 3))

def lambda_lr_classifier(epoch):
    if epoch < 5:
        return 0.01
    else:
        return 0.75 ** (epoch - 3)

def lambda_lr_final_classifiers(epoch):
    if epoch < 5:
        return 1.0
    else:
        return 0.88 ** (epoch - 3)

LR_SCHEDULERS = [
    LambdaLR(OPTIM, lr_lambda=[lambda_lr_features, lambda_lr_classifier,
                               lambda_lr_final_classifiers])
]
```
> @sean-adler is it difficult to add a supplementary option to existing classes to achieve "warm-up" effect (for several first epochs optimize only some top-layers and then finetune all the others) like this?
@vfdev-5 specifying phases like that is a good idea, I'll try adding it!
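One way that option could look, as a sketch (the group names mirror the snippet above; the function is hypothetical, not existing ignite API): return per-group LR multipliers that freeze the lower groups during warm-up and then switch to decay.

```python
def warmup_multipliers(epoch, warmup_epochs=5):
    """Per-group LR multipliers: lower layers frozen during warm-up, then decayed."""
    if epoch < warmup_epochs:
        # Only the final classifiers train during warm-up.
        return {"features": 0.0, "classifier": 0.0, "final_classifiers": 1.0}
    decay_steps = epoch - warmup_epochs
    return {
        "features": 0.1 * (0.75 ** decay_steps),
        "classifier": 0.75 ** decay_steps,
        "final_classifiers": 0.88 ** decay_steps,
    }
```

A handler could apply these to the matching entries of `optimizer.param_groups` at the start of each epoch.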
@sean-adler take a look at this article (maybe you have seen it) and
> Remark 2: Apply momentum correction after changing learning rate if using (10)

It seems like one more ➕ for momentum variation during training
> @sean-adler take a look at this article (maybe you have seen it)
I hadn't seen it! Very cool, thanks for posting it.
> My understanding is that the lr_scheduler module doesn't handle cycles or momentum. Anyone can implement cycles by subclassing _LRScheduler, but it'd require a fair amount of what's in #171 or something like it. And to my knowledge, scheduling momentum is not possible with _LRScheduler.
@sean-adler sorry for the delay on this. Do we think this might be a better change for the LRScheduler in Pytorch itself?
> @sean-adler sorry for the delay on this. Do we think this might be a better change for the LRScheduler in Pytorch itself?
No problem! I've been super busy as well for the past couple of weeks. I'm not opposed to opening a PR there; it's up to you whether it would be a better match in that repo or in ignite. My perspective is that it's a lot easier to set up these schedules using the event/callback API in ignite, but maybe there is value in moving it to core.
@sean-adler I would be interested as well in an implementation of these schedulers: HillLearningRate and CompositeLearningRate
@alykhantejani what do you think about introducing a contrib or extensions module with stuff like schedulers, hyperopt, maybe training config files, stuff that is not essential but could be helpful?
I think contrib makes sense; it would be nice to have community additions (maintained by the community). But I think we should strive to keep the core ignite lib lean.
Interesting link on how fast.ai schedules LR and optimizers and its params : https://github.com/fastai/fastai/blob/master/courses/dl2/training_phase.ipynb
Parameter Scheduling and providing a List of Training Phases the way Fast.AI does would go a long way in making the trainer generally applicable to most problems. Would be great to have it in the core. But contrib would also work.
@vfdev-5 @sampathweb for sure, fastai's training phase API is a lot more flexible than both PyTorch core's LR scheduling and this PR's approach.
One (maybe subtle) difference it has compared to the current engine API in ignite is that you end up specifying the number of epochs in the schedule itself, instead of passing that value to Engine.run(). Probably not super significant, but it's an interesting example of how explicit those types of schedules are.
(FWIW, I think the engine API in ignite is much more elegant than the callback API in fastai, but their approach to parameter scheduling is better than anything else I've seen.)
@sean-adler @vfdev-5 concatenating several schedulers together, as done by fastai, can also be implemented using the framework suggested in #171:
```python
class ConcatScheduler(ParamScheduler):
    """Concat a list of Schedulers.

    Args:
        ...
        schedulers_list (list): List of (scheduler_cls, scheduler_kwds, duration).
        ...

    Note:
        The concatenated schedulers inherit the optimizer and param_name
        from the concatenating scheduler.
    """

    def __init__(self,
                 optimizer,
                 param_name,
                 schedulers_list,
                 save_history=False):
        super(ConcatScheduler, self).__init__(optimizer, param_name,
                                              save_history=save_history)
        self._schedulers_list = schedulers_list
        self._schedulers_index = 0
        self._next_scheduler_switch = 0

    def _next_scheduler(self):
        scheduler_cls, scheduler_kwds, self._next_scheduler_switch = \
            self._schedulers_list[self._schedulers_index]
        kwds = scheduler_kwds.copy()
        kwds.update(
            dict(
                optimizer=self.optimizer,
                param_name=self.param_name,
                save_history=self.save_history
            )
        )
        self._scheduler = scheduler_cls(**kwds)
        self._schedulers_index = (self._schedulers_index + 1) % len(self._schedulers_list)

    def __call__(self, engine):
        if self._next_scheduler_switch is not None:
            self._next_scheduler_switch -= 1
            if self._next_scheduler_switch < 0:
                self._next_scheduler()
        return self._scheduler(engine)
```
Scheduling a cyclic phase followed by a linear phase can be done like:

```python
scheduler = ConcatScheduler(
    optimizer=optimizer,
    param_name='lr',
    schedulers_list=[
        (
            CosineScheduler,
            dict(
                start_value=lr_start,
                end_value=lr_end,
                cycle_size=lr_cycle
            ),
            phase1_duration
        ),
        (
            LinearScheduler,
            dict(
                start_value=lr_start,
                end_value=lr_start * 0.01,
                cycle_size=lr_cycle * 2
            ),
            None
        )
    ],
    save_history=True
)
```
I am using this parameter scheduler and can give a hand in implementing it.
@amitibo thanks for the proposal! For the moment we have not yet integrated ParamScheduler into the contrib module. IMO it would be great to have ConcatScheduler in the contrib module.
@sean-adler could you please send a PR with ParamScheduler in contrib?
Please tell me if you are busy at the moment and I can copy it over myself.
@vfdev-5 and @sean-adler, if nobody is currently working on this issue, I can prepare a PR.
@amitibo are you talking about ConcatScheduler? If yes, this would be good. Give me a bit of time to merge ParamScheduler, and your PR will be more than welcome. Before that, you can implement it starting from Sean Adler's branch.
@amitibo we merged ParamScheduler into master, so you can start a PR from master.
Providing an abstraction to adjust optimizer parameters during training seems like it might be useful - techniques like SGDR seem applicable to many types of models.
The torch.optim.lr_scheduler module in PyTorch core implements some useful schedulers, but (a) can only adjust the LR, and (b) only adjusts it per-epoch.
On the other hand, the Engine event API seems like a really natural way to adjust parameter values, since handlers that manipulate them could be added for either ITERATION_* or EPOCH_* events, and modifying multiple parameters at once (e.g. LR and momentum) would be straightforward too.
I wrote a short IPython notebook as a prototype of one way it could look to do this with the event API in a general way (plots are at the very bottom). I left most of the actual scheduler code in separate files for now to try and see if the idea is even worth it first. Would this be useful?
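The event idea can be illustrated with a toy dispatcher (this is a stand-in for demonstration, not ignite's actual Engine):

```python
class ToyEngine:
    """Minimal event dispatcher: register handlers per event name and fire them."""

    def __init__(self):
        self._handlers = {}
        self.lr = None  # stand-in for a parameter a handler might adjust

    def add_event_handler(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def fire(self, event):
        for handler in self._handlers.get(event, []):
            handler(self)

engine = ToyEngine()
# A handler attached to a per-iteration event can adjust any parameter:
engine.add_event_handler("ITERATION_STARTED", lambda e: setattr(e, "lr", 0.01))
engine.fire("ITERATION_STARTED")
```

The same handler could just as easily be registered on an epoch-level event, which is the flexibility the issue is after.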