Closed sean-adler closed 5 years ago
@sean-adler there was a discussion on this in the PyTorch issues, and there is a CosineAnnealing scheduler built into PyTorch. However, maybe it could make sense to modify LR and momentum...
Speaking about techniques like SGDR, maybe it could be generalized to something like LR scheduler with restarts, like I implemented here.
HTH
> Speaking about techniques like SGDR, maybe it could be generalized to something like LR scheduler with restarts, like I implemented here.
@vfdev-5 Nice! Yeah, that's pretty much exactly what I had in mind, except with batch-level adjustment too.
@sean-adler could you please explain why it is necessary to control adjustment at the batch level?
> @sean-adler could you please explain why it is necessary to control adjustment at the batch level?
If I'm understanding the SGDR paper correctly, that's how they implement their LR schedule - the warm restarts occur after some number of epochs, but the learning rate is updated after every batch.
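As a rough sketch of that idea (the function and parameter names here are mine, not from the paper or any library), the per-batch cosine schedule with restarts looks like:

```python
import math

def sgdr_lr(batch_idx, batches_per_epoch, lr_min=0.0, lr_max=0.1,
            restart_period_epochs=10):
    """Cosine-annealed LR with warm restarts, evaluated once per batch."""
    # Epochs elapsed since the last restart (fractional, so it moves every batch).
    t_cur = (batch_idx % (restart_period_epochs * batches_per_epoch)) / batches_per_epoch
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * t_cur / restart_period_epochs))
```

At batch 0 (and right after every restart) this returns `lr_max`; halfway through a restart period it returns the midpoint of `lr_min` and `lr_max`.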
Here's one section where they mention it in the paper:
@sean-adler thanks! Personally, I think a more useful lr schedule could be something like here:
"warmUp": "the learning rate starts at 0.1, then it increases linearly per batch for the first 4 epochs",
"schedule": [
{"learning_rate": 0.1, "epochs": 1},
{"learning_rate": 0.2, "epochs": 1},
{"learning_rate": 0.3, "epochs": 1},
{"learning_rate": 0.4, "epochs": 17},
{"learning_rate": 0.04, "epochs": 14},
{"learning_rate": 0.004, "epochs": 8},
{"learning_rate": 0.0004, "epochs": 3}
Having the possibility to set an LR value and keep it for a period...
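As a sketch (the helper name is mine, not an existing API), a schedule in that shape can be expanded into a per-epoch lookup, which a handler could then apply at the start of each epoch:

```python
def expand_schedule(schedule):
    """Expand [{'learning_rate': lr, 'epochs': n}, ...] into one LR per epoch."""
    lrs = []
    for phase in schedule:
        lrs.extend([phase["learning_rate"]] * phase["epochs"])
    return lrs

schedule = [
    {"learning_rate": 0.1, "epochs": 1},
    {"learning_rate": 0.2, "epochs": 1},
    {"learning_rate": 0.4, "epochs": 17},
]
per_epoch_lr = expand_schedule(schedule)  # 19 entries: 0.1, 0.2, then 0.4 x 17
```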
I think a schedule like that could be implemented using the same base handler I was testing out, and the SGDR cosine annealing schedule (and other schedules like CLR) could be implemented using the same thing too. It seems useful to have some base functionality like this, but let me know what you think.
@vfdev-5 I opened a work-in-progress PR to try and make what I'm talking about more concrete (in terms of base handlers and other stuff like that). Hopefully it helps the discussion here!
@sean-adler yes, just saw it. Code looks clean and shows the power of event handlers :) but I think it goes in parallel with torch's built-in _LRScheduler, and I wonder what more we could gain with this proposed approach. Today I use the PyTorch built-in schedulers and a handler that just calls step() on them, and it is OK...
Could you, please, also compare your cosine annealing schedule with the PyTorch built-in one?
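For reference, the "handler that just calls step()" pattern is tiny; a minimal sketch (no ignite imports here, the engine argument is just whatever the event passes in):

```python
class SchedulerStepper:
    """Handler meant for an end-of-epoch event: forwards it to scheduler.step()."""

    def __init__(self, scheduler):
        self.scheduler = scheduler

    def __call__(self, engine):
        # The engine argument is ignored; we only need the event timing.
        self.scheduler.step()
```

Attaching it would look like `engine.add_event_handler(Events.EPOCH_COMPLETED, SchedulerStepper(scheduler))`.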
> Today I use the PyTorch built-in schedulers and a handler that just calls step() on them, and it is OK...
True, that definitely works, but it doesn't let you adjust other parameters like momentum. That's pretty easy to do with these handlers though, since you can pass a parameter name, e.g. 'momentum', as an argument.
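A sketch of what passing the parameter name buys you (using a plain list of param-group dicts as a stand-in for `optimizer.param_groups`):

```python
def set_param(param_groups, param_name, value):
    """What a per-event handler would do: write the new value into every group."""
    for group in param_groups:
        group[param_name] = value

groups = [{"lr": 0.1, "momentum": 0.9}]
set_param(groups, "momentum", 0.95)  # the same call shape works for 'lr', etc.
```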
> Could you, please, also compare your cosine annealing schedule with the PyTorch built-in one?
Sure! The version in that PR implements warm restarts, and the pytorch built-in one does not. Other than that, the schedule is the same.
Okay, I see, thanks. Just for my curiosity, how is it useful to change momentum and other parameters during training?
What do you call warm restarts?
> Just for my curiosity, how is it useful to change momentum and other parameters during training?
I've only seen it done here. I have no idea if it will become more popular over time, but the results are interesting 😄
> What do you call warm restarts?
I think that's just how the SGDR authors describe the cyclicality of their LR schedule. In their paper, a "warm restart" is when the LR is reset to its initial value after decreasing for some number of epochs.
Just found the article you cited and wanted to read it :)
> In their paper, a "warm restart"
Before asking you I tried ctrl+f on the word "warm" in the article and it gave 0 occurrences :)
> Before asking you I tried ctrl+f on the word "warm" in the article and it gave 0 occurrences :)
Oh, sorry for the miscommunication, I meant that warm restarts come up in the SGDR paper.
So as the majority of this functionality can be handled by the pytorch lr_scheduler, I don't think there's a need for a specific handler in ignite. Unless I'm missing something?
In general, we want to keep the universal handlers in ignite and make anything niche easy to implement for the user
> So as the majority of this functionality can be handled by the pytorch lr_scheduler, I don't think there's a need for a specific handler in ignite. Unless I'm missing something?
My understanding is that the lr_scheduler module doesn't handle cycles or momentum. Anyone can implement cycles by subclassing _LRScheduler, but it'd require a fair amount of what's in #171 or something like it. And to my knowledge, scheduling momentum is not possible with _LRScheduler.
If this stuff isn't a common enough need, happy to close this out!
It was suggested here to introduce an ignite.contrib module, so maybe we could put this code in ignite.contrib.handlers.
@sean-adler is it difficult to add a supplementary option to existing classes to achieve "warm-up" effect (for several first epochs optimize only some top-layers and then finetune all the others) like this:
```python
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

# MODEL (a network with .features / .classifier / .final_classifiers) is assumed
# to be defined elsewhere.
OPTIM = Adam(
    params=[
        {"params": MODEL.features.parameters(), "lr": 0.001},
        {"params": MODEL.classifier.parameters(), "lr": 0.001},
        {"params": MODEL.final_classifiers.parameters(), "lr": 0.001},
    ],
)

def lambda_lr_features(epoch):
    if epoch < 5:
        return 0.001
    else:
        return 0.1 * (0.75 ** (epoch - 3))

def lambda_lr_classifier(epoch):
    if epoch < 5:
        return 0.01
    else:
        return 0.75 ** (epoch - 3)

def lambda_lr_final_classifiers(epoch):
    if epoch < 5:
        return 1.0
    else:
        return 0.88 ** (epoch - 3)

LR_SCHEDULERS = [
    LambdaLR(OPTIM, lr_lambda=[lambda_lr_features, lambda_lr_classifier,
                               lambda_lr_final_classifiers])
]
```
> @sean-adler is it difficult to add a supplementary option to existing classes to achieve "warm-up" effect (for several first epochs optimize only some top-layers and then finetune all the others) like this?
@vfdev-5 specifying phases like that is a good idea, I'll try adding it!
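One way that option could look, as a sketch (the group names mirror the snippet above; the function is hypothetical, not existing ignite API): return per-group LR multipliers that freeze the lower groups during warm-up and then switch to decay.

```python
def warmup_multipliers(epoch, warmup_epochs=5):
    """Per-group LR multipliers: lower layers frozen during warm-up, then decayed."""
    if epoch < warmup_epochs:
        # Only the final classifiers train during warm-up.
        return {"features": 0.0, "classifier": 0.0, "final_classifiers": 1.0}
    decay_steps = epoch - warmup_epochs
    return {
        "features": 0.1 * (0.75 ** decay_steps),
        "classifier": 0.75 ** decay_steps,
        "final_classifiers": 0.88 ** decay_steps,
    }
```

A handler could apply these to the matching entries of `optimizer.param_groups` at the start of each epoch.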
@sean-adler take a look at this article (maybe you have seen it) and
> Remark 2: Apply momentum correction after changing learning rate if using (10)

It seems like one more ➕ for momentum variation during training
> @sean-adler take a look at this article (maybe you have seen it)
I hadn't seen it! Very cool, thanks for posting it.
> My understanding is that the lr_scheduler module doesn't handle cycles or momentum. Anyone can implement cycles by subclassing _LRScheduler, but it'd require a fair amount of what's in #171 or something like it. And to my knowledge, scheduling momentum is not possible with _LRScheduler.
@sean-adler sorry for the delay on this. Do we think this might be a better change for the LRScheduler in Pytorch itself?
> @sean-adler sorry for the delay on this. Do we think this might be a better change for the LRScheduler in Pytorch itself?
No problem! I've been super busy as well for the past couple of weeks. I'm not opposed to opening a PR there; it's up to you whether it would be a better match in that repo or in ignite. My perspective is that it's a lot easier to set up these schedules using the event/callback API in ignite, but maybe there is value in moving it to core.
@sean-adler I would be interested as well in an implementation of these schedulers: HillLearningRate and CompositeLearningRate
@alykhantejani what do you think about introducing a contrib or extensions module with stuff like schedulers, hyperopt, maybe training config files, stuff that is not essential but could be helpful?
I think contrib makes sense; it would be nice to have community additions (maintained by the community). But I think we should strive to keep the core ignite lib lean.
Interesting link on how fast.ai schedules LR and optimizers and its params : https://github.com/fastai/fastai/blob/master/courses/dl2/training_phase.ipynb
Parameter Scheduling and providing a List of Training Phases the way Fast.AI does would go a long way in making the trainer generally applicable to most problems. Would be great to have it in the core. But contrib would also work.
@vfdev-5 @sampathweb for sure, fastai's training phase API is a lot more flexible than both PyTorch core's LR scheduling and this PR's approach.
One (maybe subtle) difference it has compared to the current engine API in ignite is that you end up specifying the number of epochs in the schedule itself, instead of passing that value to Engine.run(). Probably not super significant, but it's an interesting example of how explicit those types of schedules are.
(FWIW, I think the engine API in ignite is much more elegant than the callback API in fastai, but their approach to parameter scheduling is better than anything else I've seen.)
@sean-adler @vfdev-5 concatenating several schedulers together, as done by fastai, can also be implemented using the framework suggested in #171:
```python
class ConcatScheduler(ParamScheduler):
    """Concat a list of Schedulers.

    Args:
        ...
        schedulers_list (list): List of (scheduler_cls, scheduler_kwds, duration).
        ...

    Note:
        The concatenated schedulers inherit the optimizer and param_name
        from the concatenating scheduler.
    """

    def __init__(self,
                 optimizer,
                 param_name,
                 schedulers_list,
                 save_history=False):
        super(ConcatScheduler, self).__init__(optimizer, param_name,
                                              save_history=save_history)
        self._schedulers_list = schedulers_list
        self._schedulers_index = 0
        self._next_scheduler_switch = 0

    def _next_scheduler(self):
        scheduler_cls, scheduler_kwds, self._next_scheduler_switch = \
            self._schedulers_list[self._schedulers_index]
        kwds = scheduler_kwds.copy()
        kwds.update(
            dict(
                optimizer=self.optimizer,
                param_name=self.param_name,
                save_history=self.save_history
            )
        )
        self._scheduler = scheduler_cls(**kwds)
        self._schedulers_index = (self._schedulers_index + 1) % len(self._schedulers_list)

    def __call__(self, engine):
        if self._next_scheduler_switch is not None:
            self._next_scheduler_switch -= 1
            if self._next_scheduler_switch < 0:
                self._next_scheduler()
        return self._scheduler(engine)
```
Scheduling a cyclic phase followed by a linear phase can be done like:

```python
scheduler = ConcatScheduler(
    optimizer=optimizer,
    param_name='lr',
    schedulers_list=[
        (
            CosineScheduler,
            dict(
                start_value=lr_start,
                end_value=lr_end,
                cycle_size=lr_cycle
            ),
            phase1_duration
        ),
        (
            LinearScheduler,
            dict(
                start_value=lr_start,
                end_value=lr_start * 0.01,
                cycle_size=lr_cycle * 2
            ),
            None
        )
    ],
    save_history=True
)
```
I am using this parameter scheduler and can give a hand in implementing it.
@amitibo thanks for the proposal! For the moment we have not yet integrated ParamScheduler into the contrib module. IMO it would be great to have ConcatScheduler in the contrib module.
@sean-adler could you please send a PR with ParamScheduler in contrib?
Please tell me if you are busy at the moment and I can copy it over myself.
@vfdev-5 and @sean-adler, if nobody is currently working on this issue, I can prepare a PR.
@amitibo are you talking about ConcatScheduler? If yes, this would be good. Give me a bit of time to merge ParamScheduler, and your PR will be more than welcome. Before that, you can implement it starting from Sean Adler's branch.
@amitibo we merged ParamScheduler into master, so you can start a PR from master.
Providing an abstraction to adjust optimizer parameters during training seems like it might be useful - techniques like SGDR seem applicable to many types of models.
The torch.optim.lr_scheduler module in PyTorch core implements some useful schedulers, but (a) can only adjust the LR, and (b) only adjusts it per-epoch.
On the other hand, the Engine event API seems like a really natural way to adjust parameter values, since handlers that manipulate them could be added for either ITERATION_* or EPOCH_* events, and modifying multiple parameters at once (e.g. LR and momentum) would be straightforward too.
I wrote a short IPython notebook as a prototype of one way it could look to do this with the event API in a general way (plots are at the very bottom). I left most of the actual scheduler code in separate files for now to try and see if the idea is even worth it first. Would this be useful?
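The event idea can be illustrated with a toy dispatcher (this is a stand-in for demonstration, not ignite's actual Engine):

```python
class ToyEngine:
    """Minimal event dispatcher: register handlers per event name and fire them."""

    def __init__(self):
        self._handlers = {}
        self.lr = None  # stand-in for a parameter a handler might adjust

    def add_event_handler(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def fire(self, event):
        for handler in self._handlers.get(event, []):
            handler(self)

engine = ToyEngine()
# A handler attached to a per-iteration event can adjust any parameter:
engine.add_event_handler("ITERATION_STARTED", lambda e: setattr(e, "lr", 0.01))
engine.fire("ITERATION_STARTED")
```

The same handler could just as easily be registered on an epoch-level event, which is the flexibility the issue is after.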