vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by Pytorch Lightning

About the setup of LinearWarmupCosineAnnealingLR: there seems to be no difference between the step-level and epoch-level settings #271

Closed HuangChiEn closed 2 years ago

HuangChiEn commented 2 years ago

First, thanks for this amazing GitHub repo 👍 However, I want to ask some questions related to the closed issue #264. Suppose we enable the warmup_cosine scheduler; the default of 10 warmup epochs is defined in base.py.

The following code snippet sets up LinearWarmupCosineAnnealingLR:

    # code snippet in base.py
    ...
    if self.scheduler == "warmup_cosine":
        scheduler = {
            "scheduler": LinearWarmupCosineAnnealingLR(
                optimizer,
                # suppose we use 10 warmup epochs, as in SimCLR
                warmup_epochs=10 * self.num_training_steps,
                max_epochs=self.max_epochs * self.num_training_steps,
                warmup_start_lr=self.warmup_start_lr if self.warmup_epochs > 0 else self.lr,
                eta_min=self.min_lr,
            ),
            "interval": self.scheduler_interval,  # step-level, instead of epoch-level
            "frequency": 1,
        }
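
As far as I understand, the "interval" and "frequency" keys follow PyTorch Lightning's lr scheduler config: with "interval": "step" Lightning calls scheduler.step() after every optimizer step, while with "interval": "epoch" it does so only once per epoch. A minimal toy example of that mechanism (my own sketch with a hypothetical ToyModule and a stand-in scheduler, not solo-learn code):

    # Toy sketch of Lightning's lr scheduler config dict (not solo-learn code)
    import torch
    import pytorch_lightning as pl

    class ToyModule(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(2, 1)

        def training_step(self, batch, batch_idx):
            return self.layer(batch).mean()

        def configure_optimizers(self):
            optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
            # any scheduler works here; CosineAnnealingLR is just a stand-in
            scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
            return {
                "optimizer": optimizer,
                "lr_scheduler": {"scheduler": scheduler, "interval": "step", "frequency": 1},
            }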

In the aforementioned code snippet, why don't we just set warmup_epochs=10? Instead, we make the following modifications:

  1. change the interval to step-level
  2. calculate the number of steps per epoch

    suppose 40 steps per epoch

  3. apply the step-level setup so that it matches the epoch-level setting exactly (i.e. 10 epochs), as in the sketch below

    warmup_step=10*40 (since the interval is step), which is equivalent to warmup_epoch=10 (when the interval is epoch)
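
To make point 3 concrete, here is a small self-contained sketch of what I mean (assuming the pl_bolts LinearWarmupCosineAnnealingLR, a dummy parameter, and a made-up 40 steps per epoch): the scheduler only counts .step() calls, so stepping it once per batch with warmup_epochs=10 * steps_per_epoch reproduces a 10-epoch warmup.

    import torch
    # assumed import; if I'm not mistaken, this scheduler comes from pl_bolts
    from pl_bolts.optimizers.lr_scheduler import LinearWarmupCosineAnnealingLR

    steps_per_epoch = 40                          # hypothetical, e.g. len(train_dataloader)
    params = [torch.nn.Parameter(torch.zeros(1))]
    optimizer = torch.optim.SGD(params, lr=1.0)   # base lr = 1.0 just for illustration

    scheduler = LinearWarmupCosineAnnealingLR(
        optimizer,
        warmup_epochs=10 * steps_per_epoch,       # 400 scheduler steps == 10 real epochs
        max_epochs=100 * steps_per_epoch,         # assuming 100 training epochs
        warmup_start_lr=0.0,
        eta_min=0.0,
    )

    for _ in range(10 * steps_per_epoch):         # one scheduler.step() per batch
        optimizer.step()
        scheduler.step()

    # after 400 scheduler steps the warmup has just finished,
    # so the lr should be back at (or very close to) the base lr of 1.0
    print(optimizer.param_groups[0]["lr"])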

I can't figure out why we do the above setup, when the default behavior in the end still performs exactly 10 "epochs" of warmup anyway.

My guess: in some situations, maybe we want to set up a "397-step" warmup, which is close to 10 epochs but not exactly equivalent to a whole number of epochs, so we need to manually specify 397 instead?

Thank you for taking a look; any suggestions will be appreciated ~

DonkeyShot21 commented 2 years ago

The result is similar but not exactly the same. With the step setting the lr is updated at every step, while with the epoch setting it is increased only between epochs and then kept constant for the whole epoch. We found that the step setting slightly improves performance and also makes some methods more stable.
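
A quick standalone simulation of the two behaviors (toy numbers only, not solo-learn code; the linear warmup is re-implemented by hand just for illustration):

    # Toy comparison of step-level vs. epoch-level warmup updates.
    # All numbers (40 steps/epoch, 10 warmup epochs, base lr 1.0) are made up.
    steps_per_epoch = 40
    warmup_epochs = 10
    base_lr, warmup_start_lr = 1.0, 0.0

    def warmup_lr(progress):
        # linear interpolation from warmup_start_lr to base_lr, progress in [0, 1]
        return warmup_start_lr + (base_lr - warmup_start_lr) * min(progress, 1.0)

    for epoch in range(2):
        for step in range(steps_per_epoch):
            global_step = epoch * steps_per_epoch + step
            # interval="step": the scheduler advances every optimizer step -> smooth ramp
            lr_step = warmup_lr(global_step / (warmup_epochs * steps_per_epoch))
            # interval="epoch": the scheduler only advances between epochs -> staircase
            lr_epoch = warmup_lr(epoch / warmup_epochs)
            if step in (0, steps_per_epoch - 1):
                print(f"epoch {epoch} step {step:2d}: "
                      f"step-level lr={lr_step:.4f}, epoch-level lr={lr_epoch:.4f}")

Within each epoch the epoch-level lr stays flat while the step-level lr keeps ramping, which is the difference described above.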

HuangChiEn commented 2 years ago

Oh, thanks for your explanation. The reason is a bit different from what I supposed; it's great to understand the implementation details behind solo-learn 👍