vturrisi / solo-learn

solo-learn: a library of self-supervised methods for visual representation learning powered by PyTorch Lightning
MIT License

Why is step-level information fed to LinearWarmupCosineAnnealingLR instead of epoch-level information? #264

Closed HuangChiEn closed 2 years ago

HuangChiEn commented 2 years ago

First, thanks for this amazing GitHub repo 👍

I have a question about the implementation of the learning rate scheduler. I dug into the documentation of the LinearWarmupCosineAnnealingLR scheduler in the PyTorch Lightning Bolts package, and most of its parameters appear to expect epoch-level values, such as warmup_epochs and max_epochs. I'm wondering why the code snippet in base.py feeds step-level values to the scheduler instead (see the following code snippet).

# code snippet in base.py
... 
if self.scheduler == "warmup_cosine":
            scheduler = {
                "scheduler": LinearWarmupCosineAnnealingLR(
                    optimizer,
                    # suppose we have 10 warmup epochs as in SimCLR..
                    # why multiply by the number of steps per epoch,
                    # e.g. 10 * 317 steps = 3170 warmup steps in total?
                    # can't we just feed self.warmup_epochs, i.e. warmup_epochs=10?
                    warmup_epochs=self.warmup_epochs * self.num_training_steps,
                    max_epochs=self.max_epochs * self.num_training_steps,
                    warmup_start_lr=self.warmup_start_lr if self.warmup_epochs > 0 else self.lr,
                    eta_min=self.min_lr,
                ),
                "interval": self.scheduler_interval,
                "frequency": 1,
            }

Any suggestions will be appreciated!

vturrisi commented 2 years ago

Hi! You can control whether the scheduler steps per batch or per epoch with the interval parameter. We default to the former so that, during linear warmup, you don't waste a full epoch training with a very small learning rate. If you step per batch, you need to adjust the other parameters accordingly (which is why the epoch counts are multiplied by the number of steps per epoch).
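
For concreteness, here is a minimal sketch of the two equivalent configurations; make_scheduler and steps_per_epoch are hypothetical names used only for illustration, not solo-learn's actual API, though base.py builds the same kind of dict inside configure_optimizers.

# A hypothetical helper sketching both configurations.
from pl_bolts.optimizers.lr_scheduler import LinearWarmupCosineAnnealingLR

def make_scheduler(optimizer, warmup_epochs, max_epochs, steps_per_epoch,
                   warmup_start_lr, min_lr, per_step=True):
    if per_step:
        # "interval": "step" -> Lightning calls scheduler.step() every batch,
        # so the "epoch" arguments must be expressed in steps.
        scheduler = LinearWarmupCosineAnnealingLR(
            optimizer,
            warmup_epochs=warmup_epochs * steps_per_epoch,
            max_epochs=max_epochs * steps_per_epoch,
            warmup_start_lr=warmup_start_lr,
            eta_min=min_lr,
        )
        return {"scheduler": scheduler, "interval": "step", "frequency": 1}
    # "interval": "epoch" -> scheduler.step() is called once per epoch,
    # so the arguments stay in plain epochs, but the learning rate only
    # changes between epochs (the whole first epoch runs at warmup_start_lr).
    scheduler = LinearWarmupCosineAnnealingLR(
        optimizer,
        warmup_epochs=warmup_epochs,
        max_epochs=max_epochs,
        warmup_start_lr=warmup_start_lr,
        eta_min=min_lr,
    )
    return {"scheduler": scheduler, "interval": "epoch", "frequency": 1}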

HuangChiEn commented 2 years ago

> Hi! You can control whether the scheduler steps per batch or per epoch with the interval parameter. We default to the former so that, during linear warmup, you don't waste a full epoch training with a very small learning rate. If you step per batch, you need to adjust the other parameters accordingly (which is why the epoch counts are multiplied by the number of steps per epoch).

Thanks for your reply. So, does the code snippet perform step-level warmup instead of epoch-level warmup in the first case (to save training time)?

vturrisi commented 2 years ago

Yes, but this is not related to training time.
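
A rough back-of-the-envelope illustration of the difference (the 10 warmup epochs and 317 steps per epoch are the numbers from the comment above; base_lr = 1.0 and warmup_start_lr = 0.0 are assumed, and the ramp below is a simplified linear warmup, not pl_bolts' exact formula):

# With per-epoch stepping the lr stays at warmup_start_lr for the whole
# first epoch; with per-step stepping it ramps up after every batch.
warmup_epochs, steps_per_epoch = 10, 317
base_lr, warmup_start_lr = 1.0, 0.0

def lr_epoch_level(epoch):
    # constant within an epoch
    return warmup_start_lr + (base_lr - warmup_start_lr) * epoch / warmup_epochs

def lr_step_level(step):
    # updated after every batch
    return warmup_start_lr + (base_lr - warmup_start_lr) * step / (warmup_epochs * steps_per_epoch)

print(lr_epoch_level(0))                   # 0.0 for all 317 steps of epoch 0
print(lr_step_level(steps_per_epoch - 1))  # ~0.0997 by the last step of epoch 0

Either way the warmup still spans the same 10 epochs, so it isn't faster; stepping per batch just avoids training an entire epoch at the starting learning rate.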