Closed by pmeier 3 years ago
Consider the following snippet:
from torch import optim, nn

import pystiche_papers.ulyanov_et_al_2016 as paper

model = nn.Conv2d(3, 3, 1)
optimizer = optim.Adam(model.parameters(), lr=1e0)

# decay by gamma=0.1, but only after the first `delay` epochs
delay = 2
lr_scheduler = paper.DelayedExponentialLR(optimizer, gamma=0.1, delay=delay)

for epoch in range(1, 6):
    print(f"Learning rate for epoch {epoch}: {optimizer.param_groups[0]['lr']:.0e}")
    lr_scheduler.step()

This prints:
Learning rate for epoch 1: 1e+00
Learning rate for epoch 2: 1e+00
Learning rate for epoch 3: 1e-01
Learning rate for epoch 4: 1e-02
Learning rate for epoch 5: 1e-03
I expected the first two epochs to see no decay at all. It seems we start the decay one epoch too early.

The learning rate is the same in the first two epochs and is only reduced in the third. This is because lr_scheduler.step() is only called at the end of an epoch, so the learning rate it sets only takes effect in the next epoch. However, this behavior should be described in more detail in the docs of DelayedExponentialLR so that there are no misunderstandings. What do you think?
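For illustration, here is a minimal sketch that reproduces the output above using PyTorch's built-in LambdaLR. It assumes DelayedExponentialLR amounts to an exponential decay that is held off for `delay` epochs; the lr_lambda below is my reconstruction, not necessarily the actual implementation in pystiche_papers:

from torch import optim, nn

model = nn.Conv2d(3, 3, 1)
optimizer = optim.Adam(model.parameters(), lr=1e0)

delay, gamma = 2, 0.1

# Hypothetical reconstruction: keep the base learning rate for the first
# `delay` epochs, then multiply by `gamma` once per step. `step_count` is the
# scheduler's internal 0-based epoch counter, incremented by each step() call.
lr_scheduler = optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step_count: gamma ** max(0, step_count - (delay - 1))
)

for epoch in range(1, 6):
    print(f"Learning rate for epoch {epoch}: {optimizer.param_groups[0]['lr']:.0e}")
    lr_scheduler.step()

Running this prints the same five lines as above: the step() call at the end of epoch 2 is the first one that lowers the rate, so the reduced learning rate first takes effect in epoch 3.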