pystiche / papers

Reference implementation and replication of prominent NST papers

DelayedExponentialLR from ulyanov_et_al_2016 starts too early #220

Closed: pmeier closed this issue 3 years ago

pmeier commented 3 years ago

Consider the following snippet:

from torch import optim, nn
import pystiche_papers.ulyanov_et_al_2016 as paper

model = nn.Conv2d(3, 3, 1)
optimizer = optim.Adam(model.parameters(), lr=1e0)

delay = 2
lr_scheduler = paper.DelayedExponentialLR(optimizer, gamma=0.1, delay=delay)

for epoch in range(1, 6):
    lr_scheduler.step()
    print(f"{epoch}: {optimizer.param_groups[0]['lr']:.0e}")

1: 1e+00
2: 1e-01
3: 1e-02
4: 1e-03
5: 1e-04

I expected the first two epochs to see no decay at all. It seems we start the decay one epoch too early.
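
For reference, this is the decay I had in mind, written out as a hypothetical expected_lr helper (just a sketch of the expected behavior, not of how DelayedExponentialLR is implemented internally; base_lr, gamma and delay match the snippet above):

# Expected schedule: keep base_lr for the first `delay` epochs,
# then decay by `gamma` once per epoch.
def expected_lr(epoch, base_lr=1e0, gamma=0.1, delay=2):
    return base_lr * gamma ** max(0, epoch - delay)

for epoch in range(1, 6):
    print(f"{epoch}: {expected_lr(epoch):.0e}")

1: 1e+00
2: 1e+00
3: 1e-01
4: 1e-02
5: 1e-03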

jbueltemeier commented 3 years ago

The learning rate is the same in the first two epochs and is only reduced in the third. This is because lr_scheduler.step() is only called after an epoch has finished, so the learning rate you print in your example is the one that will be used in the next epoch.
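
This matches the general PyTorch convention for schedulers, e.g. with a plain ExponentialLR from torch.optim.lr_scheduler (a minimal sketch with the same gamma, but without a delay):

from torch import nn, optim
from torch.optim.lr_scheduler import ExponentialLR

model = nn.Conv2d(3, 3, 1)
optimizer = optim.Adam(model.parameters(), lr=1e0)
lr_scheduler = ExponentialLR(optimizer, gamma=0.1)

for epoch in range(1, 4):
    # the learning rate printed here is the one used during this epoch
    print(f"{epoch}: {optimizer.param_groups[0]['lr']:.0e}")
    # stepping after the epoch only affects the following epochs
    lr_scheduler.step()

1: 1e+00
2: 1e-01
3: 1e-02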

However, this should be described in more detail in the docs of DelayedExponentialLR so that there are no misunderstandings. What do you think?

pmeier commented 3 years ago

from torch import optim, nn
import pystiche_papers.ulyanov_et_al_2016 as paper

model = nn.Conv2d(3, 3, 1)
optimizer = optim.Adam(model.parameters(), lr=1e0)

delay = 2
lr_scheduler = paper.DelayedExponentialLR(optimizer, gamma=0.1, delay=delay)

for epoch in range(1, 6):
    # read the learning rate *before* stepping the scheduler: this is the rate
    # that is actually used during this epoch
    print(f"Learning rate for epoch {epoch}: {optimizer.param_groups[0]['lr']:.0e}")
    lr_scheduler.step()

Learning rate for epoch 1: 1e+00
Learning rate for epoch 2: 1e+00
Learning rate for epoch 3: 1e-01
Learning rate for epoch 4: 1e-02
Learning rate for epoch 5: 1e-03