Open cyh767 opened 4 years ago
When running your code with gamma=10, we see the loss eventually becoming nan, so the scheduler stepping is indeed affecting the learning. Have you observed convergence with different models?
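As an aside, the effect described can be reproduced with a throwaway parameter; this is a minimal sketch with assumed settings (initial lr=10, step_size=5; only gamma differs from the report), not the reporter's script. StepLR multiplies the learning rate by gamma every step_size steps, so gamma=10 drives the updates, and eventually the loss, to overflow:

```python
import torch

# Minimal sketch (assumed settings): with gamma=10, StepLR *multiplies* the
# learning rate by 10 every 5 scheduler steps. If the scheduler were silently
# ignored, the printed learning rate would stay at 10 forever.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=10)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=10)

for i in range(15):
    optimizer.step()   # a real run would compute a loss and call backward() first
    scheduler.step()
    print(i, optimizer.param_groups[0]["lr"])  # 10, ..., 100, ..., 1000, ...
```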
Thank you for your reply. I agree that the stepping is applied: I have observed near-convergence with more complex models, and I think the model would converge with more iterations. But, similarly, the model does not seem to learn with the changed learning rate at the designated step. It is the same as in the example above: the learning rate changes at iteration 5, but the loss at that iteration is the same as the loss with a fixed learning rate.
Moreover, I found a clearer case today. With the same code and the same steps, I set initial_learning_rate = 10 and gamma = 0.1 in both main_steplr.py and main_fixlr.py, with the other settings unchanged. I see the following behavior:
python main_steplr.py; this time the setting is initial_learning_rate = 10, step_size=5, and gamma=0.1 with a torch.optim.lr_scheduler.StepLR. The logs on my computer are as follows:

```
iteration: 0 learning rate: 10 loss = 0.23123088479042053
iteration: 1 learning rate: 10 loss = 0.11434487253427505
iteration: 2 learning rate: 10 loss = 0.11738215386867523
iteration: 3 learning rate: 10 loss = 0.11591885983943939
iteration: 4 learning rate: 10 loss = 0.11567183583974838
iteration: 5 learning rate: 1.0 loss = 0.11595271527767181
iteration: 6 learning rate: 1.0 loss = 0.1214301735162735
iteration: 7 learning rate: 1.0 loss = 0.1200874075293541
iteration: 8 learning rate: 1.0 loss = 0.1178327202796936
iteration: 9 learning rate: 1.0 loss = 0.1179950013756752
iteration: 10 learning rate: 0.1 loss = 0.11520280689001083
iteration: 11 learning rate: 0.1 loss = 0.12716393172740936
iteration: 12 learning rate: 0.1 loss = 0.12422306835651398
iteration: 13 learning rate: 0.1 loss = 0.13028551638126373
iteration: 14 learning rate: 0.1 loss = 0.12836600840091705
iteration: 15 learning rate: 0.010000000000000002 loss = 0.11887713521718979
iteration: 16 learning rate: 0.010000000000000002 loss = 0.1185910627245903
iteration: 17 learning rate: 0.010000000000000002 loss = 0.11371660977602005
iteration: 18 learning rate: 0.010000000000000002 loss = 0.10454636812210083
iteration: 19 learning rate: 0.010000000000000002 loss = 0.12165459990501404
iteration: 20 learning rate: 0.0010000000000000002 loss = 0.11898376792669296
iteration: 21 learning rate: 0.0010000000000000002 loss = 0.1097453162074089
```
python main_fixlr.py; this applies a fixed initial_learning_rate = 10. The logs are:

```
iteration: 0 learning rate: 10 loss = 0.23123088479042053
iteration: 1 learning rate: 10 loss = 0.11434487253427505
iteration: 2 learning rate: 10 loss = 0.11738215386867523
iteration: 3 learning rate: 10 loss = 0.11591885983943939
iteration: 4 learning rate: 10 loss = 0.11567183583974838
iteration: 5 learning rate: 10 loss = 0.11595271527767181
iteration: 6 learning rate: 10 loss = 0.1214301735162735
iteration: 7 learning rate: 10 loss = 0.1200874075293541
iteration: 8 learning rate: 10 loss = 0.1178327202796936
iteration: 9 learning rate: 10 loss = 0.1179950013756752
iteration: 10 learning rate: 10 loss = 0.11520280689001083
iteration: 11 learning rate: 10 loss = 0.12716393172740936
iteration: 12 learning rate: 10 loss = 0.12422306835651398
iteration: 13 learning rate: 10 loss = 0.13030129671096802
iteration: 14 learning rate: 10 loss = 0.12838180363178253
iteration: 15 learning rate: 10 loss = 0.11887713521718979
iteration: 16 learning rate: 10 loss = 0.1185910627245903
iteration: 17 learning rate: 10 loss = 0.11371660977602005
iteration: 18 learning rate: 10 loss = 0.10454636812210083
iteration: 19 learning rate: 10 loss = 0.12169446051120758
iteration: 20 learning rate: 10 loss = 0.11902362108230591
iteration: 21 learning rate: 10 loss = 0.1097453162074089
```
The losses are the same up to iteration 18 (except for tiny differences in the last digits at iterations 13 and 14), although in step 1 the learning rate is decayed every step_size=5 iterations.
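One ordering detail may explain the iteration-5 observation. The following is a minimal sketch of the presumed loop structure; the actual scripts live in test_lr.zip, and the model, data, and names here are illustrative. The loss logged at iteration N is computed on the forward pass, before optimizer.step(), so it comes from parameters last updated at iteration N-1 with the previous learning rate; the first loss that can reflect a learning rate changed at iteration 5 is therefore the one logged at iteration 6:

```python
import torch

# Minimal sketch of the presumed structure of main_steplr.py (illustrative
# model, data, and loop; only the optimizer/scheduler settings match the report).
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=10)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
criterion = torch.nn.MSELoss()
x, y = torch.randn(64, 10), torch.randn(64, 1)

for iteration in range(22):
    lr = optimizer.param_groups[0]["lr"]  # LR the *upcoming* update will use
    optimizer.zero_grad()
    loss = criterion(model(x), y)  # computed from parameters updated last iteration
    loss.backward()
    optimizer.step()               # first update that uses the current LR
    scheduler.step()               # decay takes effect for the next iteration
    print(f"iteration: {iteration} learning rate: {lr} loss = {loss.item()}")
```

With this ordering, iteration 5 prints learning rate 1.0, yet its loss was produced by an update made with learning rate 10 at iteration 4, which is consistent with the identical iteration-5 losses in the two logs above.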
🐛 Bug
With a scheduler, the learning rate changes at the designated iteration, but it seems that iteration still applies the learning rate from before the change.
To Reproduce
A minimal example is attached: test_lr.zip
Steps to reproduce the behavior:
1. python main_steplr.py; this applies a torch.optim.lr_scheduler.StepLR with step_size=5 and gamma=0.1.
2. python main_fixlr.py; this applies a fixed learning rate.

Expected behavior
The above steps compare a learning rate decreased at iteration 5 (step 1) against the same learning rate held fixed (step 2).
In step 1, according to the documentation, the learning rate should be 0.1 if iteration < 5 and 0.01 if 5 <= iteration < 10. However, although the learning rate changes at iteration 5 in step 1, the loss at iteration 5 is the same in step 1 and step 2. In other words, at iteration 5, different learning rates lead to the same loss. I think the changed learning rate might not be correctly applied at the designated iteration in step 1.
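For reference, the schedule described above can be checked in isolation; this is a minimal sketch with a throwaway parameter (assuming initial lr=0.1, step_size=5, gamma=0.1 as stated), not the attached script:

```python
import torch

# Standalone check of the StepLR schedule described above: the LR should be
# 0.1 for iterations 0-4, 0.01 for iterations 5-9, then 0.001, and so on.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for iteration in range(12):
    print(iteration, optimizer.param_groups[0]["lr"])
    optimizer.step()     # step the optimizer before the scheduler, as documented
    scheduler.step()
```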
Environment

PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Microsoft Windows 10 Pro
GCC version: (Rev5, Built by MSYS2 project) 5.3.0
CMake version: version 3.15.5

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudnn64_7.dll

Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] numpydoc==0.9.1
[pip] torch==1.5.0
[pip] torchvision==0.6.0
[conda] blas 1.0 mkl defaults
[conda] mkl 2019.4 245 defaults
[conda] mkl-service 2.3.0 py37hb782905_0 defaults
[conda] mkl_fft 1.0.14 py37h14836fe_0 defaults
[conda] mkl_random 1.1.0 py37h675688f_0 defaults
[conda] pytorch 1.5.0 py3.7_cuda102_cudnn7_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
[conda] torchvision 0.6.0 py37_cu102 https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch
cc @vincentqb