ping @arthurBarthe
Discussed yesterday, but adding notes to sum up: to reproduce the paper, we should not be using a scheduler, but instead the fixed learning rate updates. We can achieve this using `MultiStepLR` from PyTorch. I am happy to take care of that in a new branch from the main one if that sounds reasonable.
> to reproduce the paper, we should not be using a scheduler, but instead the fixed learning rate updates

`MultiStepLR` is a scheduler, we are currently using it, and I believe it was used to decay the learning rate for the paper model. This issue was regarding an apparent disconnect between the CLI and how the options get used (as it turns out, some options aren't used). We should change the CLI options so you don't have to provide values that don't get used. (Or we could simply update the help strings as a stopgap.)
Would be happy to review any work you do related to this, feel free to assign/ping/message me. I have some work too, but it's larger scale & I'm in the middle of migrating it from another PR.
Oh I missed that we are currently using `MultiStepLR` already! Then we only need to provide the initial learning rate, the decay factor (0.1) and the milestones. And we can indeed get rid of the unnecessary extra values. Does that make sense?
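Roughly, something like this (a minimal sketch, not the actual training script; `net` and the concrete values are placeholders standing in for whatever the CLI provides):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR

# Placeholders -- in the real training script these would come from the CLI.
net = nn.Linear(10, 1)   # stand-in for the actual model
initial_lr = 5e-4        # initial learning rate
milestones = [15, 30]    # epochs at which the rate is decayed
decay_factor = 0.1       # multiplicative factor applied at each milestone

optimizer = optim.Adam(net.parameters(), lr=initial_lr)
scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=decay_factor)
```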
Right. The two-line snippet linked from the issue description has everything interesting. The decay factor is fixed (not configurable via the CLI); the initial learning rate & milestones are recovered from the `--learning_rate` string. Those should be their own options instead.
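For illustration, separate options could look something like this (flag names here are hypothetical, not the current CLI):

```python
import argparse

# Hypothetical flag names for illustration; the current CLI instead takes a
# single --learning_rate "epoch_1/rate_1/epoch_2/rate_2/..." string.
parser = argparse.ArgumentParser()
parser.add_argument("--initial-learning-rate", type=float, required=True,
                    help="learning rate used to initialise the optimizer")
parser.add_argument("--decay-milestones", type=int, nargs="*", default=[],
                    help="epochs at which the learning rate is decayed")
parser.add_argument("--decay-factor", type=float, default=0.1,
                    help="multiplicative decay applied at each milestone")
args = parser.parse_args()
# e.g.: --initial-learning-rate 5e-4 --decay-milestones 15 30 --decay-factor 0.1
```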
handled in #95
The neural net's learning rate is configurable in the training step using the CLI parameter `--learning_rate <string>`, where `<string>` is a list of epoch-rate pairs in the format `epoch_1/rate_1/epoch_2/rate_2/...`. However, upon some investigation, only the first rate appears to be used. I find it used only during optimizer initialization: https://github.com/m2lines/gz21_ocean_momentum/blob/322ee2ffce9227573701e8d57422dbe1755dea38/src/gz21_ocean_momentum/trainScript.py#L345-L346
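To make the failure mode concrete, here is a hedged sketch (not the repo's actual parsing code) of how such a string can end up contributing only its first rate when just the optimizer is initialised from it:

```python
import torch.nn as nn
import torch.optim as optim

# Illustrative only -- this does not reproduce trainScript.py verbatim.
lr_string = "0/5e-4/15/5e-5/30/5e-6"
parts = lr_string.split("/")
epochs = [int(p) for p in parts[0::2]]    # [0, 15, 30]
rates = [float(p) for p in parts[1::2]]   # [5e-4, 5e-5, 5e-6]

net = nn.Linear(10, 1)  # stand-in for the real model
# Only rates[0] is consumed here; the remaining epoch/rate pairs are never
# read unless a scheduler is explicitly built from them.
optimizer = optim.Adam(net.parameters(), lr=rates[0])
```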
Note that due to how `MultiStepLR` is configured, all of the training step run configurations I have will run as expected anyway due to how the rates have been decided (they decay by a factor of 10 each step). e.g. for `0/5e-4/15/5e-5/30/5e-6`, rates past `5e-4` are ignored, but the unused rates end up being the calculated ones anyway due to the decay factor.

Is this erroneous; should we be able to specify the learning rate exactly? Or can we simplify: pass an initial rate, a decay factor, and the epochs to decay at. (This appears to be the pattern `MultiStepLR` uses.)
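For concreteness, a small check (assuming a decay factor of 0.1 and milestones at epochs 15 and 30, as in the string above) that the scheduled rates match the "unused" ones even though they are never read:

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR

# Assumed values: initial rate 5e-4, decay factor 0.1, milestones at 15 and 30.
net = nn.Linear(10, 1)  # stand-in model
optimizer = optim.Adam(net.parameters(), lr=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[15, 30], gamma=0.1)

for epoch in range(40):
    optimizer.step()    # training step elided; no gradients needed for this demo
    scheduler.step()
    if epoch + 1 in (15, 30):
        # The learning rate is 5e-5 after epoch 15 and 5e-6 after epoch 30
        # (up to floating point), matching the later rates in the CLI string.
        print(epoch + 1, scheduler.get_last_lr()[0])
```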