m2lines / gz21_ocean_momentum

Stochastic-Deep Learning Parameterization of Ocean Momentum Forcing

Training CLI script prompts for unused options which `MultiStepLR` scheduler does not accept #90

Closed · raehik closed 9 months ago

raehik commented 1 year ago

The neural net's learning rate is configurable in the training step via the CLI parameter `--learning_rate <string>`, where `<string>` is a list of epoch/rate pairs in the format `epoch_1/rate_1/epoch_2/rate_2/...`. However, on investigation, only the first rate appears to be used; I find it read only during optimizer initialization:

https://github.com/m2lines/gz21_ocean_momentum/blob/322ee2ffce9227573701e8d57422dbe1755dea38/src/gz21_ocean_momentum/trainScript.py#L345-L346
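For illustration, a minimal sketch of the issue (hypothetical names, not the training script's actual code), assuming the epoch/rate pair format above; only the first rate ever reaches the optimizer:

```python
# Hypothetical sketch, not gz21_ocean_momentum's actual code: parse the
# "epoch_1/rate_1/epoch_2/rate_2/..." string and note that only the first
# rate is consumed, at optimizer construction time.
import torch

def parse_learning_rates(spec: str) -> dict[int, float]:
    """Split "0/5e-4/15/5e-5/30/5e-6" into {0: 5e-4, 15: 5e-5, 30: 5e-6}."""
    parts = spec.split("/")
    return {int(parts[i]): float(parts[i + 1]) for i in range(0, len(parts), 2)}

rates = parse_learning_rates("0/5e-4/15/5e-5/30/5e-6")
model = torch.nn.Linear(4, 1)                    # stand-in model for illustration
first_rate = rates[min(rates)]                   # rate attached to the earliest epoch
optimizer = torch.optim.Adam(model.parameters(), lr=first_rate)
# The remaining pairs (15 -> 5e-5, 30 -> 5e-6) are never read again.
```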

Note that, given how `MultiStepLR` is configured, all of the training run configurations I have will still behave as expected, because the listed rates already decay by a factor of 10 at each step. E.g. for `0/5e-4/15/5e-5/30/5e-6`, the rates after `5e-4` are ignored, but the values `MultiStepLR` computes from the decay factor happen to match them anyway.
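To make that coincidence concrete, here is a small standalone sketch (placeholder model and optimizer, not the training script's own) showing that `MultiStepLR` with its default decay factor of 0.1 and milestones at epochs 15 and 30 reproduces exactly the `5e-5` and `5e-6` values the string asks for:

```python
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(4, 1)                              # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # initial rate only
scheduler = MultiStepLR(optimizer, milestones=[15, 30], gamma=0.1)

previous = None
for epoch in range(31):
    optimizer.step()        # real training work elided
    scheduler.step()
    current = scheduler.get_last_lr()[0]
    if current != previous:
        # the rate drops by gamma at each milestone: 5e-4, then 5e-5, then 5e-6
        print(f"epoch {epoch}: lr = {current}")
        previous = current
```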

Is this erroneous? Should we be able to specify the learning rate exactly? Or can we simplify and pass an initial rate, a decay factor, and the epochs to decay at? (This appears to be the pattern `MultiStepLR` itself uses.)

raehik commented 1 year ago

ping @arthurBarthe

arthurBarthe commented 1 year ago

We discussed this yesterday, but adding notes to sum up: to reproduce the paper, we should not be using a scheduler but rather the fixed learning rate updates. We can achieve this using `MultiStepLR` from PyTorch. I am happy to take care of that in a new branch off the main one if that sounds reasonable.

raehik commented 1 year ago

> to reproduce the paper, we should not be using a scheduler, but instead the fixed learning rate updates

`MultiStepLR` is a scheduler; we are currently using it, and I believe it was used to decay the learning rate for the paper model. This issue is about an apparent disconnect between the CLI and how the options get used (as it turns out, some options aren't used at all). We should change the CLI options so you don't have to provide values that never get used. (Or we could simply update the help strings as a stopgap.)

I'd be happy to review any work you do related to this; feel free to assign/ping/message me. I have some related work too, but it's larger in scale and I'm in the middle of migrating it from another PR.

arthurBarthe commented 1 year ago

Oh, I missed that we are already using `MultiStepLR`! Then we only need to provide the initial learning rate, the decay factor (0.1), and the milestones, and we can indeed get rid of the unnecessary extra values. Does that make sense?

raehik commented 1 year ago

Right. The two-line snippet linked above has everything relevant: the decay factor is fixed (not configurable via the CLI), and the initial learning rate and milestones are recovered from the combined "learning rate" string. Those should be their own CLI options instead.
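As a rough sketch of what that could look like (option names here are purely illustrative, not what was ultimately adopted): expose the three values `MultiStepLR` actually wants as separate flags.

```python
# Illustrative sketch only: hypothetical CLI options mirroring MultiStepLR's
# own signature, in place of the combined "epoch/rate/epoch/rate/..." string.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--initial-learning-rate", type=float, default=5e-4)
parser.add_argument("--decay-factor", type=float, default=0.1)
parser.add_argument("--decay-milestones", type=int, nargs="*", default=[15, 30])
args = parser.parse_args()

# These would then feed the optimizer and scheduler directly, e.g.:
#   optimizer = torch.optim.Adam(net.parameters(), lr=args.initial_learning_rate)
#   scheduler = MultiStepLR(optimizer, milestones=args.decay_milestones,
#                           gamma=args.decay_factor)
```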

raehik commented 10 months ago

Handled in #95.