Yesterday's PR accidentally removed the dynamic updates to learning rate and momentum for the SGD and Nesterov solvers - this puts them back.
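For reference, a minimal sketch (hypothetical helper names, not this project's actual API) of what "dynamic updates" means here: the learning rate and momentum are recomputed from their policies every iteration instead of staying fixed at their base values.

```python
# Sketch only: per-iteration learning-rate and momentum updates for a
# Nesterov-momentum solver. Function and parameter names are illustrative.
import numpy as np

def lr_policy_inv(base_lr, gamma, power, it):
    # "inv"-style decay: lr = base_lr * (1 + gamma * it) ** -power
    return base_lr * (1.0 + gamma * it) ** (-power)

def mom_policy_linear(base_mom, max_mom, it, ramp_iters):
    # Linearly ramp momentum from base_mom up to max_mom over ramp_iters iterations.
    return min(max_mom, base_mom + (max_mom - base_mom) * it / ramp_iters)

def nesterov_step(w, v, grad_fn, lr, mom):
    # Nesterov update: evaluate the gradient at the look-ahead point w + mom * v.
    g = grad_fn(w + mom * v)
    v = mom * v - lr * g
    return w + v, v

w, v = np.zeros(10), np.zeros(10)
grad_fn = lambda w: 2 * (w - 1.0)  # toy quadratic objective
for it in range(1000):
    lr = lr_policy_inv(base_lr=0.1, gamma=1e-3, power=0.75, it=it)
    mom = mom_policy_linear(base_mom=0.5, max_mom=0.9, it=it, ramp_iters=500)
    w, v = nesterov_step(w, v, grad_fn, lr, mom)
```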
There are also some Adam improvements: Adam can benefit from the LRPolicy as well, so that is now supported. I also found it useful to serialize the solver state after all: when I restarted optimization with the moment estimates reset, the solver often diverged. This version saves the whole solver state (at the cost of making snapshots 3x bigger when Adam is used).
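To make the trade-off concrete, here is a minimal sketch (hypothetical names, not this project's actual snapshot format) of an Adam solver whose step size follows a learning-rate policy and whose full state is saved and restored. Snapshotting the first and second moment vectors alongside the weights is what roughly triples the snapshot size, and restoring them is what avoids the divergence seen when they are reset.

```python
# Sketch only: Adam with an LR policy and full-state snapshots.
import pickle
import numpy as np

class AdamSolver:
    def __init__(self, dim, base_lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.base_lr, self.beta1, self.beta2, self.eps = base_lr, beta1, beta2, eps
        self.m = np.zeros(dim)   # first-moment estimate
        self.v = np.zeros(dim)   # second-moment estimate
        self.t = 0               # step counter (needed for bias correction)

    def lr(self):
        # "inv"-style policy applied to Adam's base step size (illustrative values).
        return self.base_lr * (1.0 + 1e-4 * self.t) ** -0.75

    def step(self, w, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return w - self.lr() * m_hat / (np.sqrt(v_hat) + self.eps)

    def snapshot(self, path, w):
        # Saving m and v alongside w is why Adam snapshots are ~3x larger.
        with open(path, "wb") as f:
            pickle.dump({"w": w, "m": self.m, "v": self.v, "t": self.t}, f)

    @classmethod
    def restore(cls, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        solver = cls(dim=state["w"].shape[0])
        solver.m, solver.v, solver.t = state["m"], state["v"], state["t"]
        return state["w"], solver
```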