So here it is :-) This Adam code follows the latest version of the Adam paper, so the parameters beta1 and beta2 are inverted compared to the Adam version currently in this repository. The default values for beta1, beta2 and epsilon are the ones recommended in the paper, which is in line with other Adam implementations.
Another change is that Adam no longer has its own learning rate parameter: its step size is now taken from the learning rate set in options.
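For reviewers, here is a minimal Python sketch of the update rule as described in the latest version of the paper. It is an illustration only, not the code in this PR: the names `adam_step`, `init_state` and the `options` dict are hypothetical, and the defaults shown (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8) are the values recommended in the paper, assumed here to match the ones in this change.

```python
import math


def init_state(n):
    """Create empty Adam state for n parameters (hypothetical helper)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adam_step(params, grads, state, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Apply one Adam update and return the new parameter list.

    beta1 and beta2 are the decay rates of the moment estimates,
    following the convention of the latest version of the paper.
    """
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialised averages.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(
            p - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
        )
    return new_params


# Assumption: the step size comes from a shared options object
# rather than from an Adam-specific setting.
options = {"learning_rate": 0.001}
state = init_state(1)
params = adam_step([1.0], [2.0], state,
                   learning_rate=options["learning_rate"])
```

Reading the step size from the shared options keeps Adam consistent with how the other optimizers are configured, at the cost of not being able to tune it separately.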