modern-fortran / neural-fortran

A parallel framework for deep learning
MIT License

Adagrad Optimizer Implementation #154

Closed · Spnetic-5 closed 1 year ago

Spnetic-5 commented 1 year ago

Reference: PyTorch Docs

milancurcic commented 1 year ago

Thanks @Spnetic-5. I believe it's correct now. In your original implementation, the L2 regularization was not accounted for in the accumulation of the squared gradients because you applied it later, in the parameter update. The learning rate decay was also applied twice: in each step the learning rate should be amortized relative to the original learning rate, not the one from the previous step. These are subtle differences that weren't caught by the tests.
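For reference, here is a minimal, self-contained sketch of the corrected update order, in the spirit of the PyTorch Adagrad algorithm referenced above: the L2 term is folded into the gradient before its square is accumulated, and the learning rate is amortized relative to its original value rather than the previous step's. The variable names are illustrative and not necessarily those used in neural-fortran.

```fortran
program adagrad_sketch
  ! Illustrative sketch only; not the library's implementation.
  implicit none
  integer, parameter :: n = 3
  real :: param(n), gradient(n), g(n), sum_squared_gradient(n)
  real :: learning_rate, learning_rate_decay, weight_decay_l2, eps
  integer :: t

  param = [0.5, -0.3, 0.8]
  sum_squared_gradient = 0.0
  learning_rate = 0.01
  learning_rate_decay = 0.0
  weight_decay_l2 = 1e-4
  eps = 1e-10

  do t = 1, 5
    gradient = 2.0 * param  ! placeholder gradient of a toy quadratic loss

    ! Fold the L2 regularization into the gradient *before* accumulating
    ! its square, so the accumulator sees the regularized gradient.
    g = gradient + weight_decay_l2 * param
    sum_squared_gradient = sum_squared_gradient + g**2

    ! Amortize the learning rate relative to the original learning rate,
    ! not the one from the previous step, to avoid compounding the decay.
    param = param - learning_rate / (1.0 + (t - 1) * learning_rate_decay) &
      * g / (sqrt(sum_squared_gradient) + eps)
  end do

  print *, 'Updated parameters:', param
end program adagrad_sketch
```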

I'll go ahead and merge; please release v0.15.0 when you get a chance.