ocramz opened 6 years ago
More importantly, I think people are interested in AdaGrad and other sketched versions, but I haven't seen any matrix sketching code on Hackage.
@freuk AdaGrad, that's a good one. I read the paper a couple of years back. This is basically new ground in Haskell; we'll have to write it ourselves.
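As a starting point, here's what a first cut could look like over plain unboxed vectors. This is just a sketch I'm imagining, not code from any existing Hackage package; the module and function names are made up for illustration.

```haskell
-- Rough AdaGrad sketch; hypothetical names, not an existing package.
module Numeric.AdaGrad where

import qualified Data.Vector.Unboxed as V

-- | One AdaGrad update: accumulate squared gradients and scale each
-- coordinate's step by the inverse square root of its accumulator.
adagradStep
  :: Double                               -- ^ learning rate
  -> V.Vector Double                      -- ^ accumulated squared gradients
  -> V.Vector Double                      -- ^ gradient at the current point
  -> V.Vector Double                      -- ^ current parameters
  -> (V.Vector Double, V.Vector Double)   -- ^ (new accumulator, new parameters)
adagradStep eta acc g x = (acc', x')
  where
    eps  = 1e-8                           -- avoids division by zero early on
    acc' = V.zipWith (\a gi -> a + gi * gi) acc g
    x'   = V.zipWith3 (\xi gi ai -> xi - eta * gi / (sqrt ai + eps)) x g acc'

-- | Iterate the step against a gradient oracle for a fixed number of steps.
adagrad
  :: Double                               -- ^ learning rate
  -> Int                                  -- ^ number of steps
  -> (V.Vector Double -> V.Vector Double) -- ^ gradient oracle
  -> V.Vector Double                      -- ^ initial parameters
  -> V.Vector Double
adagrad eta n gradF x0 = go n (V.map (const 0) x0) x0
  where
    go 0 _   x = x
    go k acc x =
      let (acc', x') = adagradStep eta acc (gradF x) x
      in  go (k - 1) acc' x'
```

The sketched/streaming variants from the paper would replace the per-coordinate accumulator with a low-rank approximation of the outer-product matrix, which is where the missing matrix sketching code would come in.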
Also see implementations of OGD, Nesterov acceleration, Adam, and AdaMax at https://github.com/mstksg/opto/blob/master/src/Numeric/Opto/Optimizer.hs
Yep, thanks @freuk, I've added some references to the list above.
[ ] ADAGRAD http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
[ ] ADAM ("ADAptive Moment estimation"), ADAMAX: D. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", https://arxiv.org/abs/1412.6980 (see the sketch after this list)
[ ] Nesterov accelerated gradient descent:
original paper: Yu. Nesterov, "A method of solving a convex programming problem with convergence rate O(1/k^2)", 1983
simplified formulation: I. Sutskever, "Training Recurrent Neural Networks", Ph.D. thesis, CS Dept., U. Toronto, 2012
[ ] Online gradient descent
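For the ADAM item, a minimal bias-corrected step in the same vector style as the AdaGrad sketch above. Again, the names and types are made up for illustration and not an existing Hackage API; see mstksg/opto linked above for a real implementation.

```haskell
-- Rough Adam sketch; hypothetical names, not an existing package.
import qualified Data.Vector.Unboxed as V

-- | Optimizer state for Adam: first/second moment estimates and step count.
data AdamState = AdamState
  { adamM :: V.Vector Double
  , adamV :: V.Vector Double
  , adamT :: Int
  }

-- | One Adam update with bias-corrected moment estimates
-- (defaults in the paper: alpha = 1e-3, beta1 = 0.9, beta2 = 0.999).
adamStep
  :: Double -> Double -> Double          -- ^ alpha, beta1, beta2
  -> AdamState                           -- ^ current optimizer state
  -> V.Vector Double                     -- ^ gradient at the current point
  -> V.Vector Double                     -- ^ current parameters
  -> (AdamState, V.Vector Double)
adamStep alpha b1 b2 (AdamState m v t) g x = (AdamState m' v' t', x')
  where
    eps = 1e-8
    t'  = t + 1
    m'  = V.zipWith (\mi gi -> b1 * mi + (1 - b1) * gi)      m g
    v'  = V.zipWith (\vi gi -> b2 * vi + (1 - b2) * gi * gi) v g
    c1  = 1 - b1 ^ t'                    -- bias corrections for the moments
    c2  = 1 - b2 ^ t'
    x'  = V.zipWith3
            (\xi mi vi -> xi - alpha * (mi / c1) / (sqrt (vi / c2) + eps))
            x m' v'
```

AdaMax would swap the second-moment accumulator for an elementwise maximum of the absolute gradients, but I haven't sketched that here.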