ocramz opened 6 years ago
More importantly, I think people are interested in AdaGrad and other sketched versions, but I haven't seen any matrix sketching code on Hackage.
@freuk AdaGrad, that's a good one. I read the paper a couple of years back. This is basically new ground in Haskell; we'll have to write it ourselves.
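As a starting point, here's what a first cut could look like over plain unboxed vectors. This is just a sketch I'm imagining, not code from any existing Hackage package; the module and function names are made up for illustration.

```haskell
-- Rough AdaGrad sketch; hypothetical names, not an existing package.
module Numeric.AdaGrad where

import qualified Data.Vector.Unboxed as V

-- | One AdaGrad update: accumulate squared gradients and scale each
-- coordinate's step by the inverse square root of its accumulator.
adagradStep
  :: Double                               -- ^ learning rate
  -> V.Vector Double                      -- ^ accumulated squared gradients
  -> V.Vector Double                      -- ^ gradient at the current point
  -> V.Vector Double                      -- ^ current parameters
  -> (V.Vector Double, V.Vector Double)   -- ^ (new accumulator, new parameters)
adagradStep eta acc g x = (acc', x')
  where
    eps  = 1e-8                           -- avoids division by zero early on
    acc' = V.zipWith (\a gi -> a + gi * gi) acc g
    x'   = V.zipWith3 (\xi gi ai -> xi - eta * gi / (sqrt ai + eps)) x g acc'

-- | Iterate the step against a gradient oracle for a fixed number of steps.
adagrad
  :: Double                               -- ^ learning rate
  -> Int                                  -- ^ number of steps
  -> (V.Vector Double -> V.Vector Double) -- ^ gradient oracle
  -> V.Vector Double                      -- ^ initial parameters
  -> V.Vector Double
adagrad eta n gradF x0 = go n (V.map (const 0) x0) x0
  where
    go 0 _   x = x
    go k acc x =
      let (acc', x') = adagradStep eta acc (gradF x) x
      in  go (k - 1) acc' x'
```

The sketched/streaming variants from the paper would replace the per-coordinate accumulator with a low-rank approximation of the outer-product matrix, which is where the missing matrix sketching code would come in.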
Also see implementations of OGD, Nesterov acceleration, Adam, and AdaMax at https://github.com/mstksg/opto/blob/master/src/Numeric/Opto/Optimizer.hs
Yep, thanks @freuk, I've added some references to the list above.
[ ] ADAGRAD http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
[ ] ADAM ("ADAptive Moment estimation"), ADAMAX: D. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", https://arxiv.org/abs/1412.6980 (see the sketch after this list)
[ ] Nesterov accelerated gradient descent:
original paper: Yu. Nesterov, "A method of solving a convex programming problem with convergence rate O(1/k^2)", 1983
simplified formulation: I. Sutskever, "Training Recurrent Neural Networks", Ph.D. thesis, CS Dept., U. Toronto, 2012
[ ] Online gradient descent
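For the ADAM item, a minimal bias-corrected step in the same vector style as the AdaGrad sketch above. Again, the names and types are made up for illustration and not an existing Hackage API; see mstksg/opto linked above for a real implementation.

```haskell
-- Rough Adam sketch; hypothetical names, not an existing package.
import qualified Data.Vector.Unboxed as V

-- | Optimizer state for Adam: first/second moment estimates and step count.
data AdamState = AdamState
  { adamM :: V.Vector Double
  , adamV :: V.Vector Double
  , adamT :: Int
  }

-- | One Adam update with bias-corrected moment estimates
-- (defaults in the paper: alpha = 1e-3, beta1 = 0.9, beta2 = 0.999).
adamStep
  :: Double -> Double -> Double          -- ^ alpha, beta1, beta2
  -> AdamState                           -- ^ current optimizer state
  -> V.Vector Double                     -- ^ gradient at the current point
  -> V.Vector Double                     -- ^ current parameters
  -> (AdamState, V.Vector Double)
adamStep alpha b1 b2 (AdamState m v t) g x = (AdamState m' v' t', x')
  where
    eps = 1e-8
    t'  = t + 1
    m'  = V.zipWith (\mi gi -> b1 * mi + (1 - b1) * gi)      m g
    v'  = V.zipWith (\vi gi -> b2 * vi + (1 - b2) * gi * gi) v g
    c1  = 1 - b1 ^ t'                    -- bias corrections for the moments
    c2  = 1 - b2 ^ t'
    x'  = V.zipWith3
            (\xi mi vi -> xi - alpha * (mi / c1) / (sqrt (vi / c2) + eps))
            x m' v'
```

AdaMax would swap the second-moment accumulator for an elementwise maximum of the absolute gradients, but I haven't sketched that here.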