Closed by murphyk 1 year ago
H. Luo, A. Agarwal, N. Cesa-Bianchi, and J. Langford, “Efficient Second Order Online Learning by Sketching,” in NIPS, Feb. 2016 [Online]. Available: https://proceedings.neurips.cc/paper/2016/hash/15de21c670ae7c3f6f3f1f37029303c9-Abstract.html
L. Aitchison, “Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods,” in NIPS, 2020 [Online]. Available: https://proceedings.neurips.cc/paper/2020/file/d33174c464c877fb03e77efdab4ae804-Paper.pdf. [Accessed: Sep. 13, 2022]
J. Martens, “New insights and perspectives on the natural gradient method,” J. Mach. Learn. Res., vol. 21, no. 1, pp. 5776–5851, Jan. 2020 [Online]. Available: https://dl.acm.org/doi/abs/10.5555/3455716.3455862
M. A. Skoglund, G. Hendeby, and D. Axehill, “Extended Kalman filter modifications based on an optimization view point,” in 18th International Conference on Information Fusion, 2015 [Online]. Available: https://c4i.gmu.edu/~pcosta/F15/data/fileserver/file/472081/filename/Paper_1570113717.pdf. [Accessed: Mar. 27, 2023]
https://github.com/tensorflow/tensorflow/blob/v1.13.2/tensorflow/contrib/opt/python/training/ggt.py
N. Agarwal et al., “Efficient Full-Matrix Adaptive Regularization,” in ICML, Jun. 9–15, 2019, vol. 97, pp. 102–110 [Online]. Available: http://proceedings.mlr.press/v97/agarwal19b.html
R. Anil, V. Gupta, T. Koren, K. Regan, and Y. Singer, “Scalable Second Order Optimization for Deep Learning,” arXiv [cs.LG], Feb. 20, 2020 [Online]. Available: http://arxiv.org/abs/2002.09018
https://github.com/google-research/google-research/tree/master/scalable_shampoo