dustinvtran opened this issue 9 years ago
I completely agree. Same for L-BFGS --- it should be abstracted from the application as much as possible for re-use.
Is there SGD in ADVI now? There shouldn't be, given that there's no way for users to run it yet --- things should live on branches until they're ready to go.
We made the mistake of putting in higher-order autodiff and discrete sampling infrastructure before either was ready, and it's just been a huge burden.
Thanks for the refs!
On Jul 31, 2015, at 3:23 PM, Dustin Tran notifications@github.com wrote:
It would be worth having something generic for all things related to stochastic gradient descent, separated from variational inference itself. E.g., an sgd class to make different stochastic gradient methods available, a learning rate class for testing various learning rates, a subsampling class, etc. This will eventually be necessary as we start working on more research tracks, e.g., Mandt and Blei (2014), Theis and Hoffman (2015), Tran et al. (2015).
It should also be applicable for computing the penalized MLE, so that the optimization interface of Stan also has SGD available for users.
Branch from feature/issue-1751-service-methods if you're going to work on this soon.