pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

GSoC: contributing to pymc3 #1899

Closed asifzubair closed 7 years ago

asifzubair commented 7 years ago

Hi Folks,

I was really interested in some of the ideas mentioned on the ideas page; in particular, these seemed like a good fit:

I'm a graduate student in computational biology and have used Bayesian methods for problems in developmental biology and genetics. In particular, we used a modification of the population MCMC sampler of Calderhead and Girolami to compare network hypotheses. This required solving a PDE, for which I implemented the solver. I really feel that implementing ODE solvers would help the adoption of Bayesian methods for pharmacokinetic models. Also, since I am familiar with Calderhead's previous work, I wanted to explore it further. However, I am not sure which topics have already been taken.

I'm also hoping to make a long-term contribution to pymc3 and thought that this might be a good way to get started.

Thanks! Asif

CC-ing mentors: @twiecki @fonnesbeck

twiecki commented 7 years ago

Hi @asifzubair!

Interest has already been expressed in RMHMC and VI, so since you are already interested in ODEs, that seems like the best option.

asifzubair commented 7 years ago

Thank you, @twiecki.

I agree that ODE solvers are great and especially important in the area I work in. However, I'd really like to work on sampling strategies; would it be okay if I started looking into integrating the emcee sampler? I have previously looked at emcee's code base for their parallel-tempered sampler, and this would give me a chance to examine their implementation more closely.

Please let me know if this idea is still available.

twiecki commented 7 years ago

@asifzubair We actually already have a promising PR for this: https://github.com/pymc-devs/pymc3/pull/1689

asifzubair commented 7 years ago

Oh okay. Thank you, @twiecki.

I now notice that topic has been removed, but there is now something on SGHMC. Would it be okay if I took a stab at that?

Thanks!

twiecki commented 7 years ago

@asifzubair Absolutely. Here is a paper to get you started: http://aad.informatik.uni-freiburg.de/papers/16-NIPS-BOHamiANN.pdf

twiecki commented 7 years ago

I think @jsalvatier also had some clever ideas on scalable HMC.

asifzubair commented 7 years ago

Thank you, @twiecki. I'll also have a look at the paper by Chen et al. mentioned on the ideas page.

twiecki commented 7 years ago

Make sure to submit your proposal to Google.


asifzubair commented 7 years ago

Yes, thank you, @twiecki. I've read through a couple of papers and should have a proposal ready in time for the deadline.

asifzubair commented 7 years ago

Hi @twiecki. I wrote an abstract for the project and am pasting it below. Could you please have a look and let me know if it reads well? The [?] marks are references that I will add later.

Also, if you, @jsalvatier, or @fonnesbeck have any recommendations for implementation details that I could include in my proposal, I would really appreciate them.

Thank you.

Abstract

The Bayesian approach offers an intuitive procedure for performing inference on statistical models. PyMC3 [?] provides a user-friendly framework for probabilistic programming (PP) in which Bayesian inference can be conducted. Central to this approach is Bayes' rule [?], which allows one to compute parameter posteriors for statistical models. However, closed-form solutions exist only for the simplest of models, and most real-world examples require a sampling approach. The earliest such approach, the Metropolis-Hastings algorithm [?], originates from physics-based models and has been judged one of the top ten algorithms of the 20th century [?].
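
For reference, Bayes' rule for parameters \theta and data D is shown below; the evidence p(D) in the denominator is the integral that is rarely available in closed form:

```latex
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
\qquad
p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta
```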

Despite its initial success, the Metropolis-Hastings algorithm exhibits prohibitive convergence times, especially as the complexity of the models being fit has increased. Innovations were again drawn from the realm of physics with the Hamiltonian Monte Carlo (HMC) sampler [?]. Briefly, the HMC sampler augments the target with auxiliary momentum variables and applies a Metropolis method with deterministic proposals to this augmented target. In essence, this amounts to solving the equations of Hamiltonian dynamics, which requires computing the gradient of the potential energy term. It is by taking advantage of this gradient that the convergence properties can be considerably improved. The No-U-Turn Sampler (NUTS) is an improvement over HMC with self-tuning properties. Both HMC and NUTS are implemented in PyMC3, which, along with Stan [?] and R's LaplacesDemon package, is among the few PP packages to offer these samplers.
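
To make this concrete, here is a minimal NumPy sketch of a single HMC transition (an illustration only, not PyMC3's implementation; an identity mass matrix is assumed, and `U`, `grad_U`, `eps`, and `n_leapfrog` are placeholder inputs):

```python
import numpy as np

rng = np.random.default_rng(0)

def hmc_step(theta, U, grad_U, eps=0.1, n_leapfrog=20):
    """One HMC transition. U is the potential energy (the negative log
    posterior); grad_U is its gradient, which guides the proposal."""
    p = rng.normal(size=theta.shape)          # resample auxiliary momentum
    theta_new, p_new = theta.copy(), p.copy()

    # Leapfrog integration of the Hamiltonian dynamics
    p_new = p_new - 0.5 * eps * grad_U(theta_new)
    for _ in range(n_leapfrog - 1):
        theta_new = theta_new + eps * p_new
        p_new = p_new - eps * grad_U(theta_new)
    theta_new = theta_new + eps * p_new
    p_new = p_new - 0.5 * eps * grad_U(theta_new)

    # Metropolis accept/reject on the augmented (theta, momentum) target
    h_old = U(theta) + 0.5 * p @ p
    h_new = U(theta_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:
        return theta_new                      # accept
    return theta                              # reject
```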

However, the advantage of using the gradient information of the likelihood has been diminished by the large volumes of data now being analysed: the gradient must be computed over the full dataset, leading to expensive computations. Given the success of stochastic gradient approaches [?] in big-data applications, it was natural to ask whether a noisy minibatch estimate of the gradient could be used instead of the exact gradient, alleviating the computational cost while maintaining favorable convergence properties. However, Chen et al. [?] showed that naively substituting a stochastic gradient destroys the invariance property of HMC (the dynamics no longer leave the target distribution invariant) and thus leads to erroneous conclusions. Instead, they suggest adding a friction term to the Hamiltonian dynamics alongside the stochastic gradient to recover the invariance property and thus obtain samples from the true target density. It is this sampling approach, stochastic gradient HMC (SGHMC), that we propose to implement as part of the Google Summer of Code project.
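
As an illustration of the proposed update, the following sketch uses the practical parameterization from Chen et al. (their Algorithm 2); it is not a proposed PyMC3 interface, and `stoch_grad_U`, `lr`, `alpha`, and `beta_hat` are hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(0)

def sghmc(theta0, stoch_grad_U, n_steps=1000, lr=1e-3, alpha=0.01, beta_hat=0.0):
    """Minimal SGHMC loop. stoch_grad_U returns a noisy minibatch
    estimate of grad U(theta); alpha plays the role of the friction
    term, and beta_hat estimates the gradient-noise contribution."""
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)                  # momentum variable
    samples = []
    for _ in range(n_steps):
        # Friction (-alpha * v) plus injected noise keeps the target invariant
        noise = rng.normal(scale=np.sqrt(2.0 * (alpha - beta_hat) * lr),
                           size=theta.shape)
        v = v - lr * stoch_grad_U(theta) - alpha * v + noise
        theta = theta + v
        samples.append(theta.copy())
    return np.asarray(samples)
```

Note the absence of a Metropolis accept/reject step: with minibatch gradients the acceptance probability cannot be evaluated cheaply, which is precisely why the friction term is needed to keep the sampled distribution close to the target.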

The SGHMC sampler will make pyMC3 considerably more scalable and will enable the application of Bayesian models to machine learning problems. Indeed, the authors of the original paper [?] showed how such an approach is advantageous in probabilistic matrix factorization problems. Similar approaches have also been used in Bayesian optimization [?] for tuning deep neural networks [?]. We strongly feel that, in light of the advantages SGHMC presents, its addition to the pyMC3 repertoire will be of great benefit to the science and technology community.


springcoil commented 7 years ago

Good work, but I would put PyMC3, not pyMC3.

- https://peerj.com/articles/cs-55/ is the PyMC3 paper.
- Stigler, Stephen M. "Thomas Bayes's Bayesian Inference." Journal of the Royal Statistical Society, Series A, 145:250–258, 1982. http://www.jstor.org/stable/2981538 (discusses Bayesian inference in general).
- http://mc-stan.org/citations/ has links to various papers on Stan, including: Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. "Stan: A probabilistic programming language." Journal of Statistical Software 76(1). DOI: 10.18637/jss.v076.i01, http://dx.doi.org/10.18637/jss.v076.i01
- Betancourt, Michael. "A Conceptual Introduction to Hamiltonian Monte Carlo." https://arxiv.org/pdf/1701.02434.pdf (a good all-round reference on Hamiltonian Monte Carlo).


asifzubair commented 7 years ago

Oh god, this is wonderful! Thank you so much for sharing, @springcoil. I will re-edit and work the details into my proposal.

asifzubair commented 7 years ago

Thank you, folks. I've submitted the proposal; let's see what happens.

I'll close this issue and follow #1958, which is already open for stochastic gradient approaches. In addition, I'd like to contribute to the emcee PR #1689 and also open a PR for ODE solvers, since that is something I am interested in.