pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

pymc3 and stan comparison for mixture estimation #1376

Closed krz closed 8 years ago

krz commented 8 years ago

Hi,

I built a model in Stan for estimating the parameters of a mixture distribution of two Negative Binomials. I generated data with mu1=1 and mu2=8 and the Stan estimates are mu1=1 and mu2=9 which is fine. However, when I built the model in pymc3 using the same data I do not get even close to the true parameters, even if I specify the true parameters as starting values. Any idea what is going on here? https://gist.github.com/krz/0bc58592a5b088076c4ecb8141ee34bd

twiecki commented 8 years ago

Try not using NUTS (use Metropolis instead).

twiecki commented 8 years ago

Also, use Normal(..., sd=10); by default, pymc3 uses precision as the second parameter.
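For reference (not part of the original thread): the relation between standard deviation and precision is tau = 1/sd², which is why a Stan-style `Normal(0, 10)` ends up far tighter than intended if the 10 is read as a precision. A few lines of plain Python:

```python
def tau_to_sd(tau):
    """Convert precision (tau) to standard deviation: sd = tau**(-1/2)."""
    return tau ** -0.5

def sd_to_tau(sd):
    """Convert standard deviation to precision: tau = sd**(-2)."""
    return sd ** -2

# A prior written as Normal(0, 10) but interpreted as tau=10 implies
# sd ~= 0.316 -- a far more informative prior than sd=10.
implied_sd = tau_to_sd(10)
print(round(implied_sd, 3))  # 0.316
```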

krz commented 8 years ago

thanks, I always forget this when switching between Stan and pymc3. I changed the sd, switched to the Metropolis sampler, and provided the true starting values, but it gets worse:

https://gist.github.com/krz/c47da356a7352d9f2218ffd5c096af69

twiecki commented 8 years ago

What's up with the scaling kwarg?

krz commented 8 years ago

The idea with the scaling comes from this post, where it was found to be key to convergence for mixtures: http://blog.booleanbiotech.com/static/Mixture%20of%20Gaussians%20in%20PyMC3.html

I removed the scaling and also tried the Slice sampler, no difference: https://gist.github.com/krz/2549d51650649b785b420bdecc73c1dd

I am not sure if the Stan and pymc3 parametrizations of the Negative Binomial are the same (I think they are). Can you verify that? http://i.imgur.com/EhEy5io.png
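One way to check this outside either library (a sketch with made-up values for mu and the dispersion): Stan's `neg_binomial_2(mu, phi)` and PyMC3's `NegativeBinomial(mu, alpha)` both use the mean-dispersion form, with mean mu and variance mu + mu²/phi, which maps onto scipy's `nbinom` via n = alpha, p = alpha / (alpha + mu):

```python
from scipy.stats import nbinom

# Hypothetical values, loosely matching the gist's second component.
mu, alpha = 8.0, 2.0  # mean and dispersion (phi in Stan, alpha in PyMC3)

# Both parameterizations correspond to scipy's nbinom with:
#   n = alpha, p = alpha / (alpha + mu)
n = alpha
p = alpha / (alpha + mu)
dist = nbinom(n, p)

# Mean should equal mu; variance should equal mu + mu**2 / alpha.
assert abs(dist.mean() - mu) < 1e-9
assert abs(dist.var() - (mu + mu**2 / alpha)) < 1e-9
```

If both checks pass for the same (mu, dispersion) pair, the two parameterizations agree.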

fonnesbeck commented 8 years ago

A couple of issues with the implementation of the model. First, setting a mean-zero prior is probably not a good idea, since it is the mean of a negative binomial, which models positive counts. In fact, using a normal is not very efficient, since half of the prior probability is not valid support for the likelihood. Second, it's usually best to let PyMC select the step methods for you, unless you have a good reason to do otherwise.
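The wasted-prior-mass point is easy to check numerically (a quick sketch using scipy, with an illustrative prior scale of 10):

```python
from scipy.stats import norm

# With a mean-zero Normal prior on mu, exactly half the prior mass
# falls below zero, where the Negative Binomial mean is undefined.
wasted_mass = norm(loc=0, scale=10).cdf(0)
print(wasted_mass)  # 0.5
```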

I managed to get a decent result with this revision of the model.

I used a Lognormal distribution for mu to constrain it positive. Stan allows you to use a normal because you can apply a lower constraint when you declare the variable, which essentially makes it a half normal. You could use a HalfNormal explicitly here too.

A better approach still would be to marginalize out the bernoulli indicators, and use a mixture likelihood. This would let you use gradient-based methods throughout the model.
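For anyone landing here later, a rough sketch of what marginalizing the indicators means (pure numpy/scipy with hypothetical parameter values; not the model from the gist): instead of sampling a Bernoulli indicator per observation, the component it belongs to is summed out, so the log-likelihood becomes a logsumexp over components and stays differentiable for gradient-based samplers like NUTS:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import nbinom

def mixture_nb_logp(y, w, mus, alphas):
    """Marginalized log-likelihood of a K-component NB mixture.

    log p(y_i) = logsumexp_k( log w_k + log NB(y_i | mu_k, alpha_k) )
    No discrete indicator variables remain, so the density is smooth
    in (w, mus, alphas).
    """
    y = np.asarray(y)
    comp_logps = []
    for wk, mu, alpha in zip(w, mus, alphas):
        # Map mean-dispersion parameters to scipy's (n, p).
        n = alpha
        p = alpha / (alpha + mu)
        comp_logps.append(np.log(wk) + nbinom.logpmf(y, n, p))
    # logsumexp over components, then sum over observations.
    return logsumexp(np.stack(comp_logps), axis=0).sum()

# Hypothetical parameters mirroring the generated data (mu1=1, mu2=8).
ll = mixture_nb_logp([0, 1, 7, 9], w=[0.5, 0.5],
                     mus=[1.0, 8.0], alphas=[2.0, 2.0])
```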

krz commented 8 years ago

Thanks, this is a result I can live with. I did not expect the prior to be that important, because I have lots of data and I even provided the true starting values. In the Stan model I did not restrict the Normal prior for mu either. From your output it seems like pymc selected the same step methods I specified manually. Actually, I am very impressed by how well Stan estimates this model, maybe because it has a dedicated mixture function.

twiecki commented 8 years ago

Yeah, would be great to implement that in pymc3.

krz commented 8 years ago

just a quick note, I have trouble reproducing this result: https://gist.github.com/krz/56394b2c5a7dc89c1833b8af848f1798

If I only use 1 job I get no result. When using 4 jobs it gets better. When using more iterations it gets worse; the chains have a downward trend.

krz commented 8 years ago

Another thing I noticed, in your notebook and in mine as well: the grid lines in the trace density plots are gone. Is this an intended change, and how can I get them back?