To simplify and not start with a wall of code, you could put the data
generation code into a module in the same directory, and just import it
into the notebook. You can then pass in the number of desired data points,
e.g., xis, yis, sigmais = simulate_data(N=300). Then you can just
visualize the simulated data. I just worry people will get hung up on the
details of the simulated data generation, so it might be better to hide it
from the notebook but provide the script for anyone who wants to see more.
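A minimal sketch of what that module interface could look like (the generator body here is a stand-in; the notebook's real generation code would move into the module):

```python
# simulate.py -- hypothetical module holding the data-generation code
import numpy as np

def simulate_data(N=300, seed=None):
    """Draw N fake data points; returns (xis, yis, sigmais).

    Placeholder generator: the notebook's actual generation code
    would live here instead of this toy sine curve.
    """
    rng = np.random.default_rng(seed)
    xis = np.sort(rng.uniform(0.0, 10.0, N))
    sigmais = rng.uniform(0.05, 0.2, N)           # per-point noise levels
    yis = np.sin(xis) + sigmais * rng.standard_normal(N)
    return xis, yis, sigmais
```

The notebook then only needs `from simulate import simulate_data`.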
In[41]:
To improve visibility you could change the style of the errorbar plot:
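For instance, a hypothetical styling tweak (smaller markers, thin grey error bars; the variable names just follow the simulate_data example and would be replaced by the notebook's actual arrays):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# stand-in data with the same names as the review's simulate_data() call
rng = np.random.default_rng(0)
xis = np.sort(rng.uniform(0, 10, 50))
sigmais = rng.uniform(0.05, 0.2, 50)
yis = np.sin(xis) + sigmais * rng.standard_normal(50)

fig, ax = plt.subplots()
# small points, light thin error bars, slight transparency
ax.errorbar(xis, yis, yerr=sigmais, fmt="o", ms=3,
            ecolor="0.6", elinewidth=1, capsize=0, alpha=0.8)
```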
You say "But, as we will see below, because the noise is known, it can be
deconvolved, and we can recover the two components pretty well." You might
put a caveat that this is only really true if you have some beliefs about
the form of the mixture component, right? In this case, we have reason to
believe the mixture components are Gaussians?
Change "In other words, we are in presence of a product of Gaussians being
integrated over." to, e.g., "In other words, the expression for the
posterior probability contains an integral over a product of two
Gaussians."
"...we have once and for all removed the latent variables and
integrals..." It might be worth adding to the discussion that sometimes we
are interested in having posterior pdfs over the x_i's too! Of course, you
can always back those out with posterior samples over the population
parameters - either discuss that further down or link to DFM's blog post
about related things.
In[42]:
You could add docstrings to each function to explain in words what each
function does.
You could rename `gaussiansumsig` to `gaussian_sum_var` to make the
underscores consistent with other functions, and because you sum the
variances, not the sigmas. But now I'm getting really annoying...
You could also just define the ln... functions, and in the non-log
functions just call exp(ln...()). Reduces some duplicate code, but fine
as is.
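A sketch of that pattern, with the docstrings suggested above (hypothetical names and signatures; the notebook's actual functions take different arguments):

```python
import numpy as np

def ln_gaussian(x, mu, var):
    """Log-density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def gaussian(x, mu, var):
    """Gaussian density; just exponentiates the log version so the
    math lives in one place."""
    return np.exp(ln_gaussian(x, mu, var))
```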
Text above In[43]:
"...which, for our fake data set, has 6 parameters" remind the reader what
the 6 parameters are.
Make references to emcee link to the emcee docs?
In[43]:
I think it helps to use more verbose variable names. Instead of nt, how
about n_mixture? Then you'd want to break this into 3 lines:
Change "...live on a thin slice in $\alpha$ space." to "lives on a thin
slice in $\alpha$ space."
"Among the many ways one could solve this problem this, that introduced by
M. Betancourt (arxiv:1010.3436) is convenient" - grammar? Also, you could
link to the paper.
"It establishes a bijective mapping between the $(B-1)-$ hypercube and
the $B-$simplex," Sounds very fancy, but many people will not learn
anything from this statement. If you want to keep it in here for the
experts, at least put another sentence afterwards that explains in plain
English what's going on - it's a trivial transformation to make in
practice, so it shouldn't be obfuscated by math-speak :)
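For the plain-English version, here is a stick-breaking sketch of such a mapping (my own illustration of the idea, not necessarily the exact transformation in the Betancourt paper):

```python
import numpy as np

def hypercube_to_simplex(z):
    """Map z in the (B-1)-dim unit hypercube to a point on the B-simplex.

    Stick-breaking: z[0] is the first weight's share of the unit stick,
    z[1] the next weight's share of what remains, and so on; the leftover
    stick becomes the last weight.  Weights are >= 0 and sum to 1.
    """
    z = np.asarray(z, dtype=float)
    w = np.empty(z.size + 1)
    remaining = 1.0
    for k, zk in enumerate(z):
        w[k] = zk * remaining
        remaining *= 1.0 - zk
    w[-1] = remaining
    return w
```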
Change " simple solution is to tune the prior" to "a simple solution is to
tune the prior"
In[47]:
You can probably get rid of those "fingers of God" that appear in the
corner plot if you do an intermediate sampling where, after burn-in, you
generate a small ball around the median sample and burn in again. Not
tested, but for example:
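A hypothetical sketch of that re-initialization step (untested, as noted; with emcee you would pass something like `sampler.get_chain(flat=True)` as `chain` and feed the returned positions into a second burn-in run):

```python
import numpy as np

def reinit_ball(chain, n_walkers, scale=1e-3, rng=None):
    """New walker positions in a tiny Gaussian ball around the
    per-parameter median of a flattened burn-in chain."""
    rng = np.random.default_rng(rng)
    center = np.median(chain, axis=0)   # median sample, per parameter
    ndim = chain.shape[1]
    return center + scale * rng.standard_normal((n_walkers, ndim))
```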