To simplify and not start with a wall of code, you could put the data
generation code into a module in the same directory, and just import it
into the notebook. You can then pass in the number of desired data points,
e.g., xis, yis, sigmais = simulate_data(N=300). Then you can just
visualize the simulated data. I just worry people will get hung up on the
details of the simulated data generation, so it might be better to hide it
from the notebook but provide the script for anyone who wants to see more.
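A minimal sketch of what that module interface could look like (the generator body here is a stand-in; the notebook's real generation code would move into the module):

```python
# simulate.py -- hypothetical module holding the data-generation code
import numpy as np

def simulate_data(N=300, seed=None):
    """Draw N fake data points; returns (xis, yis, sigmais).

    Placeholder generator: the notebook's actual generation code
    would live here instead of this toy sine curve.
    """
    rng = np.random.default_rng(seed)
    xis = np.sort(rng.uniform(0.0, 10.0, N))
    sigmais = rng.uniform(0.05, 0.2, N)           # per-point noise levels
    yis = np.sin(xis) + sigmais * rng.standard_normal(N)
    return xis, yis, sigmais
```

The notebook then only needs `from simulate import simulate_data`.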
In[41]:
To improve visibility you could change the style of the errorbar plot:
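For instance, a hypothetical styling tweak (smaller markers, thin grey error bars; the variable names just follow the simulate_data example and would be replaced by the notebook's actual arrays):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# stand-in data with the same names as the review's simulate_data() call
rng = np.random.default_rng(0)
xis = np.sort(rng.uniform(0, 10, 50))
sigmais = rng.uniform(0.05, 0.2, 50)
yis = np.sin(xis) + sigmais * rng.standard_normal(50)

fig, ax = plt.subplots()
# small points, light thin error bars, slight transparency
ax.errorbar(xis, yis, yerr=sigmais, fmt="o", ms=3,
            ecolor="0.6", elinewidth=1, capsize=0, alpha=0.8)
```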
You say "But, as we will see below, because the noise is known, it can be
deconvolved, and we can recover the two components pretty well." You might
put a caveat that this is only really true if you have some beliefs about
the form of the mixture component, right? In this case, we have reason to
believe the mixture components are Gaussians?
Change "In other words, we are in presence of a product of Gaussians being
integrated over." to, e.g., "In other words, the expression for the
posterior probability contains an integral over a product of two
Gaussians."
"...we have once and for all removed the latent variables and
integrals..." It might be worth adding to the discussion that sometimes we
are interested in having posterior pdfs over the x_i's too! Of course, you
can always back those out with posterior samples over the population
parameters - either discuss that further down or link to DFM's blog post
about related things.
In[42]:
You could add docstrings to each function to explain in words what each
function does.
You could rename `gaussiansumsig` to `gaussian_sum_var` to make the
underscores consistent with other functions, and because you sum the
variances, not the sigmas. But now I'm getting really annoying...
You could also just define the ln... functions, and in the non-log
functions just call exp(ln...()). Reduces some duplicate code, but fine
as is.
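A sketch of that pattern, with the docstrings suggested above (hypothetical names and signatures; the notebook's actual functions take different arguments):

```python
import numpy as np

def ln_gaussian(x, mu, var):
    """Log-density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def gaussian(x, mu, var):
    """Gaussian density; just exponentiates the log version so the
    math lives in one place."""
    return np.exp(ln_gaussian(x, mu, var))
```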
Text above In[43]:
"...which, for our fake data set, has 6 parameters" remind the reader what
the 6 parameters are.
Make references to emcee link to the emcee docs?
In[43]:
I think it helps to use more verbose variable names. Instead of nt, how
about n_mixture? Then you'd want to break this into 3 lines:
Change "...live on a thin slice in $\alpha$ space." to "lives on a thin
slice in $\alpha$ space."
"Among the many ways one could solve this problem this, that introduced by
M. Betancourt (arxiv:1010.3436) is convenient" - grammar? Also, you could
link to the paper.
"It establishes a bijective mapping between the $(B-1)-$ hypercube and
the $B-$simplex," Sounds very fancy, but many people will not learn
anything from this statement. If you want to keep it in here for the
experts, at least put another sentence afterwards that explains in plain
English what's going on - it's a trivial transformation to make in
practice, so it shouldn't be obfuscated by math-speak :)
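For the plain-English version, here is a stick-breaking sketch of such a mapping (my own illustration of the idea, not necessarily the exact transformation in the Betancourt paper):

```python
import numpy as np

def hypercube_to_simplex(z):
    """Map z in the (B-1)-dim unit hypercube to a point on the B-simplex.

    Stick-breaking: z[0] is the first weight's share of the unit stick,
    z[1] the next weight's share of what remains, and so on; the leftover
    stick becomes the last weight.  Weights are >= 0 and sum to 1.
    """
    z = np.asarray(z, dtype=float)
    w = np.empty(z.size + 1)
    remaining = 1.0
    for k, zk in enumerate(z):
        w[k] = zk * remaining
        remaining *= 1.0 - zk
    w[-1] = remaining
    return w
```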
Change " simple solution is to tune the prior" to "a simple solution is to
tune the prior"
In[47]:
You can probably get rid of those "fingers of God" that appear in the
corner plot if you do an intermediate sampling where, after burn-in, you
generate a small ball around the median sample and burn in again. Not
tested, but for example:
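A hypothetical sketch of that re-initialization step (untested, as noted; with emcee you would pass something like `sampler.get_chain(flat=True)` as `chain` and feed the returned positions into a second burn-in run):

```python
import numpy as np

def reinit_ball(chain, n_walkers, scale=1e-3, rng=None):
    """New walker positions in a tiny Gaussian ball around the
    per-parameter median of a flattened burn-in chain."""
    rng = np.random.default_rng(rng)
    center = np.median(chain, axis=0)   # median sample, per parameter
    ndim = chain.shape[1]
    return center + scale * rng.standard_normal((n_walkers, ndim))
```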