kponder / PyUNITY

Python implementation of UNITY
MIT License

Math for Initial PGM #2

Open · kponder opened this issue 8 years ago

kponder commented 8 years ago

The math for the most basic PGM that we are starting with. @kponder will post this PGM and @rbiswas4 will write down the integrals.

rbiswas4 commented 8 years ago

Actually, to stave off some of the issues people had with compiling my latex setup last time, is it a good idea to have a doc section with @drphilmarshall's famous setup files?

kponder commented 8 years ago

@rbiswas4 , @kbarbary , @rubind et al,

Please take a look at this PGM. This is what I understood from our conversation on Friday for our most basic PGM. Let me know if anything is wrong. I also posted the code for making this in Python with Daft in the Examples directory.

Note: data is the observed { m_B, x_1, c } per supernova.

[image: snpgm_basic](https://cloud.githubusercontent.com/assets/9933624/12403106/92ab612a-bdff-11e5-868e-9e3f36fb8df0.png)
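
For readers without the Examples directory at hand, here is a trimmed-down, hypothetical daft fragment (node names and layout invented for illustration, not the actual script) showing the kind of figure being discussed:

```python
# Hypothetical, trimmed-down daft sketch of a PGM like the one above: global
# population ("dist") nodes, a per-SN plate with latent "true" values, and an
# observed data node. Uses the 2016-era daft API (pgm.figure.savefig).
import daft

pgm = daft.PGM([4.0, 3.0], origin=[0.0, 0.0])

# Population ("dist") parameters.
pgm.add_node(daft.Node("x1_dist", r"$x_1^{\rm dist}$", 1.0, 2.5))
pgm.add_node(daft.Node("c_dist", r"$c^{\rm dist}$", 2.0, 2.5))
pgm.add_node(daft.Node("sigma_int", r"$\sigma_{\rm int}^{\rm dist}$", 3.0, 2.5))

# Latent per-SN parameters and the observed SALT2 summaries.
pgm.add_node(daft.Node("x1_true", r"$x_{1,i}^{\rm true}$", 1.0, 1.5))
pgm.add_node(daft.Node("c_true", r"$c_i^{\rm true}$", 2.0, 1.5))
pgm.add_node(daft.Node("data", r"${\rm data}_i$", 2.0, 0.7, observed=True))

pgm.add_edge("x1_dist", "x1_true")
pgm.add_edge("c_dist", "c_true")
pgm.add_edge("x1_true", "data")
pgm.add_edge("c_true", "data")
pgm.add_edge("sigma_int", "data")

# The per-supernova plate.
pgm.add_plate(daft.Plate([0.5, 0.3, 3.0, 1.6], label=r"SN $i$"))

pgm.render()
pgm.figure.savefig("snpgm_basic_sketch.png", dpi=150)
```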

drphilmarshall commented 8 years ago

Nice! Is the idea that $c^{\rm dist}$, $x_1^{\rm dist}$, and $\sigma^{\rm dist}_{\rm int}$ represent the means and variances of independent univariate Gaussians, at first? Is there support for that assumption from the distribution of point-estimated $c_i$, $x_{1,i}$ and $m_{B,i}$ values, from various samples run through SALT2 with uninformative priors (as well as from the design of the parametrization in the first place)?
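
One quick way to probe that assumption (not from the thread; a minimal sketch assuming the SALT2 point estimates are available as plain arrays, here filled with fabricated values) is to histogram the point estimates and overlay a single-Gaussian fit per parameter:

```python
# Minimal sketch: compare point-estimated SALT2 parameters against a single
# univariate Gaussian per parameter. The arrays below are placeholders.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

np.random.seed(1)
c_est = np.random.normal(0.0, 0.1, 500)    # placeholder c_i point estimates
x1_est = np.random.normal(0.0, 1.0, 500)   # placeholder x_{1,i} point estimates

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, vals, label in zip(axes, [c_est, x1_est], [r"$c_i$", r"$x_{1,i}$"]):
    mu, sigma = norm.fit(vals)                        # ML Gaussian fit
    ax.hist(vals, bins=30, density=True, alpha=0.5)
    grid = np.linspace(vals.min(), vals.max(), 200)
    ax.plot(grid, norm.pdf(grid, mu, sigma))
    ax.set_xlabel(label)
fig.tight_layout()
plt.show()
```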

rbiswas4 commented 8 years ago

@kponder Great! I think this is very close to what I recall us all agreeing on.
There are a couple of things I thought were different:

  1. the selection cut parameters, which depend on the $m_B$, $x_1$, $c$ values.
  2. Should the 'data' also depend directly on the $x_{1,i}^{\rm true}$ and $c_i^{\rm true}$ values rather than just through $m_B^{\rm true}$?

Also, during our discussion and blind drawings at Yalis, @kponder and @kbarbary suggested having $\sigma^{\rm dist}_{\rm int}$ outside the outer plate. Keeping it inside the samples (as in this diagram) allows for some prior beliefs about the goodness of the samples or the applicability of the SALT2 model. But I thought about this a little and was wondering if it makes sense to

AlexGKim commented 8 years ago

In case you are interested, here is a simplified version of my pgm implemented in PyMC3.

https://github.com/dessn/abc/blob/master/src/Nodes.py
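
Purely for orientation, here is a hypothetical, minimal PyMC3 sketch of the basic PGM above (not code from either repository; placeholder data, a simple Tripp relation, and independent Gaussian "observations" of the SALT2 summaries are all assumptions of the sketch):

```python
# Hypothetical, minimal PyMC3 sketch of the basic model: population ("dist")
# parameters for x1 and c, a single intrinsic scatter, and Gaussian
# "observations" of the SALT2 summaries per supernova.
import numpy as np
import pymc3 as pm

# Placeholder inputs: SALT2 point estimates, their errors, and the distance
# modulus implied by each SN's redshift (all fabricated for illustration).
n_sn = 50
np.random.seed(0)
mu_z = np.full(n_sn, 42.5)
x1_obs = np.random.normal(0.0, 1.0, n_sn)
c_obs = np.random.normal(0.0, 0.1, n_sn)
mB_obs = np.random.normal(23.2, 0.3, n_sn)
x1_err, c_err, mB_err = 0.3, 0.03, 0.1

with pm.Model() as basic_pgm:
    # Population ("dist") hyperparameters, shared by all SNe.
    x1_dist_mu = pm.Normal("x1_dist_mu", mu=0.0, sd=5.0)
    x1_dist_sigma = pm.HalfNormal("x1_dist_sigma", sd=5.0)
    c_dist_mu = pm.Normal("c_dist_mu", mu=0.0, sd=1.0)
    c_dist_sigma = pm.HalfNormal("c_dist_sigma", sd=1.0)
    sigma_int = pm.HalfNormal("sigma_int", sd=0.5)

    # Standardization parameters.
    alpha = pm.Normal("alpha", mu=0.14, sd=0.5)
    beta = pm.Normal("beta", mu=3.1, sd=1.0)
    M_B = pm.Normal("M_B", mu=-19.3, sd=1.0)

    # Latent "true" values, one per SN (the plate).
    x1_true = pm.Normal("x1_true", mu=x1_dist_mu, sd=x1_dist_sigma, shape=n_sn)
    c_true = pm.Normal("c_true", mu=c_dist_mu, sd=c_dist_sigma, shape=n_sn)

    # Tripp relation for the true peak magnitude.
    mB_true = M_B + mu_z - alpha * x1_true + beta * c_true

    # Gaussian likelihoods for the observed SALT2 summaries.
    pm.Normal("x1_like", mu=x1_true, sd=x1_err, observed=x1_obs)
    pm.Normal("c_like", mu=c_true, sd=c_err, observed=c_obs)
    pm.Normal("mB_like", mu=mB_true,
              sd=pm.math.sqrt(mB_err**2 + sigma_int**2), observed=mB_obs)

    trace = pm.sample(1000, tune=1000)
```

Whether the intrinsic scatter enters only the $m_B$ likelihood, as assumed here, or as a full covariance matrix is exactly the question discussed further down the thread.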

kponder commented 8 years ago

@rbiswas4 thank you for the comments.

  1. I do not remember this. What are they?
  2. I did forget to put in the dependency of the data on $x_{1,i}^{\rm true}$ and $c_i^{\rm true}$.

The most basic PGM would keep $\sigma_{\rm int}^{\rm dist}$ outside all of the plates. This assumes that all supernovae are drawn from the same intrinsic distribution. Would it only go inside the plate once we change it to $\sigma_{\rm samp}$, or can $\sigma_{\rm int}^{\rm dist}$ be per sample?

Here is the updated version:

[image: snpgm_basic1]

rbiswas4 commented 8 years ago

I do not remember this. What are they?

I think at Yalis we wrote these as a circle with the parameters $m_B^{\rm cut}, x_1^{\rm cut}, c^{\rm cut}, z$. I think there are two things that this encodes in principle:

  1. Survey selection due to dimness
  2. Explicit cuts in $x_1$, $c$ and $m_B$ that sndatasets might have because the survey people threw away SNe.

In practice, this is hard for many reasons, but perhaps the hardest to model is people's behavior in spectroscopic selection, for example as represented by @rubind's happy faces.

I did forget to put in the dependency of the data on $x_{1,i}^{\rm true}$ and $c_i^{\rm true}$.

and it looks great here.

The most basic PGM would keep $\sigma_{\rm int}^{\rm dist}$ outside all of the plates. This assumes that all supernovae are drawn from the same intrinsic distribution.

Correct

Would it only go inside the plate once we change it to $\sigma_{\rm samp}$, or can $\sigma_{\rm int}^{\rm dist}$ be per sample?

In practice, every different sample (or combination of samples) reports different values of $\alpha$, $\beta$, $M_B$, $\sigma_{\rm int}^{\rm dist}$. I believe that, in our philosophy, we should attribute these sample-dependent differences either to the selection criteria or to the parameter space (the rest-frame part of the spectrum) that the sample probes. A third possibility is that we may believe some samples are more accurate in their reports than others (which will not happen for simulated samples).
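
To make the "per sample" option concrete (a hypothetical PyMC3 fragment in the spirit of the sketch earlier in the thread, not actual PyUNITY code), a sample-level plate can be expressed with a shape over samples and an index per SN:

```python
# Hypothetical fragment: one intrinsic scatter per sample instead of a single
# global value. `sample_idx` maps each SN to its parent sample (survey).
import numpy as np
import pymc3 as pm

n_samples = 3                                           # e.g. three surveys
sample_idx = np.random.randint(0, n_samples, size=50)   # placeholder mapping

with pm.Model():
    # One sigma_int per sample (i.e. inside the "sample" plate) ...
    sigma_int = pm.HalfNormal("sigma_int", sd=0.5, shape=n_samples)
    # ... indexed per supernova when building the m_B likelihood, e.g.
    # sd = pm.math.sqrt(mB_err**2 + sigma_int[sample_idx]**2)
```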

rbiswas4 commented 8 years ago

Actually a couple of more things maybe:

  1. Should not $x_1^{\rm dist}$ and $c^{\rm dist}$ also be outside the plate, like $\sigma_{\rm int}^{\rm dist}$?
  2. It might be a good idea to write out what we mean by 'data'.

rubind commented 8 years ago

If selection effects are incorporated, then the x1 and c distributions can be sample-independent.

rubind commented 8 years ago

Did we decide that the data for this initial model are SALT2 fit results?

rbiswas4 commented 8 years ago

@rubind

  1. Agree completely
  2. I thought we agreed on that. And this means both the estimated values of the SALT2 parameters and their covariances, correct?

AlexGKim commented 8 years ago

Is it decided to implement this in pymc3? If so, is anyone working on implementing numerical integration by inheriting from theano.Op?
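
For reference, and hedged as a sketch of the general pattern rather than anything decided here, a numerical integral can be wrapped for PyMC3 by subclassing theano.Op and doing the quadrature in perform (the integrand below is an arbitrary example; without a grad method only gradient-free samplers could use it):

```python
# Hypothetical sketch: wrap a 1-D numerical integral as a Theano Op so it can
# appear inside a PyMC3 model graph. No gradient is defined here, so NUTS/HMC
# would not work with it as written; Metropolis-style samplers would.
import numpy as np
import theano
import theano.tensor as tt
from scipy.integrate import quad


class GaussianNormalization(theano.Op):
    """Integrate exp(-0.5*((x - mu)/sigma)**2) over the fixed range [a, b]."""

    __props__ = ("a", "b")              # attributes that define Op identity
    itypes = [tt.dscalar, tt.dscalar]   # inputs: mu, sigma
    otypes = [tt.dscalar]               # output: the integral value

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def perform(self, node, inputs, output_storage):
        mu, sigma = inputs
        value, _ = quad(lambda x: np.exp(-0.5 * ((x - mu) / sigma) ** 2),
                        self.a, self.b)
        output_storage[0][0] = np.asarray(value, dtype="float64")


# Usage inside a model graph (mu and sigma are Theano/PyMC3 variables):
# integral = GaussianNormalization(-1.0, 1.0)(mu, sigma)
```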

kponder commented 8 years ago

We did agree to do SALT2 parameters and covariances. We discussed starting from light curves after UNITY had been fully implemented.

@AlexGKim I think pymc3 is an option, but we haven't started writing code yet.

How is this one?

[image: snpgm_basic2]
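
For concreteness about what "SALT2 parameters and covariances" would look like in practice, here is a hedged sketch assuming sncosmo and its bundled example light curve (not part of this repository) of how the per-SN summaries could be produced:

```python
# Hypothetical sketch of producing the per-SN "data": SALT2 point estimates
# and their covariance from a light-curve fit with sncosmo.
import sncosmo

lc = sncosmo.load_example_data()               # bundled example light curve
model = sncosmo.Model(source="salt2")

result, fitted_model = sncosmo.fit_lc(
    lc, model, ["z", "t0", "x0", "x1", "c"],   # parameters to vary
    bounds={"z": (0.3, 0.7)})                  # z needs bounds when varied

# Per-SN inputs for the PGM: fitted parameters and their covariance.
# (m_B would be derived from x0 via the usual -2.5*log10(x0) + const.)
params = dict(zip(result.param_names, result.parameters))
cov = result.covariance                        # covariance of the varied params
```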

rubind commented 8 years ago

@kponder The selection cuts impact the magnitudes as well. I'm really not clear about how to put this into a PGM; after the marginalization over missing SNe, the selection effects are almost like a modified data likelihood. That is how I drew it in UNITY.
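
One toy way to picture that "modified data likelihood" (not how UNITY actually implements it; a sketch assuming a hypothetical sharp magnitude cut $m^{\rm cut}$): conditioning each detected SN on having passed the cut divides its Gaussian term by the probability of detection.

```python
# Toy sketch: likelihood of an observed m_B given the true m_B, conditioned on
# detection (m_B_obs < m_cut). Marginalizing over undetected SNe divides the
# Gaussian term by the probability that this SN passes the cut at all.
from scipy.stats import norm


def log_like_detected(mB_obs, mB_true, sigma_obs, m_cut):
    """log P(m_B_obs | m_B_true, detected) for a sharp cut at m_cut."""
    log_gauss = norm.logpdf(mB_obs, loc=mB_true, scale=sigma_obs)
    # Probability that a SN with this true magnitude is detected.
    log_p_detect = norm.logcdf(m_cut, loc=mB_true, scale=sigma_obs)
    return log_gauss - log_p_detect


# Example: a faint SN near the cut has its likelihood boosted relative to the
# naive Gaussian, compensating for the preferential loss of faint objects.
print(log_like_detected(mB_obs=24.3, mB_true=24.5, sigma_obs=0.2, m_cut=24.5))
```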

rbiswas4 commented 8 years ago

Thanks for the new PGM!

I would think that we should have a second '.' (dot variable) before the 'observed' variables ($m_B$, $x_1$, $c$) here: the observed counterparts should depend on these dot variables and on the cut parameters. Also, I am not sure that the cut variables would depend on the dist variables, though obviously one can't go wrong by showing that dependence. However, the $c_i^{\rm true}$ values would not depend on the $c_j^{\rm cut}$ even though they should depend on $c^{\rm dist}$.

kponder commented 8 years ago

@rbiswas4 What would the second dot represent?

[image: snpgm_basic4]

rbiswas4 commented 8 years ago

@kponder Actually, I like this better than my second dot suggestion which I now agree would be superfluous!

kbarbary commented 8 years ago

Hi all, I'm getting back to thinking about this after a long delay. I'm still a bit uncertain about what the "obs" parameters really mean, since we don't actually observe any such parameters. A month ago, my thought was that a SALT fit gives you P(data | m_B, x1, c, z, t0) (albeit approximated as a multi-variate Gaussian) where data is the observed light curve points and m_B, x1, c, z, t0 are the true parameters. In this way, it seems unnecessary to even talk about an "observed" m_B, x1, etc.

Thinking ahead however, I'm wondering if in @rubind's framework it is necessary to have the explicit "obs" parameters in order to incorporate the intrinsic dispersion ("unexplained variance") covariance matrix. (In the model above, we don't have a full matrix, just sigma_int^dist.) Or, can this just be analytically marginalized out since P(data | m_B, x1, c, z, t0) from SALT is also a Gaussian?

I hope to start writing up a few equations in the next few days, unless someone already has.

rubind commented 8 years ago

UNITY analytically marginalized this out, by adding the covariance matrices (observational and intrinsic).
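
In other words, the Gaussian latent layer integrates out analytically and the likelihood of the SALT2 summaries becomes a Gaussian whose covariance is $C_{\rm obs} + C_{\rm int}$. A minimal numerical sketch (toy numbers, not UNITY code):

```python
# Minimal sketch: marginalizing a Gaussian "true" vector observed with Gaussian
# noise gives a Gaussian likelihood with total covariance C_obs + C_int.
import numpy as np
from scipy.stats import multivariate_normal

# Per-SN vectors of (m_B, x1, c): model prediction and SALT2 point estimate.
predicted = np.array([24.1, 0.3, 0.02])
observed = np.array([24.0, 0.5, 0.01])

C_obs = np.diag([0.1, 0.3, 0.03]) ** 2    # SALT2 fit covariance (toy, diagonal)
C_int = np.diag([0.1, 0.0, 0.0]) ** 2     # intrinsic part: just sigma_int on m_B

# Marginal likelihood of the observed summaries given the model prediction.
log_like = multivariate_normal.logpdf(observed, mean=predicted,
                                      cov=C_obs + C_int)
print(log_like)
```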

rbiswas4 commented 8 years ago

@kbarbary, I am a bit confused by this point. My understanding was that this is a case where we will not have light curves, and we will pretend that the light-curve fit parameters and their uncertainties are 'measured quantities' and thus observed. While obviously not correct in principle, this is the second step in going from light-curve fit parameters to distances.

On writing down the math: if you are ready, please go ahead.