Conditional PDF for x_1 and c

drphilmarshall commented 9 years ago

x_1 and c are SN parameters that, in the ensemble analysis, must be assumed to be drawn from some PDF. We can get some idea of how to model that PDF in our hierarchical inference by looking at the scatter plot of all samples from all emcee runs on all real supernovae. This distribution of points will be broader than the PDF for the "true" x_1 and c values, because each sample also carries the per-object measurement uncertainty, but it might show us whether we need a bivariate function instead of two univariate ones (i.e., we might see some correlation between x_1 and c). We could also plot the posterior means from each emcee run, but this will just make the plot less smooth.
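As a rough illustration of the broadening, here is a minimal sketch that fakes the emcee output with Gaussian draws. All the widths, the number of supernovae, and the array names are made up for the example; real chains would replace the `samples_*` arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sn, n_samples = 200, 500
pop_sigma_x1, pop_sigma_c = 1.0, 0.1   # assumed population widths
err_x1, err_c = 0.5, 0.05              # assumed per-SN posterior widths

# "True" parameters, drawn independently from the assumed population.
true_x1 = rng.normal(0.0, pop_sigma_x1, n_sn)
true_c = rng.normal(0.0, pop_sigma_c, n_sn)

# Each posterior is centred on a noisy estimate of the truth.
mean_x1 = true_x1 + rng.normal(0.0, err_x1, n_sn)
mean_c = true_c + rng.normal(0.0, err_c, n_sn)

# Stand-in for the emcee chains: Gaussian samples around each centre.
samples_x1 = mean_x1[:, None] + rng.normal(0.0, err_x1, (n_sn, n_samples))
samples_c = mean_c[:, None] + rng.normal(0.0, err_c, (n_sn, n_samples))

# Stack every sample from every SN, as in the proposed scatter plot.
stacked_x1 = samples_x1.ravel()
stacked_c = samples_c.ravel()

true_std_x1 = true_x1.std()
stacked_std_x1 = stacked_x1.std()
r_stacked = np.corrcoef(stacked_x1, stacked_c)[0, 1]

print("std of true x_1:     ", true_std_x1)
print("std of stacked x_1:  ", stacked_std_x1)
print("stacked x_1-c Pearson r:", r_stacked)
```

In this toy setup the stacked variance is roughly the population variance plus twice the measurement variance, so the stacked cloud overstates the population width but should still reveal a strong x_1-c correlation if one existed.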

Note that in the PGM below I made the simplest possible assignment - single Gaussians all round! But then I started wondering about correlations.

Phil's new PGM

wmwv commented 9 years ago

x0, x1, and c are all SN parameters that should be drawn from population distributions.

Michael

drphilmarshall commented 9 years ago

Agreed - but my understanding is that x_0 is going to get replaced by some combination of M and mu, so I'm saving the population modeling for M. What do you know about observed correlations between independently fitted x_1 and c pairs in samples of real supernovae? Are they correlated?
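For concreteness, one common way x_0 gets traded for M and mu is via the peak B-band magnitude and the standard linear (Tripp-style) standardization. The sketch below assumes that form; the zero point and the values of `M`, `alpha` and `beta` are placeholders for illustration, not numbers agreed in this thread.

```python
import numpy as np

def x0_to_mB(x0, zero_point=10.635):
    """Peak B-band magnitude from the SALT2 amplitude x0.

    zero_point depends on the magnitude system; the default is an assumption.
    """
    return -2.5 * np.log10(x0) + zero_point

def distance_modulus(m_B, x1, c, M=-19.05, alpha=0.14, beta=3.1):
    """mu = m_B - M + alpha * x1 - beta * c (placeholder M, alpha, beta)."""
    return m_B - M + alpha * x1 - beta * c

# Usage: an x0 value becomes m_B, which together with x1 and c gives mu.
m_B = x0_to_mB(1.0e-5)
print(m_B, distance_modulus(m_B, x1=0.5, c=-0.02))
```

Written this way, x_0 no longer needs its own population distribution: the population modeling moves to M, while mu is set by cosmology and redshift.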

rbiswas4 commented 9 years ago

My understanding was that those distributions in the PGM are priors. So shouldn't we be able to get away with approximate distributions, even if they don't look exactly like the population distribution?

I did not think the x1 and c population distributions are strongly correlated, and I believe that SN simulations currently use uncorrelated population distributions for x1 and c (I will recheck). I don't know how good that approximation is, and I would like to derive the population distribution from data as a mixture model for simulation purposes, but introducing something like that here would complicate the inference (too many variables).

wmwv commented 9 years ago

By construction x_1 and c are meant to be uncorrelated.

Physically, yes, the intrinsic color depends on x_1.

In SALT2, "c" is redefined to be the color with respect to color(x_1).
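A toy illustration of why a color defined relative to color(x_1) comes out uncorrelated with x_1 on the sample used to define it. This is not the SALT2 training procedure: it assumes, purely for the example, that color(x_1) is a straight-line fit and that the raw color has a made-up linear dependence on x_1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

x1 = rng.normal(0.0, 1.0, n)
# Toy "raw" colour: assumed linear trend with x1 plus intrinsic scatter.
raw_color = -0.05 * x1 + rng.normal(0.0, 0.1, n)

# Fit color(x1) on this sample and define c as the offset from the fit.
slope, intercept = np.polyfit(x1, raw_color, 1)
c = raw_color - (slope * x1 + intercept)

r_raw = np.corrcoef(x1, raw_color)[0, 1]
r_resid = np.corrcoef(x1, c)[0, 1]

print("Pearson r, x1 vs raw colour:", r_raw)
print("Pearson r, x1 vs c:         ", r_resid)
```

Least-squares residuals are uncorrelated with the regressor by construction, but only for the sample the fit was made on.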

Michael

rbiswas4 commented 9 years ago

@wmwv,

I think you have gone into an area that I don't know about. When you have time, would you mind explaining those statements a little more or adding references? Thanks.

wmwv commented 9 years ago

Guy+2007, "SALT2: using distant supernovae to improve the use of Type Ia supernovae as distance indicators"

http://adsabs.harvard.edu/abs/2007A&A...466...11G

Section 2:

"""
As for SALT, the optical depth is expressed using a color offset with respect to the average at the date of maximum luminosity in B-band, c = (B−V)_MAX − ⟨B−V⟩. This parametrization models the part of the color variation that is independent of phase, whereas the remaining color variation with phase is accounted for by the linear components.
"""

("MAX" in the above means "at time of B-band maximum light")

Michael

rbiswas4 commented 9 years ago

@wmwv

OK, I see what you meant by color(x_1), and I now understand the last two parts of your statement. But this could still allow x_1 and c to be correlated, right?

wmwv commented 9 years ago

Not if your data set looks like the set used to train SALT2.

E.g., Betoule14 JLA sample

http://adsabs.harvard.edu/abs/2014A%26A...568A..22B

retrained SALT2 on the JLA sample. You can take a look at it

http://supernovae.in2p3.fr/sdss_snls_jla/ReadMe.html

Copy-and-paste:

    curl -O http://supernovae.in2p3.fr/sdss_snls_jla/jla_likelihood_v6.tgz
    tar xvzf jla_likelihood_v6.tgz

    python

    from astropy.io import ascii
    import matplotlib.pyplot as plt
    file = 'jla_likelihood_v6/data/jla_lcparams.txt'
    jla = ascii.read(file)
    plt.scatter(jla['x1'], jla['color'])
    plt.xlabel('x1')
    plt.ylabel('color')
    plt.title('JLA Betoule14')
    plt.savefig('JLA_Betoule14_x1_color.pdf')
    plt.show()

and you'll get the attached plot, which shows that x1 and c are uncorrelated in the JLA sample.

If your sample is different, then it's possible that there may be some correlation, but we can definitely ignore any correlation between x1 and c for now.
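To put a number on "uncorrelated" for any given sample, one option is a Pearson coefficient with a bootstrap error bar. The helper below is a hypothetical sketch (its name and defaults are not from any library), demonstrated on synthetic stand-in data rather than the JLA file.

```python
import numpy as np

def pearson_with_bootstrap(x, y, n_boot=1000, seed=0):
    """Return Pearson r and its bootstrap standard error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r = np.corrcoef(x, y)[0, 1]
    # Resample (x, y) pairs with replacement and recompute r each time.
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    boots = np.array([np.corrcoef(x[i], y[i])[0, 1] for i in idx])
    return r, boots.std()

# Synthetic, independent stand-ins for the x1 and color columns.
rng = np.random.default_rng(42)
demo_x1 = rng.normal(0.0, 1.0, 500)
demo_c = rng.normal(0.0, 0.1, 500)

r, r_err = pearson_with_bootstrap(demo_x1, demo_c)
print("r =", r, "+/-", r_err)
```

With the table loaded as above, the call would be `pearson_with_bootstrap(jla['x1'], jla['color'])`; an r consistent with zero within its error supports ignoring the correlation.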

Michael

[Image: jla_betoule14_x1_color, scatter plot of x1 against color for the JLA Betoule14 sample]

drphilmarshall commented 9 years ago

Lovely - a pair of independent Gaussians it is then!
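A minimal sketch of what that choice means for the log prior, next to the bivariate form it would become if a correlation were ever needed. The function and hyperparameter names are illustrative, not taken from the snpgm code.

```python
import numpy as np

def log_prior_independent(x1, c, mu_x1, sigma_x1, mu_c, sigma_c):
    """Sum of two univariate Gaussian log densities."""
    lp_x1 = -0.5 * ((x1 - mu_x1) / sigma_x1) ** 2 - np.log(sigma_x1 * np.sqrt(2 * np.pi))
    lp_c = -0.5 * ((c - mu_c) / sigma_c) ** 2 - np.log(sigma_c * np.sqrt(2 * np.pi))
    return lp_x1 + lp_c

def log_prior_bivariate(x1, c, mu_x1, sigma_x1, mu_c, sigma_c, rho):
    """Bivariate Gaussian log density with correlation coefficient rho."""
    zx = (x1 - mu_x1) / sigma_x1
    zc = (c - mu_c) / sigma_c
    one_minus_rho2 = 1.0 - rho ** 2
    quad = (zx ** 2 - 2.0 * rho * zx * zc + zc ** 2) / one_minus_rho2
    norm = np.log(2 * np.pi * sigma_x1 * sigma_c * np.sqrt(one_minus_rho2))
    return -0.5 * quad - norm

# With rho = 0 the two forms agree, so nothing is lost by starting simple.
args = (0.3, -0.05, 0.0, 1.0, 0.0, 0.1)
print(log_prior_independent(*args), log_prior_bivariate(*args, rho=0.0))
```

Starting with the independent form costs one fewer hyperparameter (no rho), and the bivariate version is a drop-in replacement if a different sample shows correlation.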

rbiswas4 / snpgm

Conditional PDF for x_1 and c #12