From model.py, it looks like you're using the censored data likelihood rather than the truncated data likelihood. Try using TruncatedNormal for your likelihood. See my comment on #26.
Also see the Stan user manual p. 180-184 for a discussion of truncated vs censored data. On p 183 they show why the censored likelihood function has the form you're using -- I think that form doesn't apply for truncated data, but I could be wrong.
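For reference, the practical difference in PyMC3 looks something like this (toy data and made-up parameters, just to illustrate the two cases):

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Toy data: true values ~ N(3, 1), detection cutoff at y = 4 (made-up numbers)
rng = np.random.default_rng(0)
y_all = rng.normal(3.0, 1.0, size=500)
upper = 4.0
y_obs = y_all[y_all < upper]          # only these points are recorded
n_censored = (y_all >= upper).sum()   # known only in the censored case

with pm.Model() as truncated_model:
    mu = pm.Normal('mu', 0.0, 10.0)
    sigma = pm.HalfNormal('sigma', 5.0)
    # Truncated data: the likelihood itself is renormalized over the observable range
    pm.TruncatedNormal('y', mu=mu, sigma=sigma, upper=upper, observed=y_obs)

with pm.Model() as censored_model:
    mu = pm.Normal('mu', 0.0, 10.0)
    sigma = pm.HalfNormal('sigma', 5.0)
    # Censored data: ordinary Normal likelihood for the recorded points...
    pm.Normal('y', mu=mu, sigma=sigma, observed=y_obs)
    # ...plus the log-probability that the n_censored unrecorded points fell above the cutoff
    pm.Potential('censored_tail',
                 n_censored * tt.log1p(-tt.exp(pm.Normal.dist(mu, sigma).logcdf(upper))))
```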
I don't think I completely understand what you're saying -- are you suggesting I try something like this:
# likelihood before truncating
sep_observed = pm.Normal('sep_observed', mu=sep_angular, sigma=asep_err, observed=asep)
# Jeffreys prior for number of unobserved data
n_seps_trunc_a = pm.Uniform('n_seps_trunc_a', lower=1, upper=4)
n_seps_trunc = pm.Potential('n_seps_trunc', -tt.log(n_seps_trunc_a))
# integrating missing points out from likelihood
right_truncated_seps = n_seps_trunc * pm.TruncatedNormal(
    'seps_trunc_right', mu=sep_observed, sigma=asep_err, upper=sep_ang_max)
truncated_seps = pm.Potential('seps_truncated', right_truncated_seps)
This works with separations (still a bad fit) but not with contrasts. It doesn't look like I can use a variable upper bound for the truncated likelihood like I need to for contrasts.
It should just be
right_truncated_seps = pm.TruncatedNormal('seps_trunc_right', mu=sep_observed, sigma=asep_err, upper=sep_ang_max)
and you don't need the pm.Potential function afterward. Think of it like this: your likelihood function is the probability of the data (observed separations). There are some observed separations for which that probability is zero. That constraint lops off a part of your normal distribution, and so you end up with a truncated Normal distribution as your likelihood. You don't need to multiply by n_seps_trunc or add pm.Potential -- that's only for censored data.
I'd recommend checking if that works on simulated data where you've truncated the separation distance but not the contrast ratios. If it doesn't, it means there's a different problem with the model. If it does, then we can figure out how to implement the variable upper bound for the contrasts.
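For example, test data along these lines (all numbers made up) would have truncated separations but untouched contrast ratios:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
sep_physical = rng.gamma(shape=16.0, scale=2.5, size=n)   # placeholder "true" separations (AU)
parallax = rng.uniform(0.02, 0.1, size=n)
sep_angular = sep_physical * parallax
sep_obs = rng.normal(sep_angular, 0.05)                   # add measurement noise
contrast = rng.normal(2.0, 0.5, size=n)                   # contrasts left untruncated

sep_ang_max = np.quantile(sep_obs, 0.8)                   # arbitrary cutoff for the test
keep = sep_obs < sep_ang_max                              # drop points above the cutoff
sep_obs, contrast = sep_obs[keep], contrast[keep]
```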
Alright, that makes more conceptual sense.
This works (just truncating the separations) when I run it on the test data I've been using. When running it on test data where contrast ratios haven't been truncated, the fit is still bad.
So there must be something wrong with the model, or I may be misunderstanding parameters in a distribution somewhere - I'll go through and make sure that the transformations I'm using in the notebook are equivalent to the ones used to make the data.
OK - if you want to post notebooks to a separate branch, I can take a look.
The main difference I found between the hierarchical_model_steps.ipynb we had before and our generative model is how observational limitations are incorporated, and I think this explains why our model fits are not good. In the generative model, the observational limitations were included by sampling observed data points from a truncated normal distribution. In hierarchical_model_steps.ipynb, however, we were removing unobservable points from our final sample. I think we need to find a way to exclude unobservable points in our generative model, instead of just sampling from a truncated distribution.
Sampling the observed points from a truncated distribution forces points whose physical parameters (i.e. mass ratio and physical separation) would make them unobservable to appear in the final catalog (their observed contrast ratios and angular separations are forced into the observable range). What we want to do is remove those points from the sample altogether. This makes me think we want to incorporate some sort of mixture model, but I'm not sure what that would look like.
You can see how I generated two test data sets, one by sampling from a truncated normal and one by cutting out unobservable points, in trucated_vs_cut_data.ipynb on the trunc_v_cut_data branch. I also compare it to our initial distribution vs model fits, and I think this accounts for the difference.
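Roughly, the difference between the two data sets comes down to this (toy numbers, not the real catalog parameters):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
n = 5000
upper = 5.0                        # cutoff on the *measured* value (toy number)
true = rng.normal(4.0, 1.5, n)     # latent true values (toy distribution)
err = 0.3                          # measurement noise

# 1) Truncated-normal sampling: every latent point gets an observation,
#    forced below the cutoff even when its true value is well above it.
b = (upper - true) / err
obs_forced = truncnorm.rvs(-np.inf, b, loc=true, scale=err, random_state=1)

# 2) Cutting: draw ordinary noisy observations, then drop the points
#    (true values and all) whose observations land above the cutoff.
obs_all = rng.normal(true, err)
keep = obs_all < upper
obs_cut, true_cut = obs_all[keep], true[keep]

# obs_forced still contains all n points, including unobservable systems;
# obs_cut only keeps the systems that would actually make it into the catalog.
```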
What do you think? @vnmanoharan @barkls
I don't completely understand what's happening in the truncation model, but it looks like it's only adjusting the measured value, not placing any restriction on the underlying value. Basically, it's forcing all data points into the observational range, even if that means recording an observation several sigma away from the true value.
The correct approach is to generate a full dataset and then remove the values at the end that would not be observed, which is what you are doing in the cutting approach. One difficulty here is that you end up with a different number of data points each time. You can use a while loop to keep resampling the points that are outside the observable range until you have enough.
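For example, something like this (toy numbers, not the real catalog parameters):

```python
import numpy as np

rng = np.random.default_rng(2)
n_wanted = 1000      # target catalog size
upper = 5.0          # observational cutoff (toy value)
err = 0.3            # measurement noise

observed, latent = [], []
# Keep drawing batches and discarding unobservable points until the
# synthetic catalog reaches the desired size.
while len(observed) < n_wanted:
    true = rng.normal(4.0, 1.5, size=n_wanted)   # latent true values (toy distribution)
    obs = rng.normal(true, err)                  # noisy observations
    keep = obs < upper
    observed.extend(obs[keep])
    latent.extend(true[keep])

observed = np.array(observed[:n_wanted])
latent = np.array(latent[:n_wanted])
```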
All of the above refers to how you create a synthetic dataset. In the inference model, you already know that all of the data points were observed, so you don't need an extra step to cut out the unobserved data points.
Hopefully this helps clarify things!
Solomon, I think part of the reason Victoria's plots are interesting is that sampling from the truncated normal distribution shifts the distribution in a way similar to the results we're getting from our fit. Could this be a hint that our issue is in how we're truncating the samples in our model?
I've also added a new branch, model_testing, which includes my most recent attempts to get the model working in hrarcl_pymc3_model.ipynb.
So what @barkls is saying is that the way you're generating the synthetic data is different from true truncation, even though you're using a "truncated" normal distribution. Cutting (rather than generating from a truncated normal), as you were doing before, is the correct way to go for generating the synthetic data.
For the generative model (a confusing term, since it's not equivalent to how you generate the synthetic data), you don't cut out any observations. The way you handle the cutoff is to set the likelihood function to zero outside the observational space. This shouldn't remove any actual observations -- if it does, it means the cutoff for the likelihood function is inconsistent with the way the data were obtained.
Anyway, the truncated normal likelihood is a way of handling this cutoff. The main difference from setting a pm.Bound is that the truncated normal is properly normalized (see the first few paragraphs of this issue).
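Concretely, the difference looks like this (toy values; note that in PyMC3 a pm.Bound distribution can't take observed data, so it's shown on a latent variable here):

```python
import numpy as np
import pymc3 as pm

y_data = np.array([1.2, 3.4, 4.8])   # made-up observations below the cutoff

with pm.Model():
    mu = pm.Normal('mu', 0.0, 10.0)
    sigma = pm.HalfNormal('sigma', 5.0)

    # pm.Bound only restricts the support; the density is not renormalized.
    BoundedNormal = pm.Bound(pm.Normal, upper=5.0)
    y_bounded = BoundedNormal('y_bounded', mu=mu, sigma=sigma)

    # pm.TruncatedNormal divides by Phi((upper - mu) / sigma), so the
    # likelihood integrates to 1 over the observable range.
    y_trunc = pm.TruncatedNormal('y_trunc', mu=mu, sigma=sigma,
                                 upper=5.0, observed=y_data)
```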
@cmlamman, I am confused as to why you have
sep_observed = pm.Normal('sep_observed', mu=sep_angular, sigma=asep_err, observed=asep)
followed by
truncated_seps = pm.TruncatedNormal('seps_trunc', mu=sep_observed, sigma=asep_err, upper=sep_ang_max)
in the latest notebook. This seems to add an extra level of hierarchy to the model. I think you can delete the second line and change the first to the following:
sep_observed = pm.TruncatedNormal('sep_observed', mu=sep_angular, sigma=asep_err, observed=asep, upper=sep_ang_max)
The idea is that you get the observed separations by adding noise to and truncating the true angular separations. The truncated normal likelihood accounts for both of those effects simultaneously.
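Writing it out for one data point: the likelihood is p(asep_i | sep_angular_i) = Normal(asep_i; sep_angular_i, asep_err) / Phi((sep_ang_max - sep_angular_i) / asep_err) for asep_i <= sep_ang_max, and zero above the cutoff -- the numerator handles the measurement noise and the denominator handles the truncation.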
Looking back at the thread, I realize the confusion was my fault -- earlier in the thread, I wrote the wrong line of code. Sorry about that. Try the suggestion in the previous post and see if it works.
@vnmanoharan this returns "Mass matrix contains zeros on the diagonal" (see the updated hrarcl_pymc3_model.ipynb in the model_testing branch). Perhaps there's still something wrong with the priors? I've really tried to make sure that there are no more hard cutoffs.
And this method also doesn't accept a variable for the limit, so I'm not sure how we could use it for contrast ratios. Also, if we're not removing any samples but instead truncating the likelihood, I think we'd also have to use a variable cutoff for the separations. The 'contrast curve' limit we're working with depends on separation -- i.e., systems are unobservable if they have both a small separation AND a high contrast. So we might need variable cutoffs for both: the contrast limit would depend on separation and the separation limit would depend on contrast.
Try the model below. It should be equivalent to a truncated normal likelihood, if I've done the math correctly. I tested it and it samples successfully, but the fit may still be off. Anyway, we should be able to use pm.Potential to also implement a variable cutoff for the contrast curves.
with pm.Model() as hierarchical_model:
    center = pm.Gamma('center', mu=40, sigma=10)
    width_diff = pm.Beta('width_difference', alpha=2, beta=2)
    width = pm.Deterministic('width', center - (center * width_diff))
    # Gamma model for physical separations (in AU)
    # Use 'shape' parameter so that the physical seps are sampled and you can plot the distribution
    sep_physical = pm.Gamma('sep_physical', mu=center, sigma=width, shape=len(asep))
    # physical separations to angular separations
    sep_angular = pm.Deterministic('sep_angular', sep_physical * parallax)
    # likelihood of separations is normal
    sep_observed = pm.Normal('sep_observed', mu=sep_angular, sigma=asep_err, observed=asep)
    # pm.Potential accounts for truncation and normalizes the likelihood
    # (normal_lccdf is the log complementary CDF of a Normal -- a helper function, not a PyMC3 built-in)
    truncated_seps = pm.Potential('upper_truncated_seps', normal_lccdf(sep_observed, asep_err, sep_ang_max))
    traces = pm.sample(tune=nsteps, draws=nsteps, step=None, chains=1)

df = pm.trace_to_dataframe(traces)
return df
A nice feature of this model is that you can actually plot the inferred (latent) physical separations after you sample.
sns.distplot(samples[samples.columns[3:1130]].quantile(0.5))
Thank you Vinny, this helps a lot!! After making some adjustments to the priors I was able to get a much better fit for the separations and a GREAT fit for the mass ratios (pic attached).
I think we've addressed the main issue this thread was opened for, but I'm still not happy with the separation fit and think that a gamma distribution may not be right for this based on the literature. So I'll close this issue and open another one (let me know if this is not the proper GitHub way to do this).
with the newest simulated data:
with the old simulated data (didn't sample from a Gaussian to account for errors):
I think the issue with the fit quality for the separations is not the distribution, but what you pointed out earlier: the contrast ratio cutoff also affects the observed separations. Your current model doesn't account for this. We should work out the likelihood function for this case.
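One way that might look (just a sketch, assuming independent Gaussian errors on the two observables and a separation-dependent contrast limit cr_max(sep) -- placeholder names): the per-point normalization is no longer just Phi((sep_ang_max - sep_angular) / asep_err). It becomes the probability that the noisy (separation, contrast) pair lands inside the observable region, i.e. the integral of Normal(s; sep_angular, asep_err) * Normal(c; cr_true, cr_err) over the region {s < sep_ang_max, c < cr_max(s)}, and minus the log of that quantity would go into the pm.Potential term for each data point.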
@vnmanoharan I've tried adding a new line to account for this,
truncated_seps_lower = pm.Potential('lower_truncated_seps', normal_lcdf(sep_observed, asep_err, sep_min(cr_observed_inverse)))
where sep_min() is just the inverse of the function used to limit the contrast ratios as a function of separation. It doesn't improve the fit.
Also, I want to fit a different distribution (lognormal) as well, mainly because it appears to be consistent with the literature.
When fitting the model with simulated data, the resulting parameters don't match the distributions used to simulate the data.
See the last cell in the tutorial notebook.
The model includes several transformations to different distributions, and I'm wondering if the fit just looks bad because I don't understand the parameters for one of these distributions correctly.