phys201 / multstars

A package for parameter estimation of stellar multiplicity properties
GNU General Public License v3.0
0 stars 0 forks source link

Truncated likelihood may not be correct #41

Open vnmanoharan opened 4 years ago

vnmanoharan commented 4 years ago

I think the correct way to truncate the likelihood would be to do it in one step, using

truncated_seps = pm.Potential('truncated_seps', -pm.math.logdiffexp(normal_lcdf(sep_angular, asep_err, sep_ang_max), normal_lcdf(sep_angular, asep_err, sep_min(cr_inverse))))

This truncates the likelihood on both sides. The logdiffexp function is a more numerically stable way of working with the lcdf. I translated this formulation from the Stan manual, "Truncation with lower and upper bounds in Stan". That section of the manual also explains why the likelihood has this form.

If you change from cr_inverse as your observations to cr (see #39) you'll have to replace the sep_min function above with one that operates on the contrast ratios.

cmlamman commented 4 years ago

This method appears to make the sampler run out of bounds, and when I can get it to run the fit is bad: model_fit_v

I thought maybe it should be sep_observed and cr_observed instead of sep_angular and contrast_ratios, but that returns 'bad initial energy'

cmlamman commented 4 years ago

I think I understand the math form of the likelihood function now (the Stan Manual and your notes were very helpful). The remaining parts I don't understand (and where maybe something is going wrong) are:

  1. pm.Potential() - I don't understand exactly what this does and am having trouble finding a mathematical explanation, maybe it's being used incorrectly here

  2. I'm not sure that using normal_lcdf is equivalent to taking the log of the lcdf. Since it's normalized, perhaps it does need to be multiplied by a parameter, N, which described the number of samples that fall outside the range? Vinny, you said this is mathematically equivalent to the March paper. They use the parameter N for truncated data, which indicates to me that maybe it is important to include in our model.

(I've tried just scaling each normal_lcdf funciton by it's own parameter N but cannot get the model to run)

vnmanoharan commented 4 years ago

Good question. The factor of N should be included implicitly because sep_angular is a vector. pm.Potential is not well documented but it adds a term to the log posterior.

cmlamman commented 4 years ago

To see a simplified notebook looking at this issue, see troubleshooting_model.ipynb in the model_testing branch