privong / pymccorrelation

Correlation coefficients with uncertainties
GNU General Public License v3.0

More flexible error distributions #17

Closed Parrazyte closed 1 year ago

Parrazyte commented 1 year ago

As of now, the library doesn't support asymmetric uncertainties or values at a boundary (where the uncertainty is one-sided). These cases are extremely common in, e.g., astrophysical fitting results. I've implemented this locally and it's a very minor change. I think a proper implementation, up to the standards of such an important library, would be very beneficial to the community.

privong commented 1 year ago

Thanks for the request. Can you please provide some more information on what you'd specifically like to see implemented? As you've likely seen, the code currently assumes Gaussian uncertainties so that it can draw new values. Implementing the changes you request would require generalizing that code, but making it truly general is probably not trivial. I could use more information on what you're looking for.
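For readers following along, the Gaussian perturbation step mentioned above can be sketched roughly as follows. This is not pymccorrelation's actual code: the function names `pearson_r` and `perturbed_coefficients` are hypothetical, only the standard library is used, and Pearson's r is chosen purely for brevity.

```python
import random
import statistics


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def perturbed_coefficients(x, dx, y, dy, n_draws=1000, rng=random):
    """Redraw every point from a symmetric Gaussian and recompute r each time.

    Hypothetical helper: dx and dy are 1-sigma uncertainties, and rng is
    anything with a gauss(mu, sigma) method (the random module by default).
    """
    coeffs = []
    for _ in range(n_draws):
        x_new = [rng.gauss(xi, dxi) for xi, dxi in zip(x, dx)]
        y_new = [rng.gauss(yi, dyi) for yi, dyi in zip(y, dy)]
        coeffs.append(pearson_r(x_new, y_new))
    return coeffs
```

The spread of the returned coefficients is what gives the uncertainty on the correlation; supporting other error distributions amounts to replacing the two `rng.gauss` draws.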

Parrazyte commented 1 year ago

For your first point, I was referring to the second case of non-symmetric posterior distributions. I agree that parsing the whole distribution is better, but in my case, for example, computing the posterior distribution is computationally expensive, and adding this input would require significant changes in your code. So I was proposing to add the "ad hoc" choice of splitting the distribution on each side (e.g. providing two sigma values, which would then set the widths of the two half-Gaussians).
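A minimal sketch of that split-distribution draw, assuming a two-piece normal with its mode at the best-fit value. The name `draw_split_normal` is hypothetical, and weighting the two sides by their widths is one possible choice, not something fixed by this discussion.

```python
import random


def draw_split_normal(mu, sigma_lo, sigma_hi, rng=random):
    """Draw one value from a two-piece (split) normal with mode mu.

    sigma_lo and sigma_hi are the widths of the lower and upper
    half-Gaussians. Hypothetical helper for illustration only.
    """
    total = sigma_lo + sigma_hi
    if total == 0:
        return mu  # no uncertainty on either side: nothing to perturb
    # Choosing the lower side with probability sigma_lo / total keeps the
    # density continuous at mu (an assumption about the desired shape).
    p_lo = sigma_lo / total
    half = abs(rng.gauss(0.0, 1.0))  # half-normal deviate
    if rng.random() < p_lo:
        return mu - sigma_lo * half
    return mu + sigma_hi * half
```

With this weighting the best-fit value is the mode but not the median. Picking each side with probability 0.5 instead would keep it as the median, at the cost of a density jump at `mu`. Setting one of the widths to zero gives a purely one-sided draw.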

For the values at a boundary, yes, indeed, this amounts to upper limits. A direct example, since you're also doing some X-ray astronomy: fitting results with XSPEC may peg parameters to their respective max/min values, in which case the uncertainties and posterior distribution for these parameters will only have a tail on one side (of course, assuming that this tail is Gaussian is very crude).

privong commented 1 year ago

Thanks for the quick reply. Regarding non-symmetric distributions, both implementing the draw from the posteriors and hacking together a split distribution are doable (and would require additional bookkeeping). I'd like to see or do some tests to see how robustly the latter approach recovers the "right" answer, comparing to the full/general distributions before implementing something like this though.

For the boundary values, as you describe them, I think that the correct way to handle them is to flag them as upper or lower limits. Perturbing them within the one-sided tail would not fully capture the fact that the model fitting could not constrain them from lying beyond the boundary (at least in the case you noted). If the boundary is a physical one (e.g., causality constraints, as in some pulsar system timing), then treating them as fully constrained values makes sense. But then it'd be preferable to use the full posterior distribution as above.

privong commented 1 year ago

As a follow-up, if it's too computationally expensive to get the full posterior distributions, how are you estimating the asymmetric uncertainties that you'd like to feed into pymccorrelation? Implementing an ad hoc method would probably benefit from trying to "match" the method of estimating the uncertainties when doing the point perturbation.

Parrazyte commented 1 year ago

Thanks for your answer. You make a good point about the need for "physicality" for upper limits. Currently I estimate the asymmetry from error computations at a given sigma on these parameters.

privong commented 1 year ago

> Currently I estimate the asymmetry from error computations at a given sigma on these parameters.

Can you say a bit more about this? How are you deriving/obtaining the sigma values?

privong commented 1 year ago

@Parrazyte Pinging you about my recent question. Can you please explain how you estimate the asymmetric uncertainties and provide a minimum standalone example? This would help me see how to implement such a capability into this library.

privong commented 1 year ago

@Parrazyte At this point I don't have enough information to address your requested changes. Given it's been a few months without more info, I'm going to close the ticket. If you'd like to provide more information, please feel free to reopen it.