lmfit / uncertainties

Transparent calculations with uncertainties on the quantities involved (aka "error propagation"); calculation of derivatives.
http://uncertainties.readthedocs.io/

Asymmetrical error distributions? #25

Closed · SjoerdOptLand closed this issue 10 years ago

SjoerdOptLand commented 10 years ago

How hard would it be to add support for asymmetrical error distributions? E.g.: 3 (-1/+4)

jbwhit commented 10 years ago

I've thought about this issue for a while, and I suspect it would be difficult (probably impossible unless the full error distribution were explicitly given). I say this mostly because the math of this package's error propagation calculations assumes a Gaussian distribution. Once an error is non-Gaussian (even if symmetric), you have to take it on a case-by-case basis from end to end. There is no general distribution that describes 3 (-1/+4).

lebigot commented 10 years ago

I thought about asymmetrical errors too, some time ago, when Jonathan (jbwhit) asked me the same question. :) I tried to come up with a solution, but I failed, at the time, for the reason that Jonathan mentioned.

Side note 1: The uncertainties package does not assume symmetrical random variables. It only uses two characteristics of the random variable: its "nominal value" (mean, median,…) and its standard deviation. However, the package does not handle a "right" and a "left" standard deviation separately (more on this below).

Side note 2: I want to stress that the uncertainties package does not assume Gaussian distributions (http://pythonhosted.org/uncertainties/tech_guide.html#mathematical-definition-of-numbers-with-uncertainties). In fact, the linear theory of error propagation does not need this restrictive assumption: it only relies on the local (linear) expansion of functions.
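To make side note 2 concrete, here is a minimal sketch of what the package actually stores and propagates: only a nominal value and a standard deviation (the numbers are illustrative; `ufloat` and `umath` are the package's existing API):

```python
from uncertainties import ufloat, umath

# A quantity known as 3 +/- 1: only the nominal value and the standard
# deviation are stored; no particular distribution shape is assumed.
x = ufloat(3.0, 1.0)

# Linear error propagation through an arbitrary expression:
y = umath.sqrt(x) + 2 * x
print(y)  # nominal value +/- standard deviation
```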

Back to asymmetrical errors: I did not find any theory of the linear approximation of asymmetrical errors, either in my memory or on the web. I just redid some calculations, and I came to the same conclusion again.

More precisely, for a random variable p (with density p(x)), I defined the "right-hand standard deviation" sigma_R through sigma_R^2 = 2 * integral from <x> to +inf of (x - <x>)^2 p(x) dx (and similarly for the "left-hand standard deviation", with the integral taken below the mean). For a symmetrical p, this gives the usual standard deviation. So far so good.
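Just to make that definition concrete (this is only an illustration, not part of the package), the right- and left-hand standard deviations of a sample can be estimated numerically; `one_sided_std` below is a hypothetical helper:

```python
import numpy as np

def one_sided_std(samples):
    """Left- and right-hand standard deviations of a sample, following the
    definition above: sigma_R^2 = 2 * integral over x > <x> of
    (x - <x>)^2 p(x) dx, estimated as 2/N * sum over samples above the mean."""
    samples = np.asarray(samples)
    mean = samples.mean()
    dev2 = (samples - mean) ** 2
    sigma_r = np.sqrt(2.0 * dev2[samples > mean].sum() / samples.size)
    sigma_l = np.sqrt(2.0 * dev2[samples < mean].sum() / samples.size)
    return sigma_l, sigma_r

rng = np.random.default_rng(0)
print(one_sided_std(rng.normal(0, 1, 100_000)))     # symmetric: both close to 1
print(one_sided_std(rng.exponential(1, 100_000)))   # skewed: clearly different
```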

However, finding the right-hand standard deviation of a function f(p, q) after linearly approximating it around (p, q) = (<p>, <q>) gives an expression that generalizes the symmetrical case, plus an additional term that depends on how asymmetrical p and q are (with respect to their mean). So, as far as I can see, "3 (-1/+4)" does not contain enough information even for linear error propagation of the right-hand standard deviation.

That said, maybe there is another reasonable way of defining the "right-hand/positive error", that leads to a simple asymmetrical uncertainty for f(p, q). What definition of the errors do you use?

Here is another idea. "3 (-1/+4)" represents your distribution. If you convert this distribution to a nominal value (median, mean,…) and standard deviation, then you can use the uncertainties package. The result is always expressed in the same way (nominal value and standard deviation), but it is correct within linear error propagation theory (and does not imply a symmetrical probability distribution: http://pythonhosted.org/uncertainties/tech_guide.html#mathematical-definition-of-numbers-with-uncertainties).
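As a sketch of that last suggestion (the averaging rule below is only one crude way of collapsing -1/+4 into a single standard deviation, not something the package prescribes):

```python
from uncertainties import ufloat

# "3 (-1/+4)": keep the quoted central value and collapse the two
# half-widths into a single standard deviation (a crude choice; what is
# appropriate depends on what -1/+4 actually mean for your distribution).
lower, upper = 1.0, 4.0
x = ufloat(3.0, (lower + upper) / 2.0)
print(x)  # 3.0+/-2.5, usable in any further propagation
```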

SjoerdOptLand commented 10 years ago

Thanks, maybe I should reconsider my original problem. Indeed, most physical quantities in a natural unit are normally distributed. Maybe I should treat and store all my data in these units and only apply problematic transforms at the end...

Practical case: I measure 1 (+/- 0.2) W going to a passive load, and 0.9 (+/- 0.2) W coming back. Naively, the power dissipated by the load must thus be 0.1 (+/- 0.28) W, i.e. between -0.18 W and +0.38 W. Because it is a passive load, the dissipated power cannot be negative, so I know that it is rather between 0 W and +0.38 W. In decibels (10*log_10(P)), that gives -10 dBW nominally, between -infinity and -4.2 dBW... I know of no distribution that correctly represents this power in decibels.
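For reference, this is how the numbers above propagate with the package as it stands (`ufloat` and `umath.log10` are existing API; the blow-up in decibels illustrates the problem):

```python
from uncertainties import ufloat, umath

p_in = ufloat(1.0, 0.2)     # power going to the load, in W
p_back = ufloat(0.9, 0.2)   # power coming back, in W

p_diss = p_in - p_back      # 0.10+/-0.28 W for uncorrelated measurements
print(p_diss)

# In decibels, the linear approximation around 0.1 W gives a huge
# uncertainty, because the relative uncertainty on p_diss is large:
print(10 * umath.log10(p_diss))
```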

But maybe, for more reasonable asymmetry (like my 0.2 W between 0 and 0.38 W), we could suppose a skew normal distribution (http://en.wikipedia.org/wiki/Skew_normal_distribution). What do you think?
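For what it's worth, a skew normal can at least be reduced to the two numbers the package uses; here is a small sketch with scipy.stats.skewnorm (the shape and scale are illustrative, not fitted to the 0 to 0.38 W interval):

```python
from scipy import stats

# Right-skewed skew normal; parameters chosen only for illustration.
dist = stats.skewnorm(a=4, loc=0.0, scale=0.15)

print(dist.mean(), dist.std())        # the two numbers uncertainties would use
print(dist.ppf([0.16, 0.5, 0.84]))    # asymmetric 68% interval, for comparison
```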

lebigot commented 10 years ago

Can you correlate your measurements? More specifically, when you inject 1.0 W, you should never measure 1.1 W coming back. Concretely, if you can measure the powers to, say, about 0.01 W, then the correct approach would be to directly sample the dissipated power.
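If the forward and returned powers are indeed correlated, the package can already take that into account through correlated_values (an existing function); the covariance matrix below is made up for the sake of the example:

```python
import numpy as np
from uncertainties import correlated_values

# Hypothetical covariance: 0.2 W standard deviations with a strong
# positive correlation (0.9) between the two power measurements.
cov = np.array([[0.04, 0.036],
                [0.036, 0.04]])
p_in, p_back = correlated_values([1.0, 0.9], cov)

# The correlation strongly reduces the uncertainty on the difference:
print(p_in - p_back)   # about 0.10+/-0.09 W instead of 0.10+/-0.28 W
```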

On the other hand, if your uncertainty is mostly due to limited measurement precision, then indeed your measurements alone indicate that the dissipated power could be negative: in this situation, the instruments are not precise enough to rule out a negative dissipated power (just imagine that you are the first person to conduct experiments with passive loads: your measurements alone do not show unambiguously that the dissipated power is positive).

That said, the usual way of adding knowledge (like the idea that a dissipated power should be positive) is through Bayesian statistics. I'm not sure if/how they could be incorporated into the uncertainties module…
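As a rough illustration of that Bayesian idea (entirely outside the uncertainties package), positivity can be imposed as a prior by rejection sampling a Monte Carlo propagation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample the two measurements, propagate, then keep only the physically
# allowed (positive) dissipated powers: positivity acts as a crude prior.
p_in = rng.normal(1.0, 0.2, 1_000_000)
p_back = rng.normal(0.9, 0.2, 1_000_000)
p_diss = p_in - p_back
p_diss = p_diss[p_diss > 0]

print(p_diss.mean(), p_diss.std())          # asymmetric, positive-only result
print(np.percentile(p_diss, [16, 50, 84]))
```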

In conclusion, it does not seem obvious to me that you need asymmetric uncertainties (the uncertainties package actually handles asymmetric probability distributions, but does not by itself address your problem, which can be approached from either of the two points of view I just described).