r-quantities / errors

Uncertainty Propagation for R Vectors
https://r-quantities.github.io/errors

Incorrect propagation with `mean` #9

Closed: josherrickson closed this issue 7 years ago

josherrickson commented 7 years ago

Using the 0.0.2 package from CRAN:

> x <- 1:3
> errors(x) <- c(.1, .2, .3)
> x
errors: 0.1 0.2 0.3
[1] 1 2 3
>
> sqrt(.1^2 + .2^2 + .3^2)/3
[1] 0.1247219
>
> errors(mean(x)) # wrong
[1] 0.5773503
> errors(sum(x)/3) # right
[1] 0.1247219
> errors((x[1] + x[2] + x[3])/3) # right
[1] 0.1247219
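For reference, the value the reporter expected comes from first-order (linear) propagation of independent errors through y = (x1 + ... + xn)/n. The sketch below reproduces it in base R without the errors package. The helper name `propagated_mean_error` is hypothetical and is not part of the package.

```r
# Hypothetical helper, base R only (not the errors package API).
# For y = (x1 + ... + xn)/n with independent errors e_i, linear
# propagation gives sqrt(sum(e_i^2)) / n.
propagated_mean_error <- function(e) {
  sqrt(sum(e^2)) / length(e)
}

e <- c(0.1, 0.2, 0.3)
propagated_mean_error(e)
#> [1] 0.1247219
```

This matches the `sum(x)/3` and `(x[1] + x[2] + x[3])/3` results above, because those expressions go through the ordinary arithmetic operators.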

Enchufa2 commented 7 years ago

The `mean` method does not propagate the errors as it would for a simple sum; instead, it uses the standard error of the mean (SEM).

The standard error of the mean (SEM) is the standard deviation of the sample-mean's estimate of a population mean. (It can also be viewed as the standard deviation of the error in the sample mean with respect to the true mean, since the sample mean is an unbiased estimator.) SEM is usually estimated by the sample estimate of the population standard deviation (sample standard deviation) divided by the square root of the sample size (assuming statistical independence of the values in the sample).
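Written out, with s the sample standard deviation (R's `sd()` uses the n - 1 denominator), and evaluated for the x = 1, 2, 3 from the report, the estimate is:

```latex
\mathrm{SEM} = \frac{s}{\sqrt{n}}, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

% For x = (1, 2, 3): \bar{x} = 2,
% s = \sqrt{\tfrac{(1-2)^2 + (2-2)^2 + (3-2)^2}{2}} = 1,
% \mathrm{SEM} = 1/\sqrt{3} \approx 0.5773503
```

Note that this quantity depends only on the spread of the values x_i, not on their individual errors.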

In this case,

sd(x)/sqrt(length(x))
#> [1] 0.5773503

So the propagation is ok.

josherrickson commented 7 years ago

Is that what the goal of the function should be though? In a package dedicated to error propagation, I'd anticipate any arithmetic function to propagate errors using the basic rules. Returning the SEM in another function (e.g. sem(x)) is fine, but the fact that errors(mean(x)) != errors(sum(x)/length(x)) seems likely to lead to confusion.

Enchufa2 commented 7 years ago

Thanks for the report, but I don't see any reason for an additional function.

  1. This package has a very specific goal, which is to deal with measurements in the context of science and engineering. In this specific context, when you are averaging several measurements, you always want the SEM.
  2. There is already an easy way to get the result of that operation without the statistical context and implications of an average, and you've already used it: sum(x)/length(x).
  3. The documentation page for ?mean.errors is pretty clear, so there is no possible confusion:

Details

The mean and weighted.mean methods set the error as the maximum of the standard error of the mean and the (weighted) mean of the errors.
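The documented rule for the unweighted `mean` method can be sketched in base R. This is an illustration under the assumption that the documentation sentence above is the whole rule; `mean_error_rule` is a hypothetical name, not the package's source code.

```r
# Hypothetical sketch of the documented rule (not the package source):
# error of the mean = max(SEM, mean of the individual errors).
mean_error_rule <- function(x, e) {
  sem <- sd(x) / sqrt(length(x))  # SEM from the spread of the values
  max(sem, mean(e))               # never below the average stated error
}

x <- 1:3
e <- c(0.1, 0.2, 0.3)
mean_error_rule(x, e)
#> [1] 0.5773503
```

For the example in this issue the SEM (about 0.577) exceeds the mean error (0.2), so it wins, which reproduces the `errors(mean(x))` output from the report. When the values agree closely but carry large errors, the second term takes over instead. One plausible motivation is that this keeps the reported uncertainty from shrinking below the measurement errors:

```r
mean_error_rule(c(1, 1.01, 0.99), rep(0.5, 3))
#> [1] 0.5
```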