Switch from colMeans() to colMeans2() in preprocessNoob incurs numerical differences #153

PeteHaitch commented 6 years ago

Switching from colMeans() to colMeans2() is tempting: it would be more efficient because we would no longer allocate the intermediate matrix.

However, the results from using colMeans() vs. colMeans2() are not numerically identical!

library(matrixStats)
x <- matrix(runif(1000, 100, 1000000), ncol = 10)
colMeans(x) - colMeans2(x)
#>  [1]  1.746230e-10  1.746230e-10  0.000000e+00  5.820766e-11  1.164153e-10
#>  [6]  5.820766e-11 -5.820766e-11 -2.910383e-10 -3.492460e-10 -1.746230e-10

For further details, see https://github.com/HenrikBengtsson/matrixStats/issues/96

Consequently, switching to colMeans2() will break the digest-based unit tests.
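
To illustrate why a digest would change while a tolerance-based comparison would still pass, here is a minimal sketch. It assumes matrixStats is installed. The seed is arbitrary and only makes the sketch repeatable, and the use of all.equal() is a suggestion, not something the existing tests do.

```r
library(matrixStats)

set.seed(1)  # arbitrary seed (an assumption, not from the original example)
x <- matrix(runif(1000, 100, 1000000), ncol = 10)

a <- colMeans(x)
b <- colMeans2(x)

# Exact comparison: this may be FALSE, and any such difference
# is enough to change a digest of the result
identical(a, b)

# Comparison up to a numerical tolerance (all.equal() defaults to about 1.5e-8)
isTRUE(all.equal(a, b))

# Largest relative discrepancy between the two results
max(abs(a - b) / abs(a))
```

If the tests compared results with a tolerance instead of a digest, differences of this size would presumably not matter.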

kasperdanielhansen commented 6 years ago

We should do this, and also use the subsetting arguments of colMeans2() (rows and cols) wherever they can be used. The same goes for colSums() and the row versions. When I switched over to matrixStats a long time ago, it did not have those functions.
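
As a rough sketch of what using the rows and cols arguments could look like (the index vectors below are hypothetical and not taken from preprocessNoob):

```r
library(matrixStats)

set.seed(1)  # arbitrary seed, only for repeatability
x <- matrix(runif(1000, 100, 1000000), ncol = 10)

# Hypothetical subsets, chosen only for illustration
rows <- seq(1, nrow(x), by = 2)
cols <- c(2, 5, 7)

# Base R: x[rows, cols] is materialised as a new matrix first
m1 <- colMeans(x[rows, cols, drop = FALSE])
s1 <- colSums(x[rows, cols, drop = FALSE])
r1 <- rowMeans(x[rows, cols, drop = FALSE])

# matrixStats: the subset is passed as arguments, without the explicit copy
m2 <- colMeans2(x, rows = rows, cols = cols)
s2 <- colSums2(x, rows = rows, cols = cols)
r2 <- rowMeans2(x, rows = rows, cols = cols)

# Equal up to numerical tolerance, though not necessarily bitwise identical
all.equal(m1, m2)
all.equal(s1, s2)
all.equal(r1, r2)
```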