karoliskoncevicius / matrixTests

R package for computing multiple hypothesis tests on rows/columns of a matrix or a data.frame
https://cran.r-project.org/web/packages/matrixTests/index.html

Integer overflow in case of big matrices #32

Closed karoliskoncevicius closed 1 year ago

karoliskoncevicius commented 1 year ago

This is a bug reported by @Close-your-eyes via email.

m1 <- matrix(rnorm(100000), nrow=4)
m2 <- matrix(rnorm(1000000), nrow=4)

row_wilcoxon_twosample(m1, m2)

  obs.x  obs.y obs.tot  statistic pvalue location.null alternative exact corrected
1 25000 250000  275000 3130434539     NA             0   two.sided FALSE      TRUE
2 25000 250000  275000 3118218180     NA             0   two.sided FALSE      TRUE
3 25000 250000  275000 3141448608     NA             0   two.sided FALSE      TRUE
4 25000 250000  275000 3117295315     NA             0   two.sided FALSE      TRUE

Warning messages:
1: In nx * ny : NAs produced by integer overflow
2: In nx * ny : NAs produced by integer overflow

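This is caused by storing matrix dimensions as integers. The Wilcoxon test multiplies the numbers of observations in the two samples (nx * ny), and here 25000 * 250000 = 6.25e9 exceeds the largest value an R integer can hold (2147483647), so the product overflows to NA and the p-value cannot be computed. The solution is to store those values as numeric.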
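A minimal sketch of the overflow and the proposed fix, using the observation counts from the output above. The variable names are illustrative and this is not the package's actual code; it only assumes the counts are held as integers, which is what ncol() returns.

```r
# Observation counts per row, stored as integers (as ncol() would return them).
nx <- 25000L
ny <- 250000L

# Integer multiplication cannot represent 6.25e9, so R returns NA
# (with the "NAs produced by integer overflow" warning, suppressed here).
prod_int <- suppressWarnings(nx * ny)

# Converting to numeric (double) before multiplying avoids the overflow.
prod_num <- as.numeric(nx) * as.numeric(ny)

print(.Machine$integer.max)
print(prod_int)
print(prod_num)
```

Doubles represent whole numbers exactly up to 2^53, so products of realistic sample sizes stay exact after the conversion.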

karoliskoncevicius commented 1 year ago

Todo plan:

karoliskoncevicius commented 1 year ago

Fixed in the dev branch: https://github.com/karoliskoncevicius/matrixTests/commit/75142562d4b51766b465cf71496628d2f58a0076