thelovelab / fishpond

Differential expression and allelic analysis, nonparametric statistics
https://thelovelab.github.io/fishpond
27 stars 9 forks source link

Wrong index in randomSamplesToRemove() #35

Closed smlchen closed 1 year ago

smlchen commented 1 year ago

Hello @mikelove , this issue is related to a question that I posted on Bioconductor (link). I did some debugging in RStudio, and I believe there is a bug in randomSamplesToRemove(). Note that the version of fishpond that I am using is 2.6.0.

The inputs I have for the function are as follows.

> condition
[1] neg_control neg_control neg_control neg_control
[5] neg_control 1uM_MeCHO 1uM_MeCHO 1uM_MeCHO
[9] 1uM_MeCHO 1uM_MeCHO 1uM_MeCHO
Levels: neg_control 1uM_MeCHO
> covariate
[1] heterozygous heterozygous heterozygous wildtype
[5] wildtype heterozygous heterozygous heterozygous
[9] wildtype wildtype wildtype
Levels: heterozygous wildtype
> tab
                  covariate
condition    heterozygous  wildtype
   neg_control         3         2
   1uM_MeCHO           3         3

There is an imbalance in the covariate in the negative control (neg_control) group, and I believe that the intention is to remove a heterozygous sample from the negative control group. The current code extracts tab[1,i] and tab[2,i]. In my case, it extracts tab[1,1] and tab[2,1], and the values are 3 and 3 respectively. Their subtraction is then the size input to sample(), meaning that no sample will actually be selected to be removed from the imbalanced group. Ultimately, this causes an error downstream.

Therefore, I believe this line of code should have been

cond1small <- tab[1,i] < tab[i,2]

Likewise, this should have been

tab[i,2] - tab[1,i]

And, this should also have been

tab[1,i] - tab[i,2]

Thank you for your time and consideration!

mikelove commented 1 year ago

Thanks for reporting, is this a fair reproduction of the error:

set.seed(1)
y <- makeSimSwishData(n=12)
y <- scaleInfReps(y)
y <- labelKeep(y)
y$covariate <- factor(rep(c("a","b","a","b"),c(3,3,3,3)))
y <- y[,-6]
table(cond=y$condition, cov=y$covariate)
y <- swish(y, x="condition", cov="covariate", interaction=TRUE)

I will take a look soon and push necessary changes to Bioc.

smlchen commented 1 year ago

Thank you for your prompt response.

Yes, your code snippet will produce the same error. Error in infRepsArray[, cond2, ] : subscript out of bounds

For now, I will just randomly remove a sample to balance out my dataset before calling swish().

mikelove commented 1 year ago

Ok, i'll hopefully have a fix in the next day pushed to Bioc. thanks for noting this

mikelove commented 1 year ago

Thanks again for identifying this, I've pushed what I believe to be a fix in 81a2825.

This is also now pushed to Bioc as version 2.6.1 (release) and 2.7.1 (devel), should be available in 24 hours.