willtownes / quminorm

Turning the quasi-UMI method into a bioconductor package
GNU Lesser General Public License v3.0
10 stars 2 forks source link

Output has cells with NAs #9

Open ChristophH opened 3 years ago

ChristophH commented 3 years ago

Hi Will,

quasi-UMIs seem like a great way to feed non-UMI data to UMI-based methods. Thank you for all your work and this R package. When trying it out with some subset of the Tabula Muris bone marrow smart-seq2 data (count matrix here) the quminorm function finished without any messages or errors, but the resulting matrix had some cells with just 0 and NAs. At first glance, the offending cells are not standing out in the original smart-seq2 count matrix. Do you have any idea what might be going on? In my particular example there are 130 cells (out of 5,051) giving this problem.

Cheers, Christoph

willtownes commented 3 years ago

Hi Christoph, Thanks for checking out the package and letting me know about this. I'll take a look to see what's causing the NAs. Most likeliy it is a numerical error in constructing the Poisson-lognormal cumulative distribution function: https://github.com/willtownes/quminorm/blob/master/R/quminorm.R#L81 . By default these errors are caught silently and the offending values replaced with NAs. Possibly I should expose this flag to users so they can see the actual error.

Note to self: loaded the RDS and it appears to be a sparse dgCMatrix with 17,479 genes, none of the rows or columns are all zero, the zero fractions for cells ranges from about 0.55-0.98

willtownes commented 3 years ago

Looked at this more carefully, it appears that the cells where quminorm is failing have much higher zero fractions than the other cells (see boxplot). Likely the default shape parameter of 2.0 is not appropriate when the zero fraction is extremely high. As a quick fix I recommend tinkering with the shape parameter to see if this resolves the issue (I will also try it).

Longer term, do you think it would help users if I raise a warning or an error when this happens?

image

willtownes commented 3 years ago

confirmed that by changing the shape from 2.0 to 1.5 or 1.0 it no longer produces any NAs on my machine. Here's the code I used:

library(Matrix)
library(quminorm)
m<-readRDS("~/Downloads/tabula_muris_marrow_5k_SS2.rds")
pz<-1-colMeans(m>0)
system.time(qm<-quminorm(m,shape=2.0,mc.cores=3))
bad<-apply(qm,2,function(x){any(is.na(x))})
table(bad)
boxplot(pz~bad,xlab="quminorm is all NA or zero",ylab="zero fraction")
m2<-m[,bad]
qm2<-quminorm(m2,shape=1.5)
any(is.na(qm2))
ChristophH commented 3 years ago

Thank you for looking into it. These 'cells' look strange and it might make sense to discard them. I'll explore changing the shape parameter as well.

In any case, some sort of message/warning would be helpful, since I did not expect NAs in the output and ran into problems downstream.

willtownes commented 3 years ago

Reopening to remind myself to add a warning or message when there are cells with NAs