kstreet13 / scry

Collection of analysis methods for small count data generated by rafalab members (the methods, not the data).
19 stars 2 forks source link

Stability issues and warnings in 10x Genomics Visium data #17

Open lmweber opened 2 years ago

lmweber commented 2 years ago

Hi, I have been using scry::devianceResiduals() for preprocessing and transformation in 10x Genomics Visium data, and have been running into some possible stability issues and warnings.

I am running the following code:

spe <- nullResiduals(spe, assay = "counts", fam = "binomial", type = "deviance")

and getting the following warning:

Warning messages:
1: In .local(x, ...) : NaNs produced
2: In sqrt(x@x) : NaNs produced

If this warning occurs, then I see that scry::nullResiduals() has returned zeros for some genes, e.g. see this screenshot showing values for the first 20 genes in one dataset:

Screen Shot 2022-03-08 at 20 47 50

Here is some complete reproducible code, using a SpatialExperiment object that I have saved on Dropbox here (30 MB download). This requires SpatialExperiment version >1.5.3, which can be installed from GitHub:

# install version >1.5.3 of SpatialExperiment
remotes::install_github("drighelli/SpatialExperiment")

library(SpatialExperiment)
library(scry)

# load data (.rds file from Dropbox link)
spe <- readRDS("scry_example.rds")

# calculate deviance residuals
spe <- nullResiduals(spe, assay = "counts", fam = "binomial", type = "deviance")

Have you seen this error before, and do you have any ideas what might be causing it? Thank you!

lmweber commented 2 years ago

I have also checked that these are not genes with all zero counts, since this was one idea I had (e.g. insufficient filtering of low-expressed genes).

kstreet13 commented 2 years ago

Hm, that's interesting and unfortunately, I don't know what's causing it.

It seems to be a problem with the binomial model, because running it with fam = "binomial", type = "pearson" produces a similar error:

Warning message:
In sqrt(mhat * (1 - phat)) : NaNs produced

And the same set of 677 genes end up with 0s for the residuals (along with a couple NAs). Both versions that use the poisson model seem to run fine, though.

I shared your suspicion that these might be low-count genes, but it actually looks like they are the genes with the largest counts. All the genes with zero residuals have more than 1 count per cell (ie. more than 3582 total counts). Interestingly, the two NAs mentioned above are the only two genes with exactly 1 count per cell. I don't know why this would be meaningful, but it feels like a clue.

I'm gonna ping @willtownes, to see if he has any insights.

willtownes commented 2 years ago

Hi! I'll look at this more carefully next week (sorry, am facing some deadlines at the moment). Is it possible that some of the cells have one gene that constitutes all the counts? Ie, there is some gene j whose count in that cell equals the total count across all genes, and all the other genes have a zero count for that cell. In that unlikely scenario it would trigger an error condition I originally handled but commented out for speed since I assumed it wouldn't happen in real data: https://github.com/kstreet13/scry/blob/master/R/nullResiduals.R#L21

lmweber commented 2 years ago

No problem, thank you!

I just checked and no, I don't think this is the issue, at least for a few of the values where it breaks down that I just looked at now.