satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0
203 stars 33 forks source link

Weirdly long and likely unneeded runtimes for get_residuals() and SCTransform(residual_features = [subset of genes]) #154

Closed ScreachingFire closed 1 year ago

ScreachingFire commented 1 year ago

Hello!

I was looking for ways to reduce the memory usage of sctransform while not compromising on runtime and came across two odd things(and likely related to each other).

1- When just using get_residuals() I noticed that it took much longer to compute the residuals just specifying residual_type = "pearson" in vst(). I thought that there was just a different method being used by get_residuals() but when I looked at how long each part of the function took to run the main issue was here(lines 21-27 in get_residuals()) rather than with the residuals:

if (min_variance == "umi_median") {
    min_var <- (get_nz_median(umi)/5)^2
    if (verbosity > 0) {
      message(paste("Setting min_variance based on median UMI: ", 
        min_var))
    }
  }

I checked the code for get_nz_median() and saw that there was an easy alternative as sparse matrices store all nonzero values in sp.mat@x. I also noticed that this was already implemented in get_nz_median2()(https://rdrr.io/cran/sctransform/src/R/utils.R#sym-get_nz_median2). However, get_residuals() for some reason does not call the updated and much faster version. Luckily you are allowed to manually input the min_variance so it's easy to bypass this, but I wanted to mention it as it seems easily fixable.

2- When specifying a set of genes to compute residuals for, for some reason SCTransform takes much longer than I would expect, and looking at the code it seems to do with the same issue as above as in SCTransform specifying a subset of residual_features results in get_residuals() being called. Fixing the first issue will thus likely fix this one.

I hope this all checks out, I tried my best to be thorough. Thank you for creating sctransform it's been a wonderful experience to use thus far!

saketkc commented 1 year ago

Thanks for pointing this out. I have updated the develop branch to fix this: https://github.com/satijalab/sctransform/commit/2cf8c9a2b2f2c22bd1f603569c4c33a57fed494f It will be part of next sctransform release.

You should notice speedups for 2, as you pointed out.