Closed ScreachingFire closed 1 year ago
Thanks for pointing this out. I have updated the develop branch to fix this: https://github.com/satijalab/sctransform/commit/2cf8c9a2b2f2c22bd1f603569c4c33a57fed494f
It will be part of the next sctransform release.
You should also notice a speedup for point 2, as you pointed out.
Hello!

I was looking for ways to reduce the memory usage of sctransform without compromising on runtime and came across two odd things (likely related to each other).

1- When using get_residuals() on its own, I noticed that computing the residuals took much longer than just specifying residual_type = "pearson" in vst(). I thought that get_residuals() was simply using a different method, but when I timed each part of the function, the main cost was in lines 21-27 of get_residuals() rather than in the residual computation itself. I checked the code for get_nz_median() and saw that there is an easy alternative, since sparse matrices store all nonzero values in sp.mat@x. I also noticed that this is already implemented in get_nz_median2() (https://rdrr.io/cran/sctransform/src/R/utils.R#sym-get_nz_median2). However, get_residuals() for some reason does not call the updated and much faster version. Luckily, min_variance can be supplied manually, so this is easy to bypass, but I wanted to mention it as it seems easily fixable.

2- When specifying a set of genes to compute residuals for, SCTransform takes much longer than I would expect. Looking at the code, this seems to come from the same issue as above: in SCTransform, specifying a subset of residual_features results in get_residuals() being called. Fixing the first issue will thus likely fix this one.

I hope this all checks out; I tried my best to be thorough. Thank you for creating sctransform, it's been a wonderful experience to use thus far!
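
For illustration, the idea behind point 1 (reading the nonzero entries straight from the sparse matrix's x slot) can be sketched as below. This is a minimal sketch and not sctransform's actual code: nz_median_dense() and nz_median_slot() are hypothetical helper names, the toy count matrix is made up, and the Matrix package is assumed to be installed.

```r
library(Matrix)

set.seed(1)

# Toy genes x cells count matrix, mostly zeros (illustrative data only).
dense_counts <- matrix(rpois(200 * 300, lambda = 0.2), nrow = 200, ncol = 300)
umi <- Matrix(dense_counts, sparse = TRUE)  # stored as a dgCMatrix

# Baseline (hypothetical): densify the matrix, then filter out the zeros.
# This touches every entry, including the zeros.
nz_median_dense <- function(m) {
  vals <- as.matrix(m)
  median(vals[vals > 0])
}

# Shortcut (hypothetical): the @x slot already holds only the stored entries.
# drop0() removes any explicitly stored zeros first, so @x is truly nonzero.
nz_median_slot <- function(m) {
  median(drop0(m)@x)
}

# Both routes agree on the median of the nonzero counts.
stopifnot(isTRUE(all.equal(nz_median_dense(umi), nz_median_slot(umi))))
```

The slot-based version only reads the stored entries, so its cost scales with the number of nonzero counts rather than with the full genes x cells size, which is where a speedup on sparse UMI data would come from.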