satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Does sctransform support parallelization? #153

ScreachingFire closed this 1 year ago

ScreachingFire commented 1 year ago

Hello!

Apologies if this is easily googleable; I just haven't been able to find information on this yet, and my attempts to test it haven't shown any difference in run-time.

I was wondering whether sctransform can be run in parallel, for example using future or something else? I've tried using future::plan("multisession", workers = 6) to see if it resulted in a shorter run-time, but I didn't notice any difference compared to using just one worker.

If someone could shed some light on this I would be incredibly thankful, and thank you for developing sctransform!
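The comparison described above could be measured along these lines. This is only a rough sketch: `time_with_plan` is a hypothetical helper, not part of sctransform or future, and the small `pbmc` example matrix bundled with sctransform is assumed to stand in for real data.

```r
# Hypothetical helper (not part of sctransform or future): run `fun` under a
# given future plan and return the elapsed wall-clock time in seconds.
time_with_plan <- function(fun, strategy = "sequential", ...) {
  if (!requireNamespace("future", quietly = TRUE)) {
    stop("the 'future' package is required for this sketch")
  }
  old_plan <- future::plan(strategy, ...)      # switch plan, remember the old one
  on.exit(future::plan(old_plan), add = TRUE)  # restore the previous plan on exit
  unname(system.time(fun())["elapsed"])
}

# Usage: only runs when both packages are installed.
if (requireNamespace("sctransform", quietly = TRUE) &&
    requireNamespace("future", quietly = TRUE)) {
  data("pbmc", package = "sctransform")  # small example UMI count matrix
  run_vst <- function() sctransform::vst(pbmc, verbosity = 0)

  timings <- c(
    sequential   = time_with_plan(run_vst, "sequential"),
    multisession = time_with_plan(run_vst, "multisession", workers = 2)
  )
  print(timings)
}
```

On a dataset this small, worker start-up time can dominate, so similar or even slower multisession timings would not be surprising.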

saketkc commented 1 year ago

There is currently no parallelization support in sctransform, but Seurat does parallelize most steps. Which part of the operation is creating a bottleneck for you?

ScreachingFire commented 1 year ago

sctransform::vst() usually takes the longest to run, although I know that's because I tend to use ncells=10k+ when calling the function (I notice better results when a larger number of cells is used). I don't necessarily mind the time it takes to run; I just wanted to see if there was a way to make it more efficient. Thank you!

saketkc commented 1 year ago

If you use n_cells = 10000, it will use all 10k cells for the first step of estimating the NB parameters, which is going to be slow. In our testing the parameters saturate at around 5000 cells, but there are of course cases where using the entire cell set might be helpful. You can use vst.flavor="v2" to obtain speedups over the default method.
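For anyone landing here, the two suggestions (a smaller `n_cells` and `vst.flavor = "v2"`) might look roughly like the sketch below. `run_vst` is a hypothetical wrapper, the bundled `pbmc` matrix is used only as stand-in data, and the v2 call is guarded on glmGamPoi being installed, on the assumption that the v2 flavor relies on it.

```r
# Hypothetical wrapper (not part of sctransform): estimate model parameters on
# at most `n_cells` cells and optionally switch to the "v2" flavor.
run_vst <- function(umi, n_cells = 5000, flavor = NULL) {
  sctransform::vst(
    umi,
    n_cells    = min(n_cells, ncol(umi)),  # cells used for parameter estimation
    vst.flavor = flavor,                   # NULL = default method, "v2" = v2 flavor
    verbosity  = 0                         # silence progress output
  )
}

# Usage: only runs when sctransform is installed.
if (requireNamespace("sctransform", quietly = TRUE)) {
  data("pbmc", package = "sctransform")  # small example UMI count matrix
  set.seed(42)                           # the cell subsample is drawn at random

  fit_default <- run_vst(pbmc, n_cells = 200)
  print(dim(fit_default$y))              # residual matrix: genes x cells

  # Assumption: the v2 flavor is best run with glmGamPoi available.
  if (requireNamespace("glmGamPoi", quietly = TRUE)) {
    fit_v2 <- run_vst(pbmc, n_cells = 200, flavor = "v2")
    print(dim(fit_v2$y))
  }
}
```

Only the parameter-estimation step is subsampled; residuals in `y` are still returned for every cell in the input matrix.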