kdkorthauer / scDD

R package to identify genes with differential distributions in single-cell RNA-seq

Problem with Large set of genes #3

Closed sinjini8 closed 8 years ago

sinjini8 commented 8 years ago

Hello, I have a large set of genes (>11000), but the scDD function takes a very long time to run, even for a few permutations. Is there any way to reduce the computational time? Even for 1000 genes, it took 3 hrs to run 10 permutations! Please help. Thanks in advance!

kdkorthauer commented 8 years ago

Hi @sinjini8!

Thanks for your interest in running scDD! As you have realized, the permutation testing step in the scDD framework can be computationally intensive if you don't have multiple cores so that the computation can be parallelized. If that is the case, please check out the newly added option to use an alternate test for this step - for details see Section 4: "Alternate Test for Differential Distributions" of the vignette.

Essentially, set the number of permutations to 0, and the scDD function will utilize the Kolmogorov-Smirnov test for this step instead of the full permutation framework. This is helpful if you have limited computational resources, or if you just want to get a quick feel for what the results look like before running the full procedure. Since this is a newly added feature, to take advantage of it, you'll need to uninstall and reinstall scDD as follows:

remove.packages('scDD')
library(devtools)
devtools::install_github("kdkorthauer/scDD")
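To give a rough feel for why the Kolmogorov-Smirnov route is so much cheaper than permutations, here is a minimal sketch in base R that runs one two-sample KS test per gene on simulated data. It is an illustration only, not scDD's internal code: the matrix `expr`, the `condition` vector, and the simulated shift are all made up for the example, and scDD's own handling of the data (e.g. zeros, normalization) may differ. In scDD itself you would just set the number of permutations to 0 as described above.

```r
# Illustration only: simulated data, NOT scDD's internal implementation.
set.seed(1)
n_genes <- 100
n_cells <- 60
condition <- rep(c(1, 2), each = n_cells / 2)   # hypothetical condition labels

# Hypothetical genes x cells expression matrix
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Shift the first 10 genes in condition 2 so they differ between conditions
expr[1:10, condition == 2] <- expr[1:10, condition == 2] + 2

# One two-sample KS test per gene (stats::ks.test); no permutations needed
ks_pvals <- apply(expr, 1, function(x) {
  ks.test(x[condition == 1], x[condition == 2])$p.value
})

# Benjamini-Hochberg adjustment across genes
ks_padj <- p.adjust(ks_pvals, method = "BH")
head(sort(ks_padj))
```

Each gene costs a single test here rather than one model fit per permutation, which is why this kind of check is fast even for thousands of genes.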

Enjoy! And feel free to reach out if you have any more questions or comments.

Best, Keegan