hms-dbmi / scde

R package for analyzing single-cell RNA-seq data
http://pklab.med.harvard.edu/scde

error modeling knn takes up all cores #51

Open dtm2117 opened 6 years ago

dtm2117 commented 6 years ago

Hello,

I am trying to run the error modeling step on a sample with thousands of cells. I've increased k and min.nonfailed, but even when I set n.cores = 1, I have threads running on every core of the cluster. Furthermore, the error modeling step has never finished.

Any thoughts on this issue?

dtm2117 commented 6 years ago

Here is the command: knn <- knn.error.models(cd_new_nodup, k = ncol(cd)/2, n.cores = 12, min.count.threshold = 1, min.nonfailed = 20, max.model.plots = 10)

The matrix is roughly 13k genes by 4k cells. I've filtered out any genes that have no expression.

dtm2117 commented 6 years ago

This is also UMI data.

pkharchenko commented 6 years ago

For the runtime issue, I think k needs to be lowered considerably, to something like 50 or 100. It just needs a sufficient number of neighboring cells to estimate the few parameters of the error model; 2k cells would definitely be overkill for that. I'm not sure about the number of cores, though. Can you check whether something like pagoda:::papply( 1:1e2, function(x) rnorm(1e3), n.cores=12) also uses too many cores?

Best, -peter.
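
A minimal sketch of the lower-k call suggested above, in R. The helper choose_k and its k_max default are hypothetical names introduced here for illustration (they are not part of scde). The knn.error.models arguments are copied from the command posted earlier in the thread, and the call only runs if scde is installed and cd_new_nodup exists in the session.

```r
# Hypothetical helper (not part of scde): cap k at a small value such as 100,
# while never exceeding half the number of cells.
choose_k <- function(n_cells, k_max = 100) {
  as.integer(min(k_max, floor(n_cells / 2)))
}

# For the ~4k-cell matrix described in the thread this gives k = 100
# instead of ~2000.
k <- choose_k(4000)

# Same call as in the thread, with only k changed. Guarded so the sketch
# does nothing when scde or the count matrix is unavailable.
if (requireNamespace("scde", quietly = TRUE) && exists("cd_new_nodup")) {
  knn <- scde::knn.error.models(cd_new_nodup, k = k, n.cores = 12,
                                min.count.threshold = 1, min.nonfailed = 20,
                                max.model.plots = 10)
}
```

Whether 50 or 100 works better likely depends on the dataset, so treat the cap as a starting point rather than a recommended setting.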

dtm2117 commented 6 years ago

Ok, I will try this. On the scde help page it says that k may need to be increased for 1000s of cells, which is why I kept the denominator low.

I can't run that command because the pagoda library apparently isn't installed. I thought it was installed along with the scde package?

dtm2117 commented 6 years ago

When I run scde:::papply( 1:1e2, function(x) rnorm(1e3), n.cores=12), it finishes in about 2 seconds, so I can't tell how many cores it uses.

pkharchenko commented 6 years ago

Yes, I meant the scde package. You'd need to increase the rnorm argument so that the call takes a sizeable amount of time.

-peter.
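
One way to make the test run long enough to watch in top or htop is sketched below. The function slow_parallel_job and its defaults are hypothetical and chosen only for illustration. It calls scde:::papply when scde is installed and otherwise falls back to parallel::mclapply as a stand-in, which is an assumption made so the sketch runs without scde.

```r
library(parallel)

# Hypothetical helper: run a deliberately slow parallel job so that core
# usage can be observed from another terminal while it is running.
slow_parallel_job <- function(n.tasks = 100, n.draws = 1e7, n.cores = 12) {
  # Each task draws many random numbers and returns only their mean,
  # so the job is CPU-bound without holding large results in memory.
  work <- function(x) mean(rnorm(n.draws))
  if (requireNamespace("scde", quietly = TRUE)) {
    scde:::papply(seq_len(n.tasks), work, n.cores = n.cores)
  } else {
    # Stand-in when scde is not installed (assumption, not the scde code path).
    # mclapply cannot fork on Windows, so fall back to a single core there.
    if (.Platform$OS.type == "windows") n.cores <- 1
    mclapply(seq_len(n.tasks), work, mc.cores = n.cores)
  }
}

# Run the long version only in an interactive session.
if (interactive()) {
  print(system.time(res <- slow_parallel_job()))
}
```

While the job runs, the number of busy R processes shown by top or htop can be compared with the requested n.cores.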

dtm2117 commented 6 years ago

After increasing the rnorm argument, it seems to be running on 12 cores only.

dtm2117 commented 6 years ago

While scde:::papply( 1:1e2, function(x) rnorm(1e3), n.cores=12) runs on only 12 cores, the knn.error.models call still uses all cores.

Any ideas on what causes the discrepancy?