error modeling knn takes up all cores

dtm2117 commented 6 years ago

Hello,

I am trying to run the error modeling step on a sample with thousands of cells. I've increased k and min.nonfailed, but even when I set n.cores = 1, I have threads running on every core of the cluster. On top of that, the error modeling step has never finished.

Any thoughts on this issue?

pkharchenko commented 6 years ago

The number of cores might have to do with the environment, but just in case, could you please provide the exact command you’re using?
In terms of runtime, it scales linearly with the number of cells and also depends on the number of genes. Filtering out low-expressed genes from the matrix should speed up calculations considerably.
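
As a rough illustration of the gene filtering suggested above, here is a minimal base-R sketch on a simulated count matrix. The matrix, the variable names, and the thresholds min.detected and min.reads are all hypothetical placeholders chosen for the example, not values recommended in this thread.

```r
# Toy genes-by-cells count matrix standing in for the real data (assumption).
set.seed(42)
counts <- matrix(rpois(1000 * 50, lambda = 0.3), nrow = 1000, ncol = 50,
                 dimnames = list(paste0("gene", 1:1000), paste0("cell", 1:50)))

# Hypothetical thresholds: a gene must be seen in at least this many cells,
# and have at least this many counts in total, to be kept.
min.detected <- 5
min.reads <- 10

# Logical vector marking the genes that pass both thresholds.
keep <- rowSums(counts > 0) >= min.detected & rowSums(counts) >= min.reads
filtered <- counts[keep, , drop = FALSE]

cat("genes before:", nrow(counts), " genes after:", nrow(filtered), "\n")
```

Stricter thresholds remove more genes and so reduce the work per cell, at the cost of discarding weakly expressed genes that may still be of interest.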

dtm2117 commented 6 years ago

Here is the command: knn <- knn.error.models(cd_new_nodup, k = ncol(cd)/2, n.cores = 12, min.count.threshold = 1, min.nonfailed = 20, max.model.plots = 10)

The matrix is ~13k genes by 4k cells. I've filtered out any genes that have no expression.

dtm2117 commented 6 years ago

This is also UMI data.

pkharchenko commented 6 years ago

For the runtime issue, I think k needs to be lowered considerably, to something like 50 or 100. It just needs a sufficient number of neighboring cells to estimate the few parameters of the error model; 2k cells is definitely overkill for that.

Not sure about the number of cores though. Can you check whether something like pagoda:::papply( 1:1e2, function(x) rnorm(1e3), n.cores=12) also uses too many cores?

Best,
-peter.
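
To make the difference in neighborhood size concrete, here is a small sketch comparing the original choice of k with the suggested one. The toy matrix and the names k.original and k.suggested are assumptions made for illustration; the knn.error.models call is copied from the thread with only k changed, and is left commented out because it requires scde and real data.

```r
# Toy genes-by-cells matrix; in the thread the real input is cd_new_nodup.
set.seed(1)
cd <- matrix(rpois(200 * 300, lambda = 1), nrow = 200, ncol = 300)

# Original choice from the thread: half of all cells as neighbors.
k.original <- ncol(cd) / 2

# Suggested order of magnitude: a small fixed neighborhood,
# capped so it never exceeds the number of other cells.
k.suggested <- min(50, ncol(cd) - 1)

cat("k (original):", k.original, " k (suggested):", k.suggested, "\n")

# The call from the thread with only k changed (needs scde; not run here):
# knn <- knn.error.models(cd, k = k.suggested, n.cores = 12,
#                         min.count.threshold = 1, min.nonfailed = 20,
#                         max.model.plots = 10)
```

With 4k cells, ncol(cd)/2 gives 2000 neighbors per cell, compared with 50 to 100 under the suggestion.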

dtm2117 commented 6 years ago

Ok, I will try this. On the scde help page it says that k may need to be increased for 1000s of cells, which is why I kept the denominator low.

I can't run that command because the pagoda library is apparently not installed. I thought it was installed along with the scde package?

dtm2117 commented 6 years ago

Running scde:::papply( 1:1e2, function(x) rnorm(1e3), n.cores=12) finishes in about 2 seconds, so I can't tell the core usage.

pkharchenko commented 6 years ago

Yes, I meant the scde package. You'd need to increase the rnorm argument so that the run takes a sizeable amount of time.

-peter.
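
A sketch of such a longer-running test is below. It uses parallel::mclapply as a stand-in for the internal scde:::papply, which is an assumption: the two may not dispatch work the same way. The function name time_workload and its arguments are hypothetical.

```r
library(parallel)

# Run n.tasks independent rnorm() draws across n.cores workers and report
# the elapsed wall-clock time. mclapply is a stand-in for scde:::papply.
time_workload <- function(n.tasks = 100, n.draws = 1e5, n.cores = 2) {
  # mclapply cannot fork on Windows, so fall back to a single core there.
  if (.Platform$OS.type == "windows") n.cores <- 1
  elapsed <- system.time(
    res <- mclapply(seq_len(n.tasks), function(x) rnorm(n.draws),
                    mc.cores = n.cores)
  )[["elapsed"]]
  list(elapsed = elapsed, n.results = length(res))
}

# Small settings so the sketch returns quickly; raise n.draws (for example
# to 1e7) to keep the workers busy long enough to watch them in top or htop.
out <- time_workload(n.tasks = 20, n.draws = 1e5, n.cores = 2)
cat("elapsed seconds:", out$elapsed, "\n")
```

While a larger run is in progress, the number of busy R processes in a system monitor should match n.cores if the worker count is being respected.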

dtm2117 commented 6 years ago

After increasing the rnorm argument, it seems to run on only 12 cores.

dtm2117 commented 6 years ago

While scde:::papply( 1:1e2, function(x) rnorm(1e3), n.cores=12) runs on only 12 cores, the knn.error.models call still uses all cores.

Any ideas on why there is a discrepancy?
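
Since the core count was earlier suggested to depend on the environment, one way to gather information is to print the settings that commonly control threading in an R session. The sketch below is only a diagnostic: the choice of environment variables is an assumption, and nothing in this thread establishes that any of them affects knn.error.models. The function name report_thread_settings is hypothetical.

```r
library(parallel)

# Collect settings that often influence how many threads an R session uses.
# Which of these (if any) matter for scde is not established in this thread.
report_thread_settings <- function() {
  vars <- c("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")
  list(
    detected.cores  = detectCores(),                       # cores visible to R
    mc.cores.option = getOption("mc.cores", default = NA), # default for mclapply
    env             = Sys.getenv(vars, unset = NA)         # NA means not set
  )
}

settings <- report_thread_settings()
print(settings)
```

Running this on the cluster node where the job executes, rather than on a login node, gives the values the job actually sees.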

hms-dbmi / scde

error modeling knn takes up all cores #51