hemberg-lab / SC3

A tool for the unsupervised clustering of cells from single cell RNA-Seq experiments
http://bioconductor.org/packages/SC3
GNU General Public License v3.0

Consider using Forks for parallelism? #83

Closed: Shians closed this issue 3 years ago

Shians commented 5 years ago

By default, parallel::makeCluster() uses type = "PSOCK", which starts each worker as a fresh R session. Changing the type to "FORK" is more efficient on non-Windows systems, because forked workers start as copies of the current R process. I suggest that wherever makeCluster() is used, the code be changed to:

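# n_cores is the number of workers requested by the caller.
# FORK clusters are not available on Windows, so fall back to PSOCK there.
if (.Platform$OS.type == "windows") {
    cl <- parallel::makeCluster(n_cores, type = "PSOCK", outfile = "")
} else {
    cl <- parallel::makeCluster(n_cores, type = "FORK", outfile = "")
}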

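This also substantially reduces memory usage on systems where forking is available, because forked workers share the parent's memory pages (copy-on-write) instead of each holding its own copy of the data.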
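A minimal sketch of why FORK avoids the data copies, assuming a local two-worker cluster; the object name big is a hypothetical stand-in for an expression matrix and is not taken from SC3. PSOCK workers are new R sessions, so data must be shipped to each one (here via clusterExport), while FORK workers already see the parent's objects.

```r
library(parallel)

# Hypothetical stand-in for a large expression matrix.
big <- seq_len(1e6)

is_windows <- .Platform$OS.type == "windows"
cl <- makeCluster(2, type = if (is_windows) "PSOCK" else "FORK")

# PSOCK workers start empty, so each one needs its own serialized copy of big.
# FORK workers inherit the parent's memory (copy-on-write), so no export is needed.
if (is_windows) clusterExport(cl, "big")

# Each worker reads big and returns its length plus the task index.
sums <- parSapply(cl, 1:2, function(i) length(big) + i)
stopCluster(cl)
print(sums)
```

The only platform-dependent step is the clusterExport() call; with FORK that copy never happens, which is where the memory saving comes from.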

In a quick test with 300 cells, 5000 genes and ks = 1:10, the PSOCK version took ~90 seconds and the FORK version took ~60 seconds.

The downside is that FORK clusters cannot span multiple machines: forked workers only run on the local host, whereas PSOCK workers can also be started on remote hosts.
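To reproduce a PSOCK vs FORK comparison like the one above on another machine, a rough timing harness might look like the following. The work() function is a hypothetical toy workload, not SC3's actual per-k computation, and the measured time deliberately includes worker startup and shutdown, since that is part of the PSOCK overhead. No particular numbers are implied.

```r
library(parallel)

# Hypothetical toy workload standing in for one per-k computation.
work <- function(k) {
  m <- matrix(runif(300 * 500), nrow = 300)
  sum(cor(m)) + k
}

# Elapsed seconds to start a cluster of the given type, run all ks, and stop it.
time_cluster <- function(type, n_cores = 2, ks = 1:10) {
  start <- proc.time()[["elapsed"]]
  cl <- makeCluster(n_cores, type = type)
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, ks, work)
  proc.time()[["elapsed"]] - start
}

# FORK is only attempted where it is available.
types <- if (.Platform$OS.type == "windows") "PSOCK" else c("PSOCK", "FORK")
timings <- sapply(types, time_cluster)
print(timings)
```

For a fair comparison, run each type several times and on the real data, since a single run of a toy workload is dominated by noise and startup cost.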

wikiselev commented 5 years ago

Hi, many thanks for your suggestions and sorry for the late reply. Could you please create a pull request with the suggested changes? We will merge it into master. Thanks again, Vlad

gorgitko commented 3 years ago

@Shians See https://github.com/hemberg-lab/SC3/pull/104

Shians commented 3 years ago

Very nice! Glad to see such a clean revamp of the parallelism.