Closed simomounir closed 6 months ago
Hi, me again
Is there any way to change the implementation so it does not forcibly call mclapply? Maybe an argument to only use n.core=1? I tried passing n.core=1 as an argument but it still triggers the same error.
This function causes some pitfalls for Windows users. Please do let me know :).
Thanks in advance,
Cheers.
Sorry for the delay. I am having a really busy week and will try to take a look at it early next week.
I took a look at the code. diff_gene_cluster doesn't use mclapply. The variance-adjustment step before it, which is based on the function preprocess_normalize, uses a function in the Pagoda2 package, and that function uses parallel computing. As a quick fix, you can use packages other than Pagoda2 (e.g., Seurat) to perform normalization and differential expression. This way, you can skip preprocess_normalize and diff_gene_cluster. All you need is to make sure the output from Seurat has the same format as diff_expr_result in the function initialize_population. I will work on an alternative for Windows users in the near future.
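For anyone trying the Seurat route described above, here is a minimal sketch of normalization plus per-cluster marker detection. It assumes Seurat is installed and uses pbmc_small, the small example object that ships with Seurat, as a stand-in for your own data. The exact layout expected for diff_expr_result is not shown in this thread, so the reshaping step is deliberately left out; compare the columns printed here against what initialize_population expects before passing anything in.

```r
library(Seurat)

# pbmc_small is a tiny example object shipped with Seurat/SeuratObject;
# it is only a stand-in for your own single-cell object.
data("pbmc_small")

# Log-normalize the counts. This is a stand-in for the normalization
# otherwise done through preprocess_normalize, not an exact equivalent.
obj <- NormalizeData(pbmc_small, verbose = FALSE)

# Find marker genes for every cluster (the object's current identities).
# This call does not go through mclapply by default.
markers <- FindAllMarkers(obj, only.pos = TRUE, verbose = FALSE)

# Inspect the columns so they can be mapped onto the format that
# initialize_population expects (that mapping is not shown here).
print(colnames(markers))
print(head(markers))
```

The result is a plain data frame with one row per gene and cluster, so renaming or reordering columns afterwards is ordinary data-frame work.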
Follow-up question on diff_gene_cluster:
I am running this on our snRNA-seq data, and so far everything has worked well. However, when I got to this line, it just kept computing and computing:
diff_expr=suppressMessages(diff_gene_cluster(pagoda_object = adjust_variance$pagoda.object, cell_cluster_conversion = my_sc_cluster, n.core = 20))
I started using our large HPC configuration with 28 cores and 180 GB RAM, since I thought 90 GB might not be enough. There is the option to try an even larger configuration, but I am stumped by how long it takes (it has been running for over 30 min). Is this step also a bottleneck for you?
To add: I want to predict probes based on snRNA-seq data. We don't have spatial data yet.
Cheers
Edit: With 270 GB RAM and 42 cores I managed to get it to run in a reasonable time. I had to assign 40 cores, however.
Hi @Boehmin, if you have a relatively big or complex (many cell types) dataset, the diff_gene_cluster function can take a long time and a lot of RAM. In essence, this step finds marker genes for each cell type. One alternative is to try other marker gene identification methods such as FindAllMarkers from Seurat, which could be faster. Just make sure to organize the result in the same format as the output of diff_gene_cluster. One more note: make sure to save diff_expr so that you don't need to calculate it again if you need to run the optimization multiple times.
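One way to follow the advice about saving diff_expr is to cache it on disk with base R's saveRDS and readRDS. The sketch below is only an illustration: cache_rds is a hypothetical helper name, the file path is a placeholder, and slow_step is a dummy function standing in for the real diff_gene_cluster call so that the example runs on its own.

```r
# Compute a value once and reuse it from disk on later runs.
# `cache_rds` is a hypothetical helper, not part of any package.
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))      # reuse the saved result
  }
  result <- compute()          # run the expensive step
  saveRDS(result, path)        # store it for next time
  result
}

# Dummy stand-in for the slow step. In practice this would wrap the
# diff_gene_cluster(...) call from the thread.
slow_step <- function() {
  Sys.sleep(0.1)
  data.frame(gene = c("GeneA", "GeneB"), cluster = c("c1", "c2"))
}

# Placeholder location; use a permanent path for real work.
cache_file <- file.path(tempdir(), "diff_expr.rds")

diff_expr <- cache_rds(cache_file, slow_step)        # computed, then saved
diff_expr_again <- cache_rds(cache_file, slow_step)  # read back from disk
```

Delete the cached file whenever the input data or clustering changes, otherwise the stale result will be reused.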
Hi again,
I am trying to run the diff_gene_cluster method in order to initialize the population of genes used for panel creation. Below is what I encountered:
diff_expr=suppressMessages(diff_gene_cluster(pagoda_object = adjust_variance$pagoda.object, cell_cluster_conversion = cell_cluster_conversiondf, n.core = 1))
Error in mclapply(..., mc.cores = n.cores, mc.preschedule = mc.preschedule) : 'mc.cores' > 1 is not supported on Windows
I tried forcing the n.core=1 argument, but it still raises the same error.
Is there any work-around for Windows users?
Thanks in advance for your help.
Cheers