enrichmentPlot did not finish

NoemieL commented 2 years ago

Hi, I am using this package for GSEA on scRNA data extracted from an ArchR project as a SingleCellExperiment object. I was able to use enrichIt; however, when I run enrichmentPlot it does not finish even after 12 h, and it uses a lot of memory (around 50 GB). Is this normal? Do you have any idea why it takes so long? Here is my code:

```r
RNA_data = getMatrixFromProject(MM_OnlyPre_peak, useMatrix = "GeneIntegrationMatrix")
RNA_data = as(RNA_data, "SingleCellExperiment")
row.names(RNA_data) = rowData(RNA_data)$name
assayNames(RNA_data)[1] = "counts"

# Homo sapiens and cell type signature gene sets
m_df_C8 <- msigdbr(species = "Homo sapiens", category = "C8")
fgsea_sets_C8 <- m_df_C8 %>% split(x = .$gene_symbol, f = .$gs_name)
# selection of bone marrow cells
fgsea_sets_C8 = fgsea_sets_C8[grep("HAY_BONE_MARROW", names(fgsea_sets_C8))]
# selection of B and plasma cells from bone marrow cells
fgsea_sets_C8_bis = c(fgsea_sets_C8[grep("PLASMA", names(fgsea_sets_C8))],
                      fgsea_sets_C8[grep("_B_CELL", names(fgsea_sets_C8))],
                      fgsea_sets_C8[grep("PRO_B", names(fgsea_sets_C8))])

ES.MM_OnlyPre_peak <- enrichIt(obj = RNA_data, gene.sets = fgsea_sets_C8_bis,
                               method = "UCell", groups = 1000, cores = 8)
met.data <- merge(colData(RNA_data), ES.MM_OnlyPre_peak, by = "row.names", all = TRUE)
row.names(met.data) <- met.data$Row.names
met.data$Row.names <- NULL
colData(RNA_data) <- met.data

RNA_data
# class: SingleCellExperiment
# dim: 18097 48042
# metadata(0):
# assays(1): counts
# rownames(18097): FAM87B LINC00115 ... TMLHE-AS1 TMLHE
# rowData names(6): seqnames start ... name idx
# colnames(48042): P1217_MM_ATAC#AAACTCGAGAATACTG-1 P1217_MM_ATAC#AAACTCGGTCAGGCTC-1 ...
#   P1752_MM_ATAC#TTTGTGTTCGGGAATG-1 P1752_MM_ATAC#TTTGTGTTCTCGGCGA-1
# colData names(52): BlacklistRatio DoubletEnrichment ...
#   HAY_BONE_MARROW_FOLLICULAR_B_CELL HAY_BONE_MARROW_PRO_B
# reducedDimNames(0):
# mainExpName: NULL
# altExpNames(0):

p = enrichmentPlot(RNA_data, gene.set = "HAY_BONE_MARROW_FOLLICULAR_B_CELL",
                   gene.sets = fgsea_sets_C8_bis, group = "Translocation") +
  scale_color_manual(values = colorblind_vector(5)[c(1, 4)])
```

ncborcherding commented 2 years ago

Hey NoemieL,

I think you are experiencing the issue due to the size of your data: enrichmentPlot() itself calculates the gene ranks across every group and cell, which is not optimal for large data sets.

Let me see how I can modify it to optimize for speed and memory.

Nick
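
A possible stopgap while waiting for a fix, not something the maintainer proposed in this thread: if the cost comes from ranking genes across every cell, plotting a random subset of cells per group might reduce run time and memory. Below is a minimal sketch in base R. `subsample_by_group` is a hypothetical helper, not part of escape, and the cap of 500 cells per group is an arbitrary assumption. Whether a subsample gives a representative plot depends on the data.

```r
# Hypothetical helper (not part of escape): keep at most `n_per_group` cells
# from each level of a grouping vector and return their positions.
subsample_by_group <- function(groups, n_per_group = 500, seed = 1) {
  set.seed(seed)  # fixed seed so the same cells are picked on every run
  idx_by_group <- split(seq_along(groups), groups)
  picked <- lapply(idx_by_group, function(idx) {
    if (length(idx) <= n_per_group) {
      idx  # small group: keep every cell
    } else {
      idx[sample.int(length(idx), n_per_group)]  # large group: sample without replacement
    }
  })
  sort(unlist(picked, use.names = FALSE))
}

# Toy stand-in for the "Translocation" column of colData(RNA_data)
groups <- rep(c("yes", "no"), times = c(3000, 800))
keep <- subsample_by_group(groups, n_per_group = 500)
table(groups[keep])
```

With the real object, the returned indices could be used to subset the columns before plotting, for example `RNA_small <- RNA_data[, subsample_by_group(RNA_data$Translocation)]`, and `RNA_small` would then be passed to enrichmentPlot() in place of `RNA_data`.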

NoemieL commented 2 years ago

Hi Nick, any update? Thanks

ncborcherding commented 2 years ago

Hey NoemieL,

Sorry, I do not have a timeline. I am on clinical services for the next week and will not have time to code a solution.

Nick

ncborcherding / escape

enrichmentPlot did not finish #59