kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.

Is there any way to fit the NB-GAM for all the genes in my large dataset? #229

Closed: pariaaliour closed this issue 10 months ago

pariaaliour commented 1 year ago

Dear @kstreet13, after deciding on an appropriate number of knots, I tried to fit the NB-GAM for each gene, following the instructions in this tutorial: "https://statomics.github.io/tradeSeq/articles/fitGAM.html#parallel-computing-1". I have large datasets (between 30k and 80k cells). When I run the command below with the parallelization option set as BPPARAM <- BiocParallel::bpparam(), I still get an out-of-memory error, even when I request 400 GB of memory.

BPPARAM <- BiocParallel::bpparam()
U_model <- model.matrix(~0+colData(ocu)$batchlib+colData(ocu)$sex)
set.seed(20)
ocu <- fitGAM(counts = as.matrix(counts(ocu)),
              pseudotime = slingPseudotime(colData(ocu)$slingshot),
              cellWeights = slingCurveWeights(colData(ocu)$slingshot),
              conditions = factor(colData(ocu)$group_id),
              U = U_model,
              nknots = 4,
              parallel = TRUE,
              BPPARAM = BPPARAM)
slurmstepd: error: Detected 24 oom-kill event(s) in StepId=20885409.interactive. Some of your processes may have been killed by the cgroup out-of-memory handler.

I need to do gene enrichment analysis afterward, which I cannot do if I fit the model for only a subset of genes. I would really appreciate any comments on this issue. Regards, Paria

kstreet13 commented 1 year ago

Hi @pariaaliour ,

fitGAM is from the tradeSeq package, so this question would be more appropriate on that repo. That said, you're not the first to run into these sorts of memory issues, so there might already be some helpful tips in the Issues (this one seems like it might be relevant).

Best, Kelly
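
For a rough sense of why a run like the one in the question can exhaust even a large allocation, the sketch below estimates the size of a dense count matrix and multiplies it by the number of parallel workers. The gene count (20,000), the worker count (24), and the premise that each worker ends up holding its own dense copy are assumptions for illustration, not figures from this thread or a statement about tradeSeq internals; `dense_matrix_gb` is a hypothetical helper.

```r
# Hypothetical helper: size in GB (1e9 bytes) of a dense numeric matrix,
# at 8 bytes per double-precision entry.
dense_matrix_gb <- function(n_genes, n_cells) {
  n_genes * n_cells * 8 / 1e9
}

n_cells   <- 80000  # largest dataset mentioned in the question
n_genes   <- 20000  # assumed; not stated in the thread
n_workers <- 24     # assumed number of parallel workers

per_copy_gb   <- dense_matrix_gb(n_genes, n_cells)
worst_case_gb <- per_copy_gb * n_workers  # if every worker holds a full copy

cat(sprintf("One dense copy: %.1f GB; %d copies: %.1f GB\n",
            per_copy_gb, n_workers, worst_case_gb))

# One thing to try is an explicit, smaller worker count instead of the
# bpparam() default. Guarded so the sketch still runs without BiocParallel.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  bp_small <- BiocParallel::MulticoreParam(workers = 2)
}
```

Under these assumptions a single dense copy is about 12.8 GB, so a couple of dozen copies would already come close to the 400 GB limit. Passing an object like `bp_small` as `BPPARAM` trades speed for a smaller footprint; whether that is enough for a given dataset would need to be checked.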