statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data
Other
237 stars 27 forks source link

Extremely long fitGAM runtime? #41

Closed Seandelao closed 4 years ago

Seandelao commented 4 years ago

Hello,

I am running fitGAM with parallelization on counts from an object with 11,768 cells and 26,152 genes on a machine with 16 cores and 256 GB of RAM: BPPARAM$workers <- 16 sce <- fitGAM(counts = counts, pseudotime = pseudotime, cellWeights = cellWeights, nknots = 7, verbose = TRUE, BPPARAM = BPPARAM)

The count file was made from non-normalized counts of a Seurat object: exprs_human<- GetAssayData(Alpha_Beta, slot="counts", assay = "RNA") counts <- as.matrix(exprs_human)

Currently, it is telling me that it will take 6 days in order to run. I would assume that this is a unnaturally long wait time?

koenvandenberge commented 4 years ago

What are the dimensions of the objects that you are fitting: number of genes, cells, lineages and knots?

When it's starting to run the estimated runtimes can be inaccurate, but 6 days sounds indeed very long.

Seandelao commented 4 years ago

Hello @koenvandenberge,

I am running it on an object consisting of: -11,768 cells -26,152 genes -110,885,417 counts -7 lineages; two main branches and multiple small branches from Monocle3 -7 knots

koenvandenberge commented 4 years ago

Thanks.

Note that the computation will not be parallelized since you would need to set parallel=TRUE in the fitGAM call. Right now, your argument for BPPARAM will be ignored.

That said, you can decrease the computation by filtering genes that are non-informative (eg are barely expressed in the large majority of cells).

We have not checked yet how the method scales to that large numbers of lineages; you will be estimating 50 parameters for each gene (7*7 knots and one intercept) using 11k data points, so it is not surprising it takes some time. If this is the trajectory from #40 I would personally urge you to reconsider whether those small branches carry any significant biological meaning. Looking at your UMAP plot and believing that the root of the trajectory is right, I rather see 2 lineages instead of 7, while the small branches may be a result of overfitting. Indeed, note how variable the fitted trajectory is if we compare the trajectory you made in the 'pseudotime.pdf' versus the 'pseudotime no + cellweights.pdf'. In the latter plot, why is there all this branching in one lineage and not the other? ..

Seandelao commented 4 years ago

@koenvandenberge Don't know how I missed that, thank you!

I completely agree about the overfitting; unfortunately, that is a product of Monocle3 that I can't seem to get around. Monocle3 isn't placing a "node" at what I know from biological knowledge is the progenitor cell type, so when I try to minimize branch length through some of their parameters is messes up the whole trajectory rather than just eliminating the small branches.

I am currently looking into using Slingshot since I'm only using Monocle3 for pseudotime assessment, but it doesn't seem to do a great job at really complex lineages like I have in some of my other data that I am combining for a paper. Regardless, I'll try out the parallelization.

Thanks again!

Seandelao commented 4 years ago

@koenvandenberge sorry to open this up again; I just finished running the fitGAM function and it has for some reason returned a list.

I can run upstream functions, but when I try to run the plotsmoothers function, I get the following error: 'unable to find an inherited method for function ‘plotSmoothers’ for signature ‘"list"’

koenvandenberge commented 4 years ago

No worries, that sounds like you may have an older version of tradeSeq installed. We recently changed the default to returning a SingleCellExperiment object. So you may want to re-install tradeSeq from our Github page and run this again, sorry for that inconvenience.

Seandelao commented 4 years ago

Hi @koenvandenberge,

No worries!

I've successfully ran the fitGAM function which returned a SCE object! I ended up using Slingshot instead of Monocle3, which reduced the overfitting/small branches. Thank you for the suggestions!

I've been running through some of the DE tests have encountered errors for some of them. When I run: assoRes <- associationTest(sce)

it runs for maybe ~5mins before giving me the error: arguments imply differing number of rows: 26152, 11

Furthermore, when I run: patternRes <- patternTest(sce) or earlyDERes <- earlyDETest(sce, knots = c(1, 2))

I get the error: Error in t(L) %*% Sigma : non-conformable arguments.

It seems like the other DE test are working so far! Any idea what might be going on with the two errors?

-Sean

Edit: Provided another DE test that throws an error for me.

koenvandenberge commented 4 years ago

Could you check whether your pseudotimes have no NA values? If you're using Slingshot, you should get the pseudotimes by setting na=FALSE in the slingPseudotime function.

If that's not the case, it would help if you could share your SCE object after running fitGAM.

Seandelao commented 4 years ago

@koenvandenberge I've run is.na() on pseudtime, which returned FALSE for the curves. I also had na=FALSE.

What is a good way to share the SCE object?

koenvandenberge commented 4 years ago

Google Drive, Dropbox, wetransfer? Anything like that should work. Please send it to koenvdberge@berkeley.edu

Seandelao commented 4 years ago

Sent!

Side question; is possible to plot a specified gene (i.e. not from a DE test) with the PlotSmoothers function? Something like: plotSmoothers(sce, counts, gene = "INS")

koenvandenberge commented 4 years ago

Thanks. This should be fixed now, please let me know if it works for you. There was a problem in returning the output.

Yes, once you have run fitGAM you can plot any gene you like using plotSmoothers(sce, counts, gene = gene). You can check the vignette here.

Seandelao commented 4 years ago

@koenvandenberge the association test works now, but I am still getting the error "Error in t(L) %*% Sigma : non-conformable arguments" for either the patternRes <- patternTest(sce) or earlyDERes <- earlyDETest(sce, knots = c(1, 2))

functions

koenvandenberge commented 4 years ago

Sorry, I forgot to check that. This was because some genes could not be fitted by fitGAM and had NA estimates for the coefficients. The function now returns NA wald statistics for those genes. Thanks for letting us know!

Seandelao commented 4 years ago

Thanks @koenvandenberge!