Closed Seandelao closed 4 years ago
What are the dimensions of the objects that you are fitting: number of genes, cells, lineages and knots?
When it's starting to run the estimated runtimes can be inaccurate, but 6 days sounds indeed very long.
Hello @koenvandenberge,
I am running it on an object consisting of: -11,768 cells -26,152 genes -110,885,417 counts -7 lineages; two main branches and multiple small branches from Monocle3 -7 knots
Thanks.
Note that the computation will not be parallelized since you would need to set parallel=TRUE
in the fitGAM
call. Right now, your argument for BPPARAM
will be ignored.
That said, you can decrease the computation by filtering genes that are non-informative (eg are barely expressed in the large majority of cells).
We have not checked yet how the method scales to that large numbers of lineages; you will be estimating 50 parameters for each gene (7*7 knots and one intercept) using 11k data points, so it is not surprising it takes some time. If this is the trajectory from #40 I would personally urge you to reconsider whether those small branches carry any significant biological meaning. Looking at your UMAP plot and believing that the root of the trajectory is right, I rather see 2 lineages instead of 7, while the small branches may be a result of overfitting. Indeed, note how variable the fitted trajectory is if we compare the trajectory you made in the 'pseudotime.pdf' versus the 'pseudotime no + cellweights.pdf'. In the latter plot, why is there all this branching in one lineage and not the other? ..
@koenvandenberge Don't know how I missed that, thank you!
I completely agree about the overfitting; unfortunately, that is a product of Monocle3 that I can't seem to get around. Monocle3 isn't placing a "node" at what I know from biological knowledge is the progenitor cell type, so when I try to minimize branch length through some of their parameters is messes up the whole trajectory rather than just eliminating the small branches.
I am currently looking into using Slingshot since I'm only using Monocle3 for pseudotime assessment, but it doesn't seem to do a great job at really complex lineages like I have in some of my other data that I am combining for a paper. Regardless, I'll try out the parallelization.
Thanks again!
@koenvandenberge sorry to open this up again; I just finished running the fitGAM function and it has for some reason returned a list.
I can run upstream functions, but when I try to run the plotsmoothers function, I get the following error: 'unable to find an inherited method for function ‘plotSmoothers’ for signature ‘"list"’
No worries, that sounds like you may have an older version of tradeSeq
installed. We recently changed the default to returning a SingleCellExperiment
object. So you may want to re-install tradeSeq
from our Github page and run this again, sorry for that inconvenience.
Hi @koenvandenberge,
No worries!
I've successfully ran the fitGAM function which returned a SCE object! I ended up using Slingshot instead of Monocle3, which reduced the overfitting/small branches. Thank you for the suggestions!
I've been running through some of the DE tests have encountered errors for some of them. When I run: assoRes <- associationTest(sce)
it runs for maybe ~5mins before giving me the error: arguments imply differing number of rows: 26152, 11
Furthermore, when I run: patternRes <- patternTest(sce) or earlyDERes <- earlyDETest(sce, knots = c(1, 2))
I get the error: Error in t(L) %*% Sigma : non-conformable arguments.
It seems like the other DE test are working so far! Any idea what might be going on with the two errors?
-Sean
Edit: Provided another DE test that throws an error for me.
Could you check whether your pseudotimes have no NA
values? If you're using Slingshot, you should get the pseudotimes by setting na=FALSE
in the slingPseudotime
function.
If that's not the case, it would help if you could share your SCE object after running fitGAM
.
@koenvandenberge I've run is.na() on pseudtime, which returned FALSE for the curves. I also had na=FALSE.
What is a good way to share the SCE object?
Google Drive, Dropbox, wetransfer? Anything like that should work. Please send it to koenvdberge@berkeley.edu
Sent!
Side question; is possible to plot a specified gene (i.e. not from a DE test) with the PlotSmoothers function? Something like: plotSmoothers(sce, counts, gene = "INS")
Thanks. This should be fixed now, please let me know if it works for you. There was a problem in returning the output.
Yes, once you have run fitGAM
you can plot any gene you like using plotSmoothers(sce, counts, gene = gene)
. You can check the vignette here.
@koenvandenberge the association test works now, but I am still getting the error "Error in t(L) %*% Sigma : non-conformable arguments" for either the patternRes <- patternTest(sce) or earlyDERes <- earlyDETest(sce, knots = c(1, 2))
functions
Sorry, I forgot to check that.
This was because some genes could not be fitted by fitGAM
and had NA
estimates for the coefficients. The function now returns NA
wald statistics for those genes.
Thanks for letting us know!
Thanks @koenvandenberge!
Hello,
I am running fitGAM with parallelization on counts from an object with 11,768 cells and 26,152 genes on a machine with 16 cores and 256 GB of RAM: BPPARAM$workers <- 16 sce <- fitGAM(counts = counts, pseudotime = pseudotime, cellWeights = cellWeights, nknots = 7, verbose = TRUE, BPPARAM = BPPARAM)
The count file was made from non-normalized counts of a Seurat object: exprs_human<- GetAssayData(Alpha_Beta, slot="counts", assay = "RNA") counts <- as.matrix(exprs_human)
Currently, it is telling me that it will take 6 days in order to run. I would assume that this is a unnaturally long wait time?