statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data
Other
238 stars 27 forks source link

Slingshot to tradeSeq from adata object? #34

Closed oligomyeggo closed 4 years ago

oligomyeggo commented 4 years ago

Hi, and thank you for your tutorial! I've been following a workflow from the Theis lab (https://github.com/theislab/single-cell-tutorial/blob/master/latest_notebook/Case-study_Mouse-intestinal-epithelium_1906.ipynb) using my own data, where they do a lot of the work in scanpy and switch to R for trajectory analysis. I opted for using Slingshot as a trajectory method, and wanted to use the tradeSeq package downstream of this. However, I'm running into issues with my counts matrix (I believe). I'm using the Slingshot object generated via the Theis workflow, and the rpy2 interface which should convert my adata object into a SingleCellExperiment object, and from that I should be able to get the counts needed for tradeSeq - or at least that's my understanding.

sds <- SlingshotDataSet(adata) counts <- as.matrix(assays(adata)$counts)

This works so that I get a nice plot that matches my Slingshot data when I run:

plotGeneCount(curve = sds, clusters = adata$louvain)

Running the following returns the 4 expected plots allowing me to determine a nknot value:

set.seed(6) icMat <- evaluateK(counts = counts, sds = sds, k = 3:7, nGenes = 100, verbose = FALSE, plot = TRUE)

But when I try to run the following:

set.seed(6) sce <- fitGAM(counts = counts, sds = sds, nknots = 4, verbose = FALSE, sce=TRUE)

I get the resulting error: could not broadcast input array from shape (2) into shape (1136).

This makes me wonder if maybe there is something wrong with adata to SCE conversion, making my counts matrix weird?

Any thoughts or advice would be greatly appreciated. Thank you!

koenvandenberge commented 4 years ago

Hi, thanks for bringing this up and providing us with the details. The error you are getting actually looks like a Python error message. Could you share the classes of the objects you are using to make sure they are proper R objects? It is indeed unexpected that evaluateK is working but fitGAM is not, since the former uses the latter internally.

oligomyeggo commented 4 years ago

Sure.

First, I apologize. My inputs for tradeSeq are actually as follows:

sds <- SlingshotDataSet(adata_startend) counts <- as.matrix(assays(adata)$counts)

I made my SlingshotDataSet as follows (per the Theis workflow):

adata_startend <- slingshot(adata, clusterLabels = 'louvain_final', reducedDim = 'PCA', start.clus='RGC', end.clus=c('IPC3', 'RGC OL-Like'))

Class objects are as follows:

class(sds) [1] "SlingshotDataSet"

class(counts) [1] "matrix"

It is taking a long time (~20 min) to run fitGAM. Not sure if that is expected.

And if it is helpful, here are the graphs I got from running plotGeneCount and evaluateK (in case you see anything that looks weird; the plotGeneCount looks good to me so far as I'm aware and is consistent with what I saw with Slingshot. Not sure if the evaluateK output looks right or not with closer inspection).

Output from plotGeneCount:

Screen Shot 2020-04-04 at 4 15 33 PM

Output from evaluateK:

Screen Shot 2020-04-04 at 4 15 46 PM
koenvandenberge commented 4 years ago

I don't see anything wrong with your code at first glance. Yes, fitGAM can take some time. You are only getting the error after 20 mins? That sounds like the fitting must have been going well but the post-processing might return the error. Do you mind sharing your sds and counts objects, and the version of tradeSeq you are working with? If you're worried about sharing, a subset of the data that returns the same error should be fine, too.

oligomyeggo commented 4 years ago

Yes, I'm only getting the error after fitGAM has been running for ~20 min or so.

I'm working with tradeSeq_1.1.16.

And here are my sds and counts objects:

counts.rds.gz sds.rds.gz

Thank you for help. I appreciate it. I'm sure it's something simple/silly that I've overlooked somewhere along the way. I just can't figure it out.

koenvandenberge commented 4 years ago

I ran your code with the latest tradeSeq version and the objects you provided, but I can't reproduce the error. Since your error message looks extraordinary could you try doing the following in a new session?

counts <- readRDS("~/Downloads/counts.rds")
sds <- readRDS("~/Downloads/sds.rds")

library(tradeSeq)
set.seed(6)
sce <- fitGAM(counts = counts, sds = sds, nknots = 4, verbose = FALSE, sce=TRUE)
oligomyeggo commented 4 years ago

I tried running the above code in a new jupyter notebook session (as I had been doing) as well as a new RStudio session. In the jupyter notebook, I received the same error as before: could not broadcast input array from shape (2) into shape (1136). In the RStudio session, I received There were 50 or more warnings (use warnings() to see the first 50), being repetitions of

In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, ... : Iteration limit reached without full convergence - check carefully

and

In newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, ... : Fitting terminated with step failure - check results carefully

Not sure if that is helpful. I'm using R version 3.6.2, tradeSeq_1.1.16, and slingshot_1.5.1.

koenvandenberge commented 4 years ago

Is it possible that your jupyter notebook tries to evaluate the R code in python?

In RStudio the warnings are expected, I also got these. So the model fitting ran without errors there. These are warnings which we propagate from mgcv::gam which we rely on to do the fitting of the model. It means that there were difficulties in fitting some genes. You could try to increase the iteration limit to see if that helps, see the fitGAM vignette

oligomyeggo commented 4 years ago

My mistake! It is working when I run it in RStudio. I saw all the warnings and had assumed it didn't work, but going back now I see that running fitGAM did successfully generate a sce object, and I am able to go on to generate plots using functions like plotSmoothers. Not sure why it isn't working in jupyter notebook considering I haven't had issues going back and forth between python and R with other packages in my workflow, but I can try to troubleshoot that later. For now I have no issue doing tradeSeq analysis in RStudio. Thank you for all of your help!

koenvandenberge commented 4 years ago

No worries. I don't work with jupyter notebook but in case you find what could be causing the incompatibility, please do let us know so we can work on that. Let us know if any other issues or questions come up.

oligomyeggo commented 4 years ago

So I guess the could not broadcast input array from shape (2) into shape (1136) error generated in jupyter notebook after runningfitGAM can actually be ignored. It still generates a sce object which works with all the other functions in the tradeSeq workflow. Not sure exactly what the error means, but looks like tradeSeq can effectively be used downstream of scanpy and Slingshot in jupyter notebook.

HectorRDB commented 4 years ago

Hi @oligomyeggo, I've had similar troubles moving R code between R studio and jupyter notebooks. It turned out that the python engine called by R Studio is not necessarily the same as the one called by jupyter (especially if you have python 2.xx and 3.xx on your machine). Just in case you want to check that.

oligomyeggo commented 4 years ago

Thanks for the advice @HectorRDB. I'll check that out and see what versions of python I'm working with between jupyter notebooks and RStudio.