statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data
Other
239 stars 28 forks source link

Problem with adding more than one lineage to fitGAM #65

Closed bieniu777 closed 4 years ago

bieniu777 commented 4 years ago

Hello,

I'm new to coding and very new to tradeSeq, and I was hoping to get some help here. I am working on a sc-RNA-seq data that was available online and already pre-processed (normalised) - I only log transformed it. Prior to tradeSeq I used slingshot to find pseudotimes in my cell clusters based on their marker genes expression that I found with Seurat. With dimensions DC1xDC2 I get just one lineage but with Dimension DC1XDC3 I have two lineages (one new). When my slingshot is run on two first dimensions, there is no problem with using fitGAM afterwards. But when I run slingshot on the 3 dimensions or on DC1XDC3, I get NA values in my pseudotimes.

pseudotime <- slingPseudotime(S, na = FALSE) Error in .local(counts, ...) : The pseudotimes contain NA values, and these cannot be used for GAM fitting.

The only way I can run fitGAM is to run both curves separately, with removing NA. For instance, curve 2 only:

For Curve 2:

pseudotime <- slingPseudotime(S, na = FALSE) cellWeights <- slingCurveWeights(S) cW2 <- cellWeights[,2] cW2 <- as.data.frame(cW2) pT2 <- na.omit(pseudotime[, "curve2"]) pT2 <- as.data.frame(pT2) pT2_cells <- rownames(pT2) cw2.2 <- merge(cW2, pT2, by = 0, all=TRUE) cw2.2 <- na.omit(cw2.2) cw2.2 <- as.data.frame(cw2.2) row.names(cw2.2) <- cw2.2[,1] cw2.2 <- cw2.2[,-1]

set.seed(6) sce2 <- fitGAM(counts = TIvar[ ,pT2_cells], pseudotime = cw2.2$pT2, cellWeights = cw2.2$cW2, nknots = 6)

Removing NA values in both pseudotimes does not work, as then they both have different dimensions and I cannot create a single table with both of them.

TIvar is variable features from Seurat (I also tried it on the original marker genes used for slingshot - same outcome):

Y <- as.data.frame(VariableFeatures2000) markersvar <- unique(Y$VariableFeatures2000) TIvar <- as.matrix(HNSCC_fibr3_RNAseq@assays$RNA@counts[markersvar,])

How can I include both lineages in the fitGAM? I want to compare the two lineages that I have.

EDIT: I am a bit confused whether I have a single lineage in my data with two curves, or two lineages.

plotGenePseudotime(S, gene = "RPSA", exprs = TIvarcounts, loess = TRUE) The code above returns two lineages (the only time that I have two lineages in any output). But it actually looks like the curves that I have everywhere else.

Kamila

HectorRDB commented 4 years ago

Hi @bieniu777 No worries, we are happy to help.

If I understand right, you have a Seurat object HNSCC_fibr3_RNAseq. You then use slingshot and get a trajectory with two lineages (i.e two curves). As a side note, we define trajectory as a set of lineages and you get the slingshot object S. Then, you tried to run fitGAM and it failed.

fitGAM has a (matrix, slingshotdataset method, which means you should be able to run:

sce <- fitGAM(counts = TIvar, sds = S, nknots = 6)

Could you please share the exact command you used for slingshot and tradeSeq? Thanks

bieniu777 commented 4 years ago

Hi, @HectorRDB Thank you very much for your reply.

I'll explain what I did step by step.

Marker genes from FindAllMarkers function from Seurat were used to generate a diffusion map and to run slingshot

>markers <- unique(all_Fibr3markers.pos.neg.signif$gene)
>TIcounts <- as.matrix(HNSCC_fibr3_RNAseq@assays$RNA@counts[markers,])

in order to get slingshot to work I switched to ExpressionSet and SingleCellExperiment

>TIcounts <- t(TIcount)
>TIcounts <- as.data.frame(TIcounts)
>TIcounts <- as.ExpressionSet(TIcounts)

gene count matrix for tradeseq analysis: Output from the FindVariableFeatures function from Seurat

>Y <- as.data.frame(VariableFeatures2000)
>markersvar <- unique(Y$VariableFeatures2000)
>TIvarcounts <- as.matrix(HNSCC_fibr3_RNAseq@assays$RNA@counts[markersvar,])

Diffusion map (package destiny):

>DM <- DiffusionMap(TIcounts, verbose = TRUE)
>DM_dim_data <- data.frame(
  DC1 = DM$DC1,
  DC2 = DM$DC2,
  DC3 = DM$DC3,
  DC4 = DM$DC4,
  clusternames = HNSCC_fibr3_RNAseq$clusternames 
)

slingshot

>S <- SingleCellExperiment(TIcounts)
>rd1 <- cbind(DC1 = DM$DC1, DC2 = DM$DC2)
>rd2 <- cbind(DC1 = DM$DC1, DC3 = DM$DC3)
>rd3 <- cbind(DC1 = DM$DC1, DC2 = DM$DC2, DC3 = DM$DC3)
>rd4 <- cbind(DC2 = DM$DC2, DC3 = DM$DC3)
>reducedDims(S) <- SimpleList(DiffMap = rd3)
>colData(S)$clust <- DM_dim_data$clusternames

At this point I select different number or dimensions to run in the slingshot: rd1-rd4, by adding them to reducedDims

>S <- slingshot(S, clusterLabels = "clust", reducedDim = "DiffMap")

Then, if I only have one pseudotime from my slingshot, for instance from rd1 (DC1xDC2), fitGAM runs without problems:

>pseudotime <- slingPseudotime(S, na = FALSE)
>pseudotime <- na.omit(pseudotime)
>cellWeights <- slingCurveWeights(S)
>set.seed(6)

>sce <- fitGAM(counts = TIvarcounts, pseudotime = pseudotime, cellWeights = cellWeights, nknots = 6)

If I have two pseudotimes (so two curves, if I understand right?), like when using rd2 (DC2xDC3) or rd3 (DC1xDC2xDC3) I cannot use the code for fitGAM as above, but I have to modify it and analyse the curves separately, as I wrote in the first comment (removing NA values and choosing one of the two curves from the pseudotime object):

pseudotime <- slingPseudotime(S, na = FALSE)
cellWeights <- slingCurveWeights(S)
cW2 <- cellWeights[,2]
cW2 <- as.data.frame(cW2)
pT2 <- na.omit(pseudotime[, "curve2"])
pT2 <- as.data.frame(pT2)
pT2_cells <- rownames(pT2)
cw2.2 <- merge(cW2, pT2, by = 0, all=TRUE)
cw2.2 <- na.omit(cw2.2)
cw2.2 <- as.data.frame(cw2.2)
row.names(cw2.2) <- cw2.2[,1]
cw2.2 <- cw2.2[,-1]
>sce2 <- fitGAM(counts = TIvar[ ,pT2_cells], pseudotime = cw2.2$pT2, cellWeights = cw2.2$cW2, nknots = 6)

These are the 3D graphs that I have. I want to investigate the differences between the green and the blue cells, among others. How these two curves are different. And at the moment I don't know how to run fitGAM so that it includes both curves at once. https://imgur.com/kXhP2WT https://imgur.com/mVr1SBw

I hope that makes sense.

BW,

Kamila

bieniu777 commented 4 years ago

Hello again,

I just tried what you suggested: sce <- fitGAM(counts = TIvarcounts, sds = SlingshotDataSet(S), nknots = 6)

And I finally have two lineages, it worked! I guess I just complicated everything too much unnecessarily.

Thank you!

BW,

Kamila

HectorRDB commented 4 years ago

Great to know it's working Best