statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data

How to understand the results of associationTest and conditionTest #86

Open ZHIDIHUAYUAN opened 3 years ago

ZHIDIHUAYUAN commented 3 years ago

Hello, thanks for developing such great software. I am having difficulty understanding the results of associationTest and conditionTest.

library(tradeSeq)
library(slingshot)
library(BiocParallel)
BPPARAM <- bpparam()  # not defined in the original snippet; any BiocParallel backend works here

# Fit smoothers per lineage and per condition (OB vs. WT)
sce1 <- fitGAM(counts = as.matrix(assays(part1_slingshot)$counts[feature, ]),
               pseudotime = slingPseudotime(part1_slingshot, na = FALSE),
               cellWeights = slingCurveWeights(part1_slingshot),
               conditions = factor(colData(part1_slingshot)$type),
               nknots = 8, parallel = TRUE, BPPARAM = BPPARAM)

# Association with pseudotime, reported per lineage and per condition
assoRes1 <- associationTest(sce1, lineages = TRUE)

obGenes1 <- rownames(assoRes1)[which(p.adjust(assoRes1$pvalue_lineage1_conditionOB, "fdr") <= 0.05)] # 4176 genes
wtGenes1 <- rownames(assoRes1)[which(p.adjust(assoRes1$pvalue_lineage1_conditionWT, "fdr") <= 0.05)] # 1851 genes
length(intersect(obGenes1, wtGenes1))
# [1] 1798

# Differential expression between conditions along the lineage
condRes1 <- conditionTest(sce1)
condRes1$padj <- p.adjust(condRes1$pvalue, "fdr")
conditionGenes <- rownames(condRes1)[which(condRes1$padj <= 0.05)] # 1249 genes

The associationTest results show 4176 genes changing with pseudotime in the OB condition and 1851 genes in the WT condition, and 1798 genes are in both sets. Does this mean that 2378 (4176 - 1798 = 2378) genes change over pseudotime only in OB? Yet conditionTest finds only 1249 genes that are differentially expressed between conditions. Shouldn't a gene that changes with pseudotime in OB but not in WT count as differentially expressed between the conditions?
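The counting in the question can be reproduced with base R set operations. Below is a minimal sketch on hypothetical gene IDs (the vectors are made up for illustration and are not the data from this issue); it also cross-tabulates the OB-only genes against a conditionTest hit list, which is the comparison the question is really about.

```r
# Hypothetical gene lists standing in for obGenes1, wtGenes1 and conditionGenes
obGenes1       <- paste0("gene", 1:10)             # significant in OB
wtGenes1       <- paste0("gene", c(1:6, 11:12))    # significant in WT
conditionGenes <- paste0("gene", c(7, 8, 11, 13))  # significant in conditionTest

# Genes significant in both conditions
both <- intersect(obGenes1, wtGenes1)

# Genes significant only in OB: |OB| - |OB and WT|
obOnly <- setdiff(obGenes1, wtGenes1)
length(obOnly) == length(obGenes1) - length(both)  # TRUE

# How many OB-only genes are also flagged by conditionTest?
table(obOnly %in% conditionGenes)
```

In this toy case only half of the OB-only genes are also conditionTest hits, which is the same kind of mismatch the question describes.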

koenvandenberge commented 3 years ago

Hi @ZHIDIHUAYUAN

It seems that most genes associated with pseudotime in the WT condition are indeed also associated with it in the OB condition. The 2378 genes you mention are indeed only found to be significantly associated in the OB condition. This does not mean that they are not changing in the WT condition: failing to reject the null hypothesis does not mean that we can accept it; we just don't have enough evidence to reject it. Our null hypothesis here is that gene expression is not associated with pseudotime.
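To see why non-significance is not evidence of no change, here is a small base-R simulation. It uses an ordinary linear regression on pseudotime as a stand-in for tradeSeq's smoother-based test (an assumption made purely for illustration): every simulated gene truly changes with pseudotime, yet when the effect is small relative to the noise, most of them fail to reach p < 0.05.

```r
set.seed(1)
n_cells <- 30     # cells per simulated gene
slope   <- 0.5    # true (small) change in expression across pseudotime
n_reps  <- 1000   # number of simulated genes, all truly associated

pvals <- replicate(n_reps, {
  pt   <- runif(n_cells)                        # toy pseudotime in [0, 1]
  expr <- slope * pt + rnorm(n_cells, sd = 1)   # true association plus noise
  summary(lm(expr ~ pt))$coefficients["pt", "Pr(>|t|)"]
})

# Fraction of truly changing genes declared significant (empirical power)
power <- mean(pvals < 0.05)
power
```

Every gene left below the threshold here is a false negative, so a gene missing from one condition's list may simply be a gene the test lacked the power to detect.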

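Genes found to be associated with pseudotime in only one of your conditions may not be significant in the conditionTest, and vice versa. You should not expect these numbers to match exactly, as these are two different tests: associationTest asks whether expression changes along pseudotime within each condition, whereas conditionTest asks whether the smoothers of the conditions differ from each other.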
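The difference between the two null hypotheses can be illustrated with a base-R analogue. This is a sketch only: it uses natural-spline linear models compared with F-tests via anova(), not tradeSeq's negative-binomial GAMs and Wald tests. The association-style test compares a pseudotime smoother against a constant within one condition; the condition-style test compares one shared smoother against condition-specific smoothers. Gene A has the same strong trend in both conditions; gene B has no trend but is shifted upward in OB.

```r
library(splines)  # ns(): natural cubic spline basis, ships with R
set.seed(42)

n    <- 200
pt   <- runif(n)                                   # toy pseudotime in [0, 1]
cond <- factor(rep(c("WT", "OB"), each = n / 2), levels = c("WT", "OB"))

# Gene A: the same strong trend in both conditions
geneA <- 3 * sin(2 * pi * pt) + rnorm(n, sd = 0.5)
# Gene B: no trend at all, but shifted up by 3 in OB
geneB <- 3 * (cond == "OB") + rnorm(n, sd = 0.5)

# Association-style test within one condition: smoother vs. constant
assoc_p <- function(y, keep) {
  d <- data.frame(y = y[keep], pt = pt[keep])
  anova(lm(y ~ 1, data = d), lm(y ~ ns(pt, df = 4), data = d))[2, "Pr(>F)"]
}

# Condition-style test: one shared smoother vs. one smoother per condition
cond_p <- function(y) {
  d <- data.frame(y = y, pt = pt, cond = cond)
  shared   <- lm(y ~ ns(pt, df = 4), data = d)
  separate <- lm(y ~ cond * ns(pt, df = 4), data = d)
  anova(shared, separate)[2, "Pr(>F)"]
}

res <- data.frame(
  gene     = c("A", "B"),
  assocWT  = c(assoc_p(geneA, cond == "WT"), assoc_p(geneB, cond == "WT")),
  assocOB  = c(assoc_p(geneA, cond == "OB"), assoc_p(geneB, cond == "OB")),
  condTest = c(cond_p(geneA), cond_p(geneB))
)
print(res, digits = 3)
```

Only the strong effects are guaranteed here: gene A is highly significant in both association tests and gene B in the condition test. The remaining cells are tests of true nulls, so their p-values vary with the seed, but typically gene A is not a condition hit and gene B is not an association hit.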

ZHIDIHUAYUAN commented 3 years ago

Thank you very much for your reply! I have another question: I actually have trouble understanding conditionTest. Can I think of conditionTest as running patternTest between the smoothers for each condition?

koenvandenberge commented 3 years ago

Yes, that is indeed exactly what it is doing!
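Conceptually, a patternTest-style comparison evaluates the two condition-specific smoothers on a grid of pseudotime points and uses a Wald test to ask whether their differences are jointly zero. The base-R sketch below illustrates that idea on a Gaussian linear model with natural splines. It is not tradeSeq's implementation (which works on the negative-binomial GAMs fitted by fitGAM), and the grid size and eigenvalue cut-off are arbitrary choices for illustration.

```r
library(splines)
set.seed(7)

n    <- 300
pt   <- runif(n)
cond <- factor(sample(c("WT", "OB"), n, replace = TRUE), levels = c("WT", "OB"))
# Toy gene whose trend is steeper in OB than in WT
y    <- ifelse(cond == "OB", 2, 0.5) * pt + rnorm(n, sd = 0.3)

fit <- lm(y ~ cond * ns(pt, df = 4))

# Design matrices for each condition on a common pseudotime grid, built the
# same way predict.lm() does so the fitted spline basis is reused
grid      <- seq(min(pt), max(pt), length.out = 50)
terms_fit <- delete.response(terms(fit))
design_at <- function(level) {
  nd <- data.frame(pt = grid, cond = factor(level, levels = levels(cond)))
  model.matrix(terms_fit, model.frame(terms_fit, nd, xlev = fit$xlevels))
}
L <- design_at("OB") - design_at("WT")  # contrast: OB smoother minus WT smoother

# Wald statistic for H0: L %*% beta = 0, with an eigen-based pseudo-inverse
est  <- L %*% coef(fit)
V    <- L %*% vcov(fit) %*% t(L)
eig  <- eigen(V, symmetric = TRUE)
keep <- eig$values > 1e-8 * max(eig$values)  # drop numerically zero directions
U    <- eig$vectors[, keep, drop = FALSE]
Vinv <- U %*% diag(1 / eig$values[keep], sum(keep)) %*% t(U)
wald    <- as.numeric(t(est) %*% Vinv %*% est)
wald_df <- sum(keep)
pval    <- pchisq(wald, df = wald_df, lower.tail = FALSE)
c(wald = wald, df = wald_df, pvalue = pval)
```

The 50 grid differences only span a 5-dimensional space (a level shift plus four spline terms), which is why the pseudo-inverse is needed and the test has 5 degrees of freedom rather than 50.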

ZHIDIHUAYUAN commented 3 years ago

Thank you so much!

erzakiev commented 3 weeks ago

Just to recap: what is the difference between associationTest and conditionTest? Say I want to find genes that are differentially expressed along a lineage spanning several days (a typical developmental-process study; see the example in the attached image). Which one is preferred?

[Attached image: Screenshot 2024-08-18 at 14 25 31]

I read a vignette in which the author used associationTest for a two-condition lineage (mock vs. treatment). Judging by the explanation of the test given in that vignette ('tests the null hypothesis that gene expression is not a function of pseudotime, i.e., whether the estimated smoothers are significantly varying as a function of pseudotime within each lineage.'), shouldn't it also fit my hypothetical example of a multi-day developmental process?