statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data
Other
236 stars 27 forks source link

Between-condition tests #106

Open ATpoint opened 3 years ago

ATpoint commented 3 years ago

Hello,

I would like to ask about current implementations towards testing patterns between conditions but within the same lineage.

Basically a question that I would like to answer would be whether early in pseudotime there is evidence for differential expression in the same lineage driven by the condition. This could help to determine why upon pertubation cells experience differential differentiation, e.g. developing into slightly different cell types. That would basically be the earlyDE test but (referring to Figure 1C in the paper) instead of the orange and blue curve being the lineages it should be the condition. The same basically goes for all the other tests in that figure, is it possible to use conditions rather than lineages here?

koenvandenberge commented 3 years ago

Hi @ATpoint,

Thanks for your interest in tradeSeq. We are still actively developing functionalities to test for differential expression between conditions, and have recently merged some of our ideas into the master branch. Right now, in terms of DE tests between conditions within a lineage, there is the conditionTest which tests for differential expression patterns of the average gene expression between conditions within a lineage. We do not yet have functionality that allows you to focus on a specific section of the trajectory. If this is of interest, we can put it on the to-do list.

ATpoint commented 3 years ago

Hello @koenvandenberge, thanks for your reply.

I think it would be a valuable addition that opens up the tool to a broader userbase.

In my case I am interested in finding pertubation-specific gene expression chansignatures. Imagine we have a developmental trajectory from stem- to differentiated cells and we know that the pertubation created altered cellular fates. So the stems develop differently upon pertubation, some cell types vanish, others become more frequent. it would be interested to determine when (along the trajectory) these pertubation effects really become detectable, and then subsequently derive signatures/clusters of genes that describe the expression changes. Are there specific effects early/middle/late in development, questions like this. Currently I do pairwise comparisons between clusters using pseudobulks/DESeq2 but this does not really makes use of the single-cell resolution...yeah and lets not even talk about doing DE on the single-cell level using Wilcox tests...I still find it almost impossible to adequately interpret effect sizes when like 60% of the cells have zeros for a particular gene.

HectorRDB commented 3 years ago

The latest version of tradeSeq now allows for the conditionTest to be run only between some knots. For example, to look only at the beginning of the trajectory, add the option knots=c(1,2). Note however that the number of significantly DE genes is expected to drop quite a lot.

JesseRop commented 3 years ago

Hi, just following up on this to ask whether conditionTest can perform pairwise comparisons for more than 2 conditions in one lineage; for instance in a lineage with 3 conditions (a,b,c) we have 3 pairwise comparisons (a VS b, a VS c and b VS c) and therefore we expect 3 p-values. Currently, the conditionTest output for pairwise comparisons just gives 1 p-value column.

koenvandenberge commented 3 years ago

Hi @JesseRop

Yes, just set pairwise=TRUE in conditionTest. See the help file ?conditionTest for more details.

Best, Koen

JesseRop commented 3 years ago

Thanks @koenvandenberge ! I was able to get it working perfectly in the github development version. However, I am now getting error from plotSmoothers and predictSmooth functions

plotSmoothers(integrated_c_nc_sce_tde_int_all, assays(integrated_c_nc_sce_tde_int_all)$counts,
              gene = "PF3D7-1016300",
              alpha = 1, border = TRUE)

Error in .plotSmoothers_conditions(models = models, counts = counts, gene = gene, : Some coefficients for this gene are NA. Cannot plot this gene.
Traceback:

1. plotSmoothers(gam_d10_minus_asx_oc_tde, assays(gam_d10_minus_asx_oc_tde)$counts, 
 .     gene = rownames(assays(gam_d10_minus_asx_oc_tde)$counts)[oo[1]], 
 .     alpha = 1, border = TRUE)
2. plotSmoothers(gam_d10_minus_asx_oc_tde, assays(gam_d10_minus_asx_oc_tde)$counts, 
 .     gene = rownames(assays(gam_d10_minus_asx_oc_tde)$counts)[oo[1]], 
 .     alpha = 1, border = TRUE)
3. .local(models, ...)
4. .plotSmoothers_conditions(models = models, counts = counts, gene = gene, 
 .     nPoints = nPoints, lwd = lwd, size = size, xlab = xlab, ylab = ylab, 
 .     border = border, alpha = alpha, sample = sample, pointCol = pointCol, 
 .     curvesCols = curvesCols, plotLineages = plotLineages)
5. stop("Some coefficients for this gene are NA. Cannot plot this gene.")
predictSmooth(gam_d10_minus_asx_oc_tde, gene = genes_nf54_asx_oc, nPoints = 50, tidy = FALSE)

Warning message in min(lineageData):
"no non-missing arguments to min; returning Inf"
Warning message in max(lineageData):
"no non-missing arguments to max; returning -Inf"

Error in if (min(lineageData)/max(lineageData) < 0.01) {: missing value where TRUE/FALSE needed
Traceback:

1. predictSmooth(gam_d10_minus_asx_oc_tde, gene = genes_nf54_asx_oc, 
 .     nPoints = 50, tidy = FALSE)
2. predictSmooth(gam_d10_minus_asx_oc_tde, gene = genes_nf54_asx_oc, 
 .     nPoints = 50, tidy = FALSE)
3. .local(models, ...)
4. .predictSmooth_conditions(dm = dm, X = X, beta = beta, pseudotime = pseudotime, 
 .     gene = gene, nPoints = nPoints, conditions = conditions, 
 .     tidy = tidy)
5. .getPredictRangeDf(dm, lineageId = jj, conditionId = kk, nPoints = Points)
HectorRDB commented 3 years ago

Hi @JesseRop The latest version of Github should have fixed the problem but feel free to re-open the issue in case it did not

liuweihanty commented 3 years ago

Hi @HectorRDB , I had the same issue with @JesseRop . When I run the tutorial code(I didn't do any modification to the code before this line) yhatSmooth <- predictSmooth(sce, gene = mockGenes, nPoints = 50, tidy = FALSE) I got the error Error in if (min(lineageData)/max(lineageData) < 0.01) { : missing value where TRUE/FALSE needed

I installed the developer version of tradeseq: devtools::install_github("statOmics/tradeSeq") but it didn't help. Could you please advise? Thank you very much!

HectorRDB commented 3 years ago

HI @liuweihanty A little question: Are you using the same data as the tutorial (I assume you are referring to this)? Could you also share the output of sessionInfo() Thanks

liuweihanty commented 3 years ago

Hi @HectorRDB, thanks for the quick reply! Yes I'm using the same data as in the tutorial(yes the link you attached). Here is the output of my sessionInfo():

R version 4.0.2 (2020-06-22) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Catalina 10.15.7

Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding

locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] gridExtra_2.3 knitr_1.31 fgsea_1.16.0 msigdbr_7.2.1 pheatmap_1.0.12
[6] UpSetR_1.4.0 viridis_0.5.1 viridisLite_0.3.0 scales_1.1.1 clusterExperiment_2.10.1
[11] slingshot_1.8.0 princurve_2.1.6 SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0 Biobase_2.50.0
[16] GenomicRanges_1.42.0 GenomeInfoDb_1.26.2 IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.0
[21] MatrixGenerics_1.2.1 matrixStats_0.58.0 RColorBrewer_1.1-2 tradeSeq_1.5.16 ggpmisc_0.3.8-1
[26] rlang_0.4.10 readxl_1.3.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.4
[31] purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.6 ggplot2_3.3.3
[36] tidyverse_1.3.0 SeuratObject_4.0.0 Seurat_4.0.0

loaded via a namespace (and not attached): [1] rappdirs_0.3.3 scattermore_0.7 pkgmaker_0.32.2 bit64_4.0.5 irlba_2.3.3
[6] DelayedArray_0.16.1 data.table_1.14.0 rpart_4.1-15 RCurl_1.98-1.2 doParallel_1.0.16
[11] generics_0.1.0 callr_3.5.1 cowplot_1.1.1 usethis_2.0.1 RSQLite_2.2.3
[16] RANN_2.6.1 VGAM_1.1-5 combinat_0.0-8 future_1.21.0 bit_4.0.4
[21] phylobase_0.8.10 spatstat.data_2.0-0 xml2_1.3.2 lubridate_1.7.9.2 httpuv_1.5.5
[26] assertthat_0.2.1 xfun_0.21 hms_1.0.0 evaluate_0.14 promises_1.2.0.1
[31] fansi_0.4.2 progress_1.2.2 dbplyr_2.1.0 igraph_1.2.6 DBI_1.1.1
[36] htmlwidgets_1.5.3 sparsesvd_0.2 ellipsis_0.3.1 RSpectra_0.16-0 backports_1.2.1
[41] DDRTree_0.1.5 annotate_1.68.0 gridBase_0.4-7 locfdr_1.1-8 deldir_0.2-10
[46] vctrs_0.3.6 remotes_2.3.0 ROCR_1.0-11 abind_1.4-5 cachem_1.0.4
[51] withr_2.4.1 checkmate_2.0.0 sctransform_0.3.2 prettyunits_1.1.1 goftest_1.2-2
[56] softImpute_1.4 cluster_2.1.0 ape_5.4-1 lazyeval_0.2.2 crayon_1.4.1
[61] genefilter_1.72.1 edgeR_3.32.1 pkgconfig_2.0.3 slam_0.1-48 labeling_0.4.2
[66] pkgload_1.2.0 nlme_3.1-148 devtools_2.3.2 nnet_7.3-14 globals_0.14.0
[71] lifecycle_1.0.0 miniUI_0.1.1.1 registry_0.5-1 BiocFileCache_1.14.0 modelr_0.1.8
[76] rprojroot_2.0.2 cellranger_1.1.0 polyclip_1.10-0 lmtest_0.9-38 rngtools_1.5
[81] Matrix_1.2-18 Rhdf5lib_1.12.1 zoo_1.8-8 reprex_1.0.0 base64enc_0.1-3
[86] processx_3.4.5 ggridges_0.5.3 png_0.1-7 bitops_1.0-6 rhdf5filters_1.2.0
[91] rncl_0.8.4 KernSmooth_2.23-17 blob_1.2.1 zinbwave_1.12.0 parallelly_1.23.0
[96] jpeg_0.1-8.1 memoise_2.0.0 magrittr_2.0.1 plyr_1.8.6 ica_1.0-2
[101] howmany_0.3-1 zlibbioc_1.36.0 compiler_4.0.2 HSMMSingleCell_1.10.0 fitdistrplus_1.1-3
[106] cli_2.3.1 ade4_1.7-16 XVector_0.30.0 listenv_0.8.0 ps_1.5.0
[111] patchwork_1.1.1 pbapply_1.4-3 htmlTable_2.1.0 Formula_1.2-4 MASS_7.3-51.6
[116] mgcv_1.8-31 tidyselect_1.1.0 MAST_1.16.0 stringi_1.5.3 densityClust_0.3
[121] yaml_2.2.1 locfit_1.5-9.4 latticeExtra_0.6-29 ggrepel_0.9.1 grid_4.0.2
[126] fastmatch_1.1-0 tools_4.0.2 future.apply_1.7.0 rstudioapi_0.13 uuid_0.1-4
[131] monocle_2.18.0 foreach_1.5.1 foreign_0.8-80 RNeXML_2.4.5 farver_2.0.3
[136] Rtsne_0.15 digest_0.6.27 BiocManager_1.30.10 FNN_1.1.3 shiny_1.6.0
[141] qlcMatrix_0.9.7 Rcpp_1.0.6 broom_0.7.5 later_1.1.0.1 RcppAnnoy_0.0.18
[146] httr_1.4.2 AnnotationDbi_1.52.0 kernlab_0.9-29 colorspace_2.0-0 rvest_0.3.6
[151] XML_3.99-0.5 fs_1.5.0 tensor_1.5 reticulate_1.18 splines_4.0.2
[156] uwot_0.1.10 spatstat.utils_2.0-0 sessioninfo_1.1.1 plotly_4.9.3 xtable_1.8-4
[161] jsonlite_1.7.2 spatstat_1.64-1 testthat_3.0.2 R6_2.5.0 Hmisc_4.4-2
[166] pillar_1.5.0 htmltools_0.5.1.1 mime_0.10 NMF_0.23.0 glue_1.4.2
[171] fastmap_1.1.0 DT_0.17 BiocParallel_1.24.1 codetools_0.2-16 pkgbuild_1.2.0
[176] utf8_1.1.4 lattice_0.20-41 curl_4.3 leiden_0.3.7 bioc2020trajectories_0.0.0.93 [181] survival_3.1-12 limma_3.46.0 rmarkdown_2.7 docopt_0.7.1 desc_1.2.0
[186] fastICA_1.2-2 munsell_0.5.0 rhdf5_2.34.0 GenomeInfoDbData_1.2.4 iterators_1.0.13
[191] HDF5Array_1.18.1 haven_2.3.1 reshape2_1.4.4 gtable_0.3.0

HectorRDB commented 3 years ago

Ok I cannot seem to reproduce it... Could you let me know what steps of the workshop are you running? I.E are you running the tradeSeq fitting and the associationTest again? Or using the pre-stored results?

liuweihanty commented 3 years ago

Hi @HectorRDB , these are the only two steps I didn't run on my desktop and used the prestored data instead: 1). set.seed(3) icMat <- evaluateK(counts = as.matrix(assays(sce)$counts), pseudotime = colData(sce)$slingshot$pseudotime, cellWeights = colData(sce)$slingshot$cellWeights.V1, conditions = factor(colData(sce)$pheno$treatment_id), nGenes = 300, k = 3:7)

2.) set.seed(3) sce <- fitGAM(sce, conditions = factor(colData(sce)$pheno$treatment_id), nknots = 5) mean(rowData(sce)$tradeSeq$converged)

I ran all the other steps on my desktop including associationTest(). is it sth related to the pre-stored data? thanks!

HectorRDB commented 3 years ago

Ok that is weird. I also re-ran all the steps and nothing changed. Did you update the Bioc202trajectories package to its latest version? It looks like it just wanted to check. Could you also let me know if you find the same results as the vignette? in term of number of DE genes and the upset plot?

JesseRop commented 3 years ago

Hi @liuweihanty If I remember correctly this error came about as I was using an older version of tradeSeq to fit the GAM model and a newer version for the plotSmoothers. I think I solved it by rerunning all the steps using the latest version.

HectorRDB commented 3 years ago

Thanks a lot @JesseRop. @liuweihanty let us know if re-running the fitGam step does the trick.

liuweihanty commented 3 years ago

hi @HectorRDB would the latest version of tradeseq being the developer version devtools::install_github("statOmics/tradeSeq") or the one from BiocManager? BiocManager::install("tradeSeq") Thanks!

HectorRDB commented 3 years ago

The developer version (devtools::install_github("statOmics/tradeSeq")) sorry for not being clearer

darshanaka commented 3 months ago

Hi @HectorRDB , thank you for the amazing package. I am running into the same issues as @JesseRop previously did, and I cannot seem to figure out what is going wrong with my analysis as I already tried the fix mentioned here and I could not find this issue mentioned anywhere else .

For your reference, I have a sce object with 4 conditions and one lineage that was predicted with slingshot trajectory inference. I installed the github version of the tradeseq package and ran the following code for fitGam:

scesling <- fitGAM(scesling, conditions = factor(colData(scesling)$groupID), nknots = 5,verbose = TRUE, parallel=TRUE, BPPARAM = BPPARAM, genes=vargenes)

This worked just fine except there were some warnings which came up everytime I ran fitGAM.

Warning in asMethod(object) : sparse->dense coercion: allocating vector of size 2.4 GiB Fitting lineages with multiple conditions. This method has been tested on a couple of datasets, but is still in an experimental phase. Warning in serialize(data, node$con) : 'package:stats' may not be available when loading Warning in serialize(data, node$con) : 'package:stats' may not be available when loading

Following this, I wanted to use plotSmooth to visualise the expression of marker genes in my dataset.

p1<-plotSmoothers(models=scesling, counts=assays(scesling)$counts, gene = "Olig2", alpha = 1, border=TRUE) + ggtitle("Olig2")

However running this gives me the error

Error in .plotSmoothers_conditions(models = models, counts = counts, gene = gene, : Some coefficients for this gene are NA. Cannot plot this gene." 5. stop("Some coefficients for this gene are NA. Cannot plot this gene.") 4. .plotSmoothers_conditions(models = models, counts = counts, gene = gene, nPoints = nPoints, lwd = lwd, size = size, xlab = xlab, ylab = ylab, border = border, alpha = alpha, sample = sample, pointCol = pointCol, curvesCols = curvesCols, plotLineages = plotLineages) 3. .local(models, ...) 2. plotSmoothers(models = scesling, counts = assays(scesling)$counts, gene = "Olig2", alpha = 1, border = TRUE) 1. plotSmoothers(models = scesling, counts = assays(scesling)$counts, gene = "Olig2", alpha = 1, border = TRUE)

However, this error doesn't show up for every gene, only some specific genes. It is a bit confusing for me to understand in that case, how do we know which genes give the NA error and which don't? Is it normal for this error to pop up for certain genes? If this is indeed expected, is it somehow dependent on the gene list I used for the fitGAM function, i.e, genes that were used for fitting will work with the plotSmoothers function but other genes won't?

My session info is as follows:

R version 4.4.0 (2024-04-24 ucrt) Platform: x86_64-w64-mingw32/x64 Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale: [1] LC_COLLATE=English_Germany.utf8 LC_CTYPE=English_Germany.utf8 LC_MONETARY=English_Germany.utf8 [4] LC_NUMERIC=C LC_TIME=English_Germany.utf8

time zone: Europe/Berlin tzcode source: internal

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4 tradeSeq_1.13.06
[5] BiocParallel_1.38.0 gridExtra_2.3 ggplot2_3.5.1 knitr_1.46
[9] fgsea_1.30.0 msigdbr_7.5.1 pheatmap_1.0.12 UpSetR_1.4.0
[13] viridis_0.6.5 viridisLite_0.4.2 scales_1.3.0 RColorBrewer_1.1-3
[17] slingshot_2.12.0 TrajectoryUtils_1.12.0 SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0 [21] Biobase_2.64.0 GenomicRanges_1.56.0 GenomeInfoDb_1.40.0 IRanges_2.38.0
[25] S4Vectors_0.42.0 BiocGenerics_0.50.0 MatrixGenerics_1.16.0 matrixStats_1.3.0
[29] princurve_2.1.6

loaded via a namespace (and not attached): [1] rstudioapi_0.16.0 jsonlite_1.8.8 magrittr_2.0.3 spatstat.utils_3.0-4 rmarkdown_2.27
[6] zlibbioc_1.50.0 vctrs_0.6.5 ROCR_1.0-11 spatstat.explore_3.2-7 htmltools_0.5.8.1
[11] S4Arrays_1.4.0 SparseArray_1.4.3 sctransform_0.4.1 parallelly_1.37.1 KernSmooth_2.23-24
[16] htmlwidgets_1.6.4 ica_1.0-3 plyr_1.8.9 plotly_4.10.4 zoo_1.8-12
[21] igraph_2.0.3 mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3 Matrix_1.7-0
[26] R6_2.5.1 fastmap_1.2.0 GenomeInfoDbData_1.2.12 fitdistrplus_1.1-11 future_1.33.2
[31] shiny_1.8.1.1 digest_0.6.35 colorspace_2.1-0 patchwork_1.2.0 tensor_1.5
[36] RSpectra_0.16-1 irlba_2.3.5.1 progressr_0.14.0 spatstat.sparse_3.0-3 fansi_1.0.6
[41] polyclip_1.10-6 httr_1.4.7 abind_1.4-5 mgcv_1.9-1 compiler_4.4.0
[46] withr_3.0.0 fastDummies_1.7.3 MASS_7.3-60.2 DelayedArray_0.30.1 tools_4.4.0
[51] lmtest_0.9-40 httpuv_1.6.15 future.apply_1.11.2 goftest_1.2-3 glue_1.7.0
[56] promises_1.3.0 nlme_3.1-164 grid_4.4.0 Rtsne_0.17 reshape2_1.4.4
[61] cluster_2.1.6 snow_0.4-4 generics_0.1.3 spatstat.data_3.0-4 gtable_0.3.5
[66] tidyr_1.3.1 data.table_1.15.4 utf8_1.2.4 XVector_0.44.0 spatstat.geom_3.2-9
[71] RcppAnnoy_0.0.22 stringr_1.5.1 ggrepel_0.9.5 RANN_2.6.1 pillar_1.9.0
[76] spam_2.10-0 babelgene_22.9 RcppHNSW_0.6.0 limma_3.60.0 later_1.3.2
[81] splines_4.4.0 dplyr_1.1.4 lattice_0.22-6 deldir_2.0-4 survival_3.6-4
[86] tidyselect_1.2.1 locfit_1.5-9.9 miniUI_0.1.1.1 pbapply_1.7-2 edgeR_4.2.0
[91] scattermore_1.2 xfun_0.43 statmod_1.5.0 stringi_1.8.4 UCSC.utils_1.0.0
[96] lazyeval_0.2.2 yaml_2.3.8 evaluate_0.23 codetools_0.2-20 tibble_3.2.1
[101] cli_3.6.2 uwot_0.2.2 xtable_1.8-4 reticulate_1.36.1 munsell_0.5.1
[106] Rcpp_1.0.12 spatstat.random_3.2-3 globals_0.16.3 png_0.1-8 parallel_4.4.0
[111] dotCall64_1.1-1 listenv_0.9.1 ggridges_0.5.6 purrr_1.0.2 leiden_0.4.3.1
[116] crayon_1.5.2 rlang_1.1.3 cowplot_1.1.3 fastmatch_1.1-4