kstreet13 / slingshot

Functions for identifying and characterizing continuous developmental trajectories in single-cell data.
259 stars 42 forks source link

visually wrong trajectory, am I missing something #173

Closed stemangiola closed 2 years ago

stemangiola commented 2 years ago

Hello,

I run slingshot on diffusion map, and I get a quite apparently wrong trajectory. Am I missing something? Is the problem that the pink cluster is located at two different points of the trajectory.

xx = slingshot::slingshot(
  pbmc_CD4_slingshot, clusterLabels = colData(pbmc_CD4_slingshot)$`predicted.celltype.l2`, 
  reducedDim = "diffusion",
  allow.breaks = F
)

library(RColorBrewer)
plot(reducedDims(xx)$diffusion, col = brewer.pal(9,'Set1')[as.numeric(as.factor(xx$`seurat_clusters`))])
lines(SlingshotDataSet(xx), lwd=2, col='black')

image

For completeness the umap looks like this

image

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /stornext/System/data/apps/R/R-4.1.0/lib64/R/lib/libRblas.so
LAPACK: /stornext/System/data/apps/R/R-4.1.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2             slingshot_2.2.0                TrajectoryUtils_1.2.0          princurve_2.1.6               
 [5] scales_1.1.1                   patchwork_1.1.1                sccomp_0.99.18                 rstan_2.26.6                  
 [9] StanHeaders_2.26.6             here_1.0.1                     tidySingleCellExperiment_1.3.3 SingleCellExperiment_1.16.0   
[13] SummarizedExperiment_1.24.0    Biobase_2.54.0                 GenomicRanges_1.46.1           GenomeInfoDb_1.30.0           
[17] IRanges_2.28.0                 S4Vectors_0.32.2               BiocGenerics_0.40.0            MatrixGenerics_1.6.0          
[21] matrixStats_0.61.0             tidyseurat_0.4.0               ttservice_0.1.2                SeuratWrappers_0.3.0          
[25] SeuratObject_4.0.3             Seurat_4.0.5                   forcats_0.5.1                  stringr_1.4.0                 
[29] dplyr_1.0.7                    purrr_0.3.4                    readr_2.1.0                    tidyr_1.1.4                   
[33] tibble_3.1.6                   ggplot2_3.3.5                  tidyverse_1.3.1               

loaded via a namespace (and not attached):
  [1] rsvd_1.0.5                    vcd_1.4-9                     ica_1.0-2                     class_7.3-19                 
  [5] ps_1.6.0                      rprojroot_2.0.2               foreach_1.5.1                 lmtest_0.9-39                
  [9] crayon_1.4.2                  laeken_0.5.2                  V8_3.6.0                      spatstat.core_2.3-1          
 [13] MASS_7.3-54                   dittoSeq_1.6.0                nlme_3.1-153                  backports_1.3.0              
 [17] reprex_2.0.1                  rlang_0.4.12                  XVector_0.34.0                ROCR_1.0-11                  
 [21] readxl_1.3.1                  irlba_2.3.3                   callr_3.7.0                   limma_3.50.0                 
 [25] smoother_1.1                  filelock_1.0.2                BiocParallel_1.28.1           rjson_0.2.20                 
 [29] bit64_4.0.5                   glue_1.5.0                    loo_2.4.1                     pheatmap_1.0.12              
 [33] sctransform_0.3.2             parallel_4.1.0                processx_3.5.2                vipor_0.4.5                  
 [37] spatstat.sparse_2.0-0         AnnotationDbi_1.56.2          spatstat.geom_2.3-0           haven_2.4.3                  
 [41] tidyselect_1.1.1              fitdistrplus_1.1-6            zoo_1.8-9                     xtable_1.8-4                 
 [45] RcppHNSW_0.3.0                magrittr_2.0.1                evaluate_0.14                 scuttle_1.4.0                
 [49] cli_3.1.0                     zlibbioc_1.40.0               rstudioapi_0.13               miniUI_0.1.1.1               
 [53] sp_1.4-6                      rpart_4.1-15                  RcppEigen_0.3.3.9.1           widyr_0.1.4                  
 [57] shiny_1.7.1                   BiocSingular_1.10.0           xfun_0.28                     clue_0.3-60                  
 [61] inline_0.3.19                 pkgbuild_1.3.1                cluster_2.1.2                 pcaMethods_1.86.0            
 [65] KEGGREST_1.34.0               interactiveDisplayBase_1.32.0 ggrepel_0.9.1                 listenv_0.8.0                
 [69] dendextend_1.15.2             Biostrings_2.62.0             png_0.1-7                     future_1.23.0                
 [73] withr_2.4.3                   bitops_1.0-7                  ranger_0.13.1                 plyr_1.8.6                   
 [77] cellranger_1.1.0              e1071_1.7-9                   dqrng_0.3.0                   pillar_1.6.4                 
 [81] RcppParallel_5.1.4            GlobalOptions_0.1.2           cachem_1.0.6                  fs_1.5.2                     
 [85] scatterplot3d_0.3-41          TTR_0.24.2                    GetoptLong_1.0.5              nanny_0.2.0                  
 [89] DelayedMatrixStats_1.16.0     xts_0.12.1                    vctrs_0.3.8                   ellipsis_0.3.2               
 [93] generics_0.1.1                tidyHeatmap_1.4.0             tools_4.1.0                   beeswarm_0.4.0               
 [97] munsell_0.5.0                 proxy_0.4-26                  DelayedArray_0.20.0           fastmap_1.1.0                
[101] compiler_4.1.0                abind_1.4-5                   httpuv_1.6.3                  ExperimentHub_2.2.0          
[105] plotly_4.10.0                 GenomeInfoDbData_1.2.7        gridExtra_2.3                 edgeR_3.36.0                 
[109] lattice_0.20-45               deldir_1.0-6                  utf8_1.2.2                    later_1.3.0                  
[113] BiocFileCache_2.2.0           jsonlite_1.7.2                ggplot.multistats_1.0.0       ScaledMatrix_1.2.0           
[117] pbapply_1.5-0                 carData_3.0-4                 sparseMatrixStats_1.6.0       lazyeval_0.2.2               
[121] promises_1.2.0.1              car_3.0-12                    doParallel_1.0.16             R.utils_2.11.0               
[125] goftest_1.2-3                 spatstat.utils_2.2-0          reticulate_1.22               rmarkdown_2.11               
[129] cowplot_1.1.1                 statmod_1.4.36                Rtsne_0.15                    uwot_0.1.10                  
[133] igraph_1.2.8                  survival_3.2-13               yaml_2.2.1                    rstantools_2.1.1             
[137] htmltools_0.5.2               memoise_2.0.1                 locfit_1.5-9.4                destiny_3.1.1                
[141] viridisLite_0.4.0             digest_0.6.28                 assertthat_0.2.1              mime_0.12                    
[145] rappdirs_0.3.3                SingleR_1.8.0                 RSQLite_2.2.8                 future.apply_1.8.1           
[149] remotes_2.4.1                 data.table_1.14.2             blob_1.2.2                    R.oo_1.24.0                  
[153] splines_4.1.0                 AnnotationHub_3.2.0           RCurl_1.98-1.5                broom_0.7.10                 
[157] hms_1.1.1                     modelr_0.1.8                  colorspace_2.0-2              BiocManager_1.30.16          
[161] ggbeeswarm_0.6.0              shape_1.4.6                   nnet_7.3-16                   Rcpp_1.0.7                   
[165] RANN_2.6.1                    circlize_0.4.13               fansi_0.5.0                   tzdb_0.2.0                   
[169] VIM_6.1.1                     parallelly_1.29.0             R6_2.5.1                      grid_4.1.0                   
[173] ggridges_0.5.3                lifecycle_1.0.1               tradeSeq_1.8.0                bluster_1.4.0                
[177] curl_4.3.2                    leiden_0.3.9                  robustbase_0.93-9             Matrix_1.3-4                 
[181] RcppAnnoy_0.0.19              iterators_1.0.13              htmlwidgets_1.5.4             beachmat_2.10.0              
[185] polyclip_1.10-0               rvest_1.0.2                   ComplexHeatmap_2.10.0         mgcv_1.8-38                  
[189] globals_0.14.0                codetools_0.2-18              lubridate_1.8.0               metapod_1.2.0                
[193] gtools_3.9.2                  prettyunits_1.1.1             dbplyr_2.1.1                  R.methodsS3_1.8.1            
[197] RSpectra_0.16-0               celldex_1.4.0                 gtable_0.3.0                  DBI_1.1.2                    
[201] job_0.3.0                     tensor_1.5                    httr_1.4.2                    KernSmooth_2.23-20           
[205] stringi_1.7.5                 reshape2_1.4.4                viridis_0.6.2                 ggthemes_4.2.4               
[209] hexbin_1.28.2                 xml2_1.3.3                    boot_1.3-28                   BiocNeighbors_1.12.0         
[213] scattermore_0.7               BiocVersion_3.14.0            scran_1.22.1                  DEoptimR_1.0-9               
[217] bit_4.0.4                     spatstat.data_2.1-0           pkgconfig_2.0.3               knn.covertree_1.0            
[221] knitr_1.36 
kstreet13 commented 2 years ago

Hi @stemangiola ,

I see that you closed the issue, did you figure it out? You may be right about the pink cluster, but I also noticed that Slingshot was using different cluster labels (predicted.celltype.l2) than the plot (seurat_clusters), so I thought that might be related. For diagnostic purposes, it can also sometimes help to set asp = 1 in plot, so that orthogonality is preserved (because the curves are based on orthogonal projection).

Best, Kelly

stemangiola commented 2 years ago

Hello,

I think I understood that the trajectory heavily depends on clustering, so if I provide clustering from another network (CITE-seq wnn from Seurat) and I project with Diffusion, this can mess things up. Is this correct?

On this note: do you guys have a way to accept weighted-nn (from seurat) to draw Diffusion map and trajectory?

kstreet13 commented 2 years ago

Just to check, are you saying that the clustering would be based on different data than the dimensionality reduction? I think that could work in theory, but it would seem likely to cause some weird inconsistencies (like you saw with the pink and brown clusters, where they show up in multiple distinct locations).

I'm not particularly familiar with diffusion map methods, so I don't know if there is a way to incorporated a weighted-nn graph. I've only ever used the destiny package, which doesn't make any mention of it that I can find. And Slingshot doesn't use nearest neighbors (unless dist.method = 'mnn'), so it would usually only impact the trajectory through the dimensionality reduction.

stemangiola commented 2 years ago

Thanks,

I realised that methods for trajectory analyses, might need supervision to perform the best, for example if a UMAP topology (e.g. CD4 T cells) is too complicated the result might not be plausible, even if I specify the starting cluster as light orange and the end cluster as dark orange.

Of course a possibility is that I am just doing things wrong.

Please see images below as example

pbmc_CD4_slingshot_UMAP  = 
  pbmc_CD4 %>% 
  RunUMAP(
    nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_", 
    spread    = 0.5,min.dist  = 0.01, n.neighbors = 10L
  ) %>% 
  FindClusters( graph.name = "wsnn", algorithm = 3, resolution = 0.3, verbose = FALSE) %>% 
  as.SingleCellExperiment() %>% 
  slingshot::slingshot(
    clusterLabels = pull(., seurat_clusters), 
    reducedDim = "WNN.UMAP",
    start.clus= '0', end.clus = c("5", "2" )
  )

image

kstreet13 commented 2 years ago

Yeah, something is definitely weird about that Slingshot plot. It looks like there are 8 lineages, which seems like way too many. This indicates that there's probably an issue with the MST. It could help to plot the MST rather than the final trajectory (by setting type = "lin" when plotting the SlingshotDataSet).

Also, from the TSCAN plot, I realized that there are a lot of small clusters that weren't apparent in the first plot (I initially thought there were ~10 clusters, but apparently there are 16?). That, plus the fact that some of the clusters seem to be split across multiple locations on the UMAP plot, is probably going to confuse any method that relies on a cluster-based MST (like Slingshot and TSCAN). For example, there is a vertex indicating a cluster center around (0.9, -0.3), but there are basically no cells there, so that vertex likely corresponds to a cluster that has been split. This is unfortunately a fairly common issue with Louvain clustering + UMAP dimensionality reduction.