satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.18k stars 892 forks source link

FindNeighbors have different results for RNA and SCT assays #8980

Open kellyliyichen opened 1 month ago

kellyliyichen commented 1 month ago

Hi,

Based on FindNeighbors, it seems that the function only depends on embeddings, rather than the counts/normalized counts. So I suppose the results should be the same no matter the default assay is RNA or SCT for the same Seurat object using the same embeddings. But I got different clustering results for RNA and SCT using the following code:

## my_srt is my seurat object
## using SCT assay
srt_obj_sct = my_srt
DefaultAssay(object = srt_obj_sct) = 'SCT'
srt_obj_sct = FindNeighbors(srt_obj_sct, reduction="harmony", dims = 1:30)
srt_obj_sct = FindClusters(srt_obj_sct, resolution = 0.8, cluster.name = "SCT_clusters")
srt_obj_sct = RunUMAP(srt_obj_sct, reduction = "harmony", dims = 1:30, verbose = F)

## using RNA assay
srt_obj_rna = my_srt
DefaultAssay(object = srt_obj_rna) = 'RNA'
srt_obj_rna = FindNeighbors(srt_obj_rna, reduction="harmony", dims = 1:30)
srt_obj_rna = FindClusters(srt_obj_rna, resolution = 0.8, cluster.name = "RNA_clusters")
srt_obj_rna = RunUMAP(srt_obj_rna, reduction = "harmony", dims = 1:30, verbose = F)

I noticed that harmony embeddings were computed using assay SCT when I checked with srt_obj_sct@reductions and srt_obj_rna@reductions

Then I also ran

all.equal(Embeddings(srt_obj_sct, reduction = "harmony"), Embeddings(srt_obj_rna, reduction = "harmony"))

And the output was TRUE.

This confuses me a lot as no matter SCT or RNA is the default assay, the harmony embeddings are the same and should give the same clustering results.

Then I tested the case when I explicitly added the same harmony embeddings using the following code:

## my_srt is my seurat object
## save the harmony embeddings from the original object
harmony_embeddings = Embeddings(hep_clean, "harmony")

## using SCT assay
srt_obj_sct = my_srt
DefaultAssay(object = srt_obj_sct) = 'SCT'
srt_obj_sct[["harmony"]] = NULL
srt_obj_sct[["harmony"]] = CreateDimReducObject(embeddings = harmony_embeddings, key = "harmony_", assay = DefaultAssay(srt_obj_sct))
srt_obj_sct = FindNeighbors(srt_obj_sct, reduction="harmony", dims = 1:30)
srt_obj_sct = FindClusters(srt_obj_sct, resolution = 0.8, cluster.name = "SCT_clusters")
srt_obj_sct = RunUMAP(srt_obj_sct, reduction = "harmony", dims = 1:30, verbose = F)

## using RNA assay
srt_obj_rna = my_srt
DefaultAssay(object = srt_obj_rna) = 'RNA'
srt_obj_rna[["harmony"]] = NULL
srt_obj_rna[["harmony"]] = CreateDimReducObject(embeddings = harmony_embeddings, key = "harmony_", assay = DefaultAssay(srt_obj_rna))
srt_obj_rna = FindNeighbors(srt_obj_rna, reduction="harmony", dims = 1:30)
srt_obj_rna = FindClusters(srt_obj_rna, resolution = 0.8, cluster.name = "RNA_clusters")
srt_obj_rna = RunUMAP(srt_obj_rna, reduction = "harmony", dims = 1:30, verbose = F)

Then I got the same clustering results.

It seems that in FindNeighbors, it does something related to the default assay even when reduction and dims are set, which seems contradictory to the documentation. It is quite confusing to me as I thought FindNeighbors only used the embeddings, which is independent on assays once computed.

I am using Seurat v5 and R 4.3.3 and here is output of sessionInfo():

R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: ~/miniconda3/envs/r_43/lib/libopenblasp-r0.3.21.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8    LC_NUMERIC=C        LC_TIME=C          
 [4] LC_COLLATE=C        LC_MONETARY=C       LC_MESSAGES=C      
 [7] LC_PAPER=C          LC_NAME=C           LC_ADDRESS=C       
[10] LC_TELEPHONE=C      LC_MEASUREMENT=C    LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Seurat_5.0.3       SeuratObject_5.0.1 sp_2.1-3          

loaded via a namespace (and not attached):
  [1] deldir_2.0-4           pbapply_1.7-2          gridExtra_2.3         
  [4] rlang_1.1.3            magrittr_2.0.3         RcppAnnoy_0.0.22      
  [7] spatstat.geom_3.2-9    matrixStats_1.3.0      ggridges_0.5.6        
 [10] compiler_4.3.3         png_0.1-8              vctrs_0.6.5           
 [13] reshape2_1.4.4         stringr_1.5.1          pkgconfig_2.0.3       
 [16] crayon_1.5.2           fastmap_1.1.1          labeling_0.4.3        
 [19] utf8_1.2.4             promises_1.3.0         purrr_1.0.2           
 [22] jsonlite_1.8.8         goftest_1.2-3          later_1.3.2           
 [25] uuid_1.2-0             spatstat.utils_3.0-4   irlba_2.3.5.1         
 [28] parallel_4.3.3         cluster_2.1.6          R6_2.5.1              
 [31] ica_1.0-3              stringi_1.8.4          RColorBrewer_1.1-3    
 [34] spatstat.data_3.0-4    reticulate_1.35.0      parallelly_1.37.1     
 [37] lmtest_0.9-40          scattermore_1.2        Rcpp_1.0.12           
 [40] IRkernel_1.3.2         tensor_1.5             future.apply_1.11.2   
 [43] zoo_1.8-12             base64enc_0.1-3        sctransform_0.4.1     
 [46] httpuv_1.6.15          Matrix_1.6-5           splines_4.3.3         
 [49] igraph_2.0.2           tidyselect_1.2.1       abind_1.4-5           
 [52] spatstat.random_3.2-3  codetools_0.2-20       miniUI_0.1.1.1        
 [55] spatstat.explore_3.2-7 listenv_0.9.1          lattice_0.22-6        
 [58] tibble_3.2.1           plyr_1.8.9             withr_3.0.0           
 [61] shiny_1.8.1.1          ROCR_1.0-11            evaluate_0.23         
 [64] Rtsne_0.17             future_1.33.2          fastDummies_1.7.3     
 [67] survival_3.5-8         polyclip_1.10-6        fitdistrplus_1.1-11   
 [70] pillar_1.9.0           KernSmooth_2.23-22     plotly_4.10.4         
 [73] generics_0.1.3         RcppHNSW_0.6.0         IRdisplay_1.1         
 [76] ggplot2_3.5.1          munsell_0.5.1          scales_1.3.0          
 [79] globals_0.16.3         xtable_1.8-4           glue_1.7.0            
 [82] lazyeval_0.2.2         tools_4.3.3            data.table_1.15.4     
 [85] RSpectra_0.16-1        pbdZMQ_0.3-11          RANN_2.6.1            
 [88] leiden_0.4.3.1         dotCall64_1.1-1        Cairo_1.6-2           
 [91] cowplot_1.1.3          grid_4.3.3             tidyr_1.3.1           
 [94] colorspace_2.1-0       nlme_3.1-164           patchwork_1.2.0       
 [97] repr_1.1.7             cli_3.6.2              spatstat.sparse_3.0-3 
[100] spam_2.10-0            fansi_1.0.6            viridisLite_0.4.2     
[103] dplyr_1.1.4            uwot_0.1.16            gtable_0.3.5          
[106] digest_0.6.35          progressr_0.14.0       ggrepel_0.9.5         
[109] farver_2.1.2           htmlwidgets_1.6.4      htmltools_0.5.8.1     
[112] lifecycle_1.0.4        httr_1.4.7             mime_0.12             
[115] MASS_7.3-60      

Hope someone could help me with this problem. Thank you in advance!

longmanz commented 3 weeks ago

Hi, This is very strange, and even stranger since you obtained the same FindNeighbor results in the second case. You mentioned that the FindNeighbor() "does something related to the default assay even when reduction and dims are set", could you specify which part you are referring to?

kellyliyichen commented 3 weeks ago

Hi, This is very strange, and even stranger since you obtained the same FindNeighbor results in the second case. You mentioned that the FindNeighbor() "does something related to the default assay even when reduction and dims are set", could you specify which part you are referring to?

Maybe I wasn't clear enough. It was just my guess based on the first case. I got different results from FindNeighbor() where the only difference was the default assay.

But since I got the same results of FindNeighbor() in the second case, I am not sure what the problem could be.