satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.27k stars 910 forks source link

subscript out of bounds when trying to run PrepSCTFindMarkers() #9112

Open ChaptCarole opened 3 months ago

ChaptCarole commented 3 months ago

Hello,

For my analysis, I have 12 samples that I merged before performing the normalization with SCTransform. When pursuing the analysis, I saw that I could use only 3 samples to perform a good annotation of the clusters, because for the other samples, there was an abnormal expression of a lot of genes and it was complicated to conclude. So I splitted the merged seurat object and merge again only the 3 seurat object that were interesting for the annotation. Then, I wanted to check the markers of these clusters using FindAllMarkers(). Since the data was normalized with SCTransform, I had first to perform a PrepSCTFindMarkers(). However, each time I tried to run this, I got the error:

Found 3 SCT models. Recorrecting SCT counts using minimum median counts: 5574 | | 0 % ~calculating Error in .subscript.2ary(x, i, j, drop = TRUE) : subscript out of bounds

Here is my code:

# Load data
merged_data <- base::readRDS("merged_data.rds")

# Split data
obj.list <- Seurat::SplitObject(merged_data, split.by = "orig.ident")

# Merge control datasets
control <- merge(obj.list$control1, y = c(obj.list$control2, obj.list$control3))

# Prepare the data to be run with the function FindAllMarkers
control <- Seurat::PrepSCTFindMarkers(control, assay = "SCT", verbose = TRUE)

Here is my sessionInfo()

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /shared/software/miniconda/envs/r-4.3.1/lib/libopenblasp-r0.3.24.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] loupeR_1.1.0

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.22            splines_4.3.1               later_1.3.2                
  [4] BiocIO_1.12.0               bitops_1.0-7                tibble_3.2.1               
  [7] polyclip_1.10-6             XML_3.99-0.17               fastDummies_1.7.3          
 [10] lifecycle_1.0.4             scDblFinder_1.16.0          edgeR_4.0.16               
 [13] hdf5r_1.3.10                globals_0.16.3              lattice_0.22-6             
 [16] MASS_7.3-60.0.1             magrittr_2.0.3              rmarkdown_2.27             
 [19] limma_3.58.1                plotly_4.10.4               yaml_2.3.9                 
 [22] metapod_1.10.1              httpuv_1.6.15               Seurat_5.1.0               
 [25] sctransform_0.4.1           askpass_1.2.0               spam_2.10-0                
 [28] sp_2.1-4                    spatstat.sparse_3.1-0       reticulate_1.38.0          
 [31] cowplot_1.1.3               pbapply_1.7-2               RColorBrewer_1.1-3         
 [34] abind_1.4-5                 zlibbioc_1.48.2             Rtsne_0.17                 
 [37] GenomicRanges_1.54.1        purrr_1.0.2                 BiocGenerics_0.48.1        
 [40] RCurl_1.98-1.14             GenomeInfoDbData_1.2.11     IRanges_2.36.0             
 [43] S4Vectors_0.40.2            ggrepel_0.9.5               irlba_2.3.5.1              
 [46] listenv_0.9.1               spatstat.utils_3.0-5        umap_0.2.10.0              
 [49] goftest_1.2-3               RSpectra_0.16-1             dqrng_0.4.1                
 [52] spatstat.random_3.2-3       fitdistrplus_1.1-11         parallelly_1.37.1          
 [55] DelayedMatrixStats_1.24.0   leiden_0.4.3.1              codetools_0.2-20           
 [58] DelayedArray_0.28.0         scuttle_1.12.0              tidyselect_1.2.1           
 [61] ScaledMatrix_1.10.0         viridis_0.6.5               matrixStats_1.3.0          
 [64] stats4_4.3.1                spatstat.explore_3.2-7      GenomicAlignments_1.38.2   
 [67] jsonlite_1.8.8              BiocNeighbors_1.20.2        progressr_0.14.0           
 [70] ggridges_0.5.6              survival_3.7-0              scater_1.30.1              
 [73] tools_4.3.1                 ica_1.0-3                   Rcpp_1.0.12                
 [76] glue_1.7.0                  gridExtra_2.3               SparseArray_1.2.4          
 [79] xfun_0.45                   MatrixGenerics_1.14.0       GenomeInfoDb_1.38.8        
 [82] dplyr_1.1.4                 fastmap_1.2.0               bluster_1.12.0             
 [85] fansi_1.0.6                 openssl_2.2.0               digest_0.6.36              
 [88] rsvd_1.0.5                  R6_2.5.1                    mime_0.12                  
 [91] colorspace_2.1-0            scattermore_1.2             tensor_1.5                 
 [94] spatstat.data_3.1-2         utf8_1.2.4                  tidyr_1.3.1                
 [97] generics_0.1.3              renv_1.0.7                  data.table_1.15.4          
[100] rtracklayer_1.62.0          httr_1.4.7                  htmlwidgets_1.6.4          
[103] S4Arrays_1.2.1              uwot_0.2.2                  pkgconfig_2.0.3            
[106] gtable_0.3.5                lmtest_0.9-40               SingleCellExperiment_1.24.0
[109] XVector_0.42.0              htmltools_0.5.8.1           dotCall64_1.1-1            
[112] SeuratObject_5.0.2          scales_1.3.0                Biobase_2.62.0             
[115] png_0.1-8                   scran_1.30.2                knitr_1.48                 
[118] rstudioapi_0.16.0           reshape2_1.4.4              rjson_0.2.21               
[121] nlme_3.1-165                zoo_1.8-12                  stringr_1.5.1              
[124] KernSmooth_2.23-24          parallel_4.3.1              miniUI_0.1.1.1             
[127] vipor_0.4.7                 restfulr_0.0.15             pillar_1.9.0               
[130] grid_4.3.1                  vctrs_0.6.5                 RANN_2.6.1                 
[133] promises_1.3.0              BiocSingular_1.18.0         beachmat_2.18.1            
[136] xtable_1.8-4                cluster_2.1.6               beeswarm_0.4.0             
[139] evaluate_0.24.0             locfit_1.5-9.10             cli_3.6.3                  
[142] compiler_4.3.1              Rsamtools_2.18.0            rlang_1.1.4                
[145] crayon_1.5.3                future.apply_1.11.2         plyr_1.8.9                 
[148] ggbeeswarm_0.7.2            stringi_1.8.4               viridisLite_0.4.2          
[151] deldir_2.0-4                BiocParallel_1.36.0         munsell_0.5.1              
[154] Biostrings_2.70.3           lazyeval_0.2.2              spatstat.geom_3.2-9        
[157] Matrix_1.6-5                RcppHNSW_0.6.0              patchwork_1.2.0            
[160] bit64_4.0.5                 sparseMatrixStats_1.14.0    future_1.33.2              
[163] ggplot2_3.5.1               statmod_1.5.0               shiny_1.8.1.1              
[166] SummarizedExperiment_1.32.0 ROCR_1.0-11                 igraph_2.0.3               
[169] bit_4.0.5                   xgboost_1.7.7.1

I suspect that splitting then merging some samples without the others is the problem. So, is there any way to do this correctly ?

Thanks a lot for your help,

longmanz commented 3 months ago

Hi, This is a bug and have been reported at https://github.com/satijalab/seurat/issues/8561. Our team is trying to fix it. In the meantime, a workaround would be running FindMarkers() using the "RNA" assay's data slot instead of using SCT assay. In most cases you should get similar results especially if you are searching for cluster markers.