satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Sparse to dense coercion when running merge on two Seurat objects. #9125

Open joshuak94 opened 1 month ago

joshuak94 commented 1 month ago

I was trying to create a reproducible example of another issue I'm having with JoinLayers() taking an indefinite amount of time (killed manually after ~12 hours).

The dataset I used is from here. I used the gene_count_cleaned_sampled_100k.RDS file along with the cell_annotate.csv file for metadata.

I split the gene matrix into two groups: E11.5 cells and E13.5 cells. When merging, I get the following warnings, and then eventually an error:

Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 6.6 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 3.6 GiB”

Error: cannot allocate vector of size 5.1 Gb
Traceback:

1. merge(data_115, data_135, add.cell.ids = c("115", "135"))
2. merge(data_115, data_135, add.cell.ids = c("115", "135"))
3. merge.default(data_115, data_135, add.cell.ids = c("115", "135"))
4. merge(as.data.frame(x), as.data.frame(y), ...)
5. merge.data.frame(as.data.frame(x), as.data.frame(y), ...)
6. cbind(x[ij[, 1L], , drop = FALSE], y[ij[, 2L], , drop = FALSE])
7. x[ij[, 1L], , drop = FALSE]
8. `[.data.frame`(x, ij[, 1L], , drop = FALSE)

My memory usage also skyrockets to 400+ GB.
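
For scale: a dense double-precision matrix needs 8 bytes per entry no matter how many entries are zero, which is why a sparse-to-dense coercion (for example via as.data.frame()) on a count matrix can exhaust memory. Below is a rough sketch of that estimate. The dimensions are made up for illustration and are not the actual size of this dataset.

```r
library(Matrix)

# Hypothetical dimensions, NOT taken from the dataset in this report
n_genes <- 2000
n_cells <- 500

set.seed(1)
# Random sparse matrix with about 5% non-zero entries (a dgCMatrix)
m <- rsparsematrix(n_genes, n_cells, density = 0.05)

# Dense storage: one 8-byte double per entry, zeros included
dense_bytes <- as.numeric(n_genes) * n_cells * 8

# Sparse storage: roughly 12 bytes per non-zero entry plus column pointers
sparse_bytes <- as.numeric(object.size(m))

cat(sprintf("dense estimate: %.1f MiB\n", dense_bytes / 2^20))
cat(sprintf("sparse object:  %.1f MiB\n", sparse_bytes / 2^20))
```

The same arithmetic (rows x columns x 8 bytes) can be applied to dim(data_115) and dim(data_135) to see whether the allocation sizes in the warnings are in the expected range.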

Source code:

library(Seurat)

data = readRDS("/project/moca/gene_count_cleaned_sampled_100k.RDS")
metadata = read.csv("/project/moca/cell_annotate.csv")
rownames(data) = gsub("\\.\\d+$", "", rownames(data))

metadata_subset115 = metadata[which(metadata$sample %in% colnames(data) & metadata$development_stage == 11.5), ]
metadata_subset135 = metadata[which(metadata$sample %in% colnames(data) & metadata$development_stage == 13.5), ]

data_115 = data[, which(colnames(data) %in% metadata_subset115$sample)]
data_seurat_115 = CreateSeuratObject(data_115, meta.data = metadata_subset115)

data_135 = data[, which(colnames(data) %in% metadata_subset135$sample)]
data_seurat_135 = CreateSeuratObject(data_135, meta.data = metadata_subset135)

merged_data = merge(data_115, data_135, add.cell.ids=c("115", "135"))
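
A note on the call above: frames 3 to 5 of the traceback are merge.default, as.data.frame and merge.data.frame, which are the base R data-frame methods rather than Seurat's merge method. That would be consistent with merge() receiving the sparse matrices (data_115, data_135) instead of the Seurat objects (data_seurat_115, data_seurat_135). The sketch below illustrates the difference on toy data. The helper make_counts and the object names are hypothetical, and whether this resolves the reported problem is not confirmed in this thread.

```r
library(Matrix)
library(Seurat)

set.seed(1)

# Hypothetical helper: build a small sparse count matrix with named genes/cells
make_counts <- function(prefix, n_cells, n_genes = 50) {
  m <- rsparsematrix(n_genes, n_cells, density = 0.2,
                     rand.x = function(n) rpois(n, 3) + 1)
  dimnames(m) <- list(paste0("gene", seq_len(n_genes)),
                      paste0(prefix, seq_len(n_cells)))
  m
}

counts_a <- make_counts("a", 20)  # stands in for data_115
counts_b <- make_counts("b", 30)  # stands in for data_135

obj_a <- CreateSeuratObject(counts_a)  # stands in for data_seurat_115
obj_b <- CreateSeuratObject(counts_b)  # stands in for data_seurat_135

# Guard: fail early if a raw matrix is passed by mistake, because
# merge() on a plain matrix falls through to merge.default / as.data.frame
stopifnot(inherits(obj_a, "Seurat"), inherits(obj_b, "Seurat"))

# With Seurat objects, merge() dispatches to Seurat's method
merged <- merge(obj_a, obj_b, add.cell.ids = c("115", "135"))
```

In the report's terms, the equivalent call would pass data_seurat_115 and data_seurat_135 to merge().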

sessionInfo():

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: MarIuX64 2.0 GNU/Linux

Matrix products: default
BLAS:   /pkg/R-4.4.0-0/lib/R/lib/libRblas.so 
LAPACK: /usr/lib/liblapack.so.3.10.1

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Seurat_5.1.0       SeuratObject_5.0.2 sp_2.1-4          

loaded via a namespace (and not attached):
  [1] deldir_2.0-4           pbapply_1.7-2          gridExtra_2.3         
  [4] rlang_1.1.4            magrittr_2.0.3         RcppAnnoy_0.0.22      
  [7] spatstat.geom_3.3-2    matrixStats_1.3.0      ggridges_0.5.6        
 [10] compiler_4.4.0         png_0.1-8              vctrs_0.6.5           
 [13] reshape2_1.4.4         stringr_1.5.1          pkgconfig_2.0.3       
 [16] crayon_1.5.3           fastmap_1.2.0          utf8_1.2.4            
 [19] promises_1.3.0         purrr_1.0.2            jsonlite_1.8.8        
 [22] goftest_1.2-3          later_1.3.2            uuid_1.1-1            
 [25] spatstat.utils_3.0-5   irlba_2.3.5.1          parallel_4.4.0        
 [28] cluster_2.1.6          R6_2.5.1               ica_1.0-3             
 [31] stringi_1.8.4          RColorBrewer_1.1-3     spatstat.data_3.1-2   
 [34] reticulate_1.38.0      spatstat.univar_3.0-0  parallelly_1.37.1     
 [37] lmtest_0.9-40          scattermore_1.2        Rcpp_1.0.12           
 [40] IRkernel_1.3.2         tensor_1.5             future.apply_1.11.2   
 [43] zoo_1.8-12             base64enc_0.1-3        sctransform_0.4.1     
 [46] httpuv_1.6.15          Matrix_1.7-0           splines_4.4.0         
 [49] igraph_2.0.3           tidyselect_1.2.1       abind_1.4-5           
 [52] spatstat.random_3.3-1  codetools_0.2-20       miniUI_0.1.1.1        
 [55] spatstat.explore_3.3-1 listenv_0.9.1          lattice_0.22-6        
 [58] tibble_3.2.1           plyr_1.8.9             shiny_1.8.1.1         
 [61] ROCR_1.0-11            evaluate_0.24.0        Rtsne_0.17            
 [64] future_1.33.2          fastDummies_1.7.3      survival_3.5-8        
 [67] polyclip_1.10-6        fitdistrplus_1.2-1     pillar_1.9.0          
 [70] KernSmooth_2.23-22     plotly_4.10.4          generics_0.1.3        
 [73] RcppHNSW_0.6.0         IRdisplay_1.1          ggplot2_3.5.1         
 [76] munsell_0.5.1          scales_1.3.0           globals_0.16.3        
 [79] xtable_1.8-4           glue_1.7.0             lazyeval_0.2.2        
 [82] tools_4.4.0            data.table_1.15.4      RSpectra_0.16-1       
 [85] pbdZMQ_0.3-10          RANN_2.6.1             leiden_0.4.3.1        
 [88] dotCall64_1.1-1        cowplot_1.1.3          grid_4.4.0            
 [91] tidyr_1.3.1            colorspace_2.1-0       nlme_3.1-164          
 [94] patchwork_1.2.0        repr_1.1.6             cli_3.6.3             
 [97] spatstat.sparse_3.1-0  spam_2.10-0            fansi_1.0.6           
[100] viridisLite_0.4.2      dplyr_1.1.4            uwot_0.2.2            
[103] gtable_0.3.5           digest_0.6.36          progressr_0.14.0      
[106] ggrepel_0.9.5          htmlwidgets_1.6.4      htmltools_0.5.8.1     
[109] lifecycle_1.0.4        httr_1.4.7             mime_0.12             
[112] MASS_7.3-60.2         

rsatija commented 1 month ago

Thank you for sending this, which is very helpful for us to debug.

Can you check whether the rownames of your metadata match the column names of your object? That is, if your object is called object, does all(rownames(object@meta.data)==colnames(object)) return TRUE?

This relates to https://github.com/satijalab/seurat/issues/9125

Let us know, and we will take a look early next week and get back to you ASAP.

joshuak94 commented 1 month ago

all(rownames(data_seurat_115@meta.data)==colnames(data_seurat_115))
all(rownames(data_seurat_135@meta.data)==colnames(data_seurat_135))

Both yield TRUE.

joshuak94 commented 1 month ago

Hi @rsatija, I was wondering if there was an update regarding this issue?