satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Sparse to dense coercion when running merge on two Seurat objects. #9125

Open joshuak94 opened 3 months ago

joshuak94 commented 3 months ago

I was trying to create a reproducible example of another issue I'm having with JoinLayers() taking an indefinite amount of time (killed manually after ~12 hours).

The dataset I used is from here; I used the gene_count_cleaned_sampled_100k.rds file along with the cell_annotation.csv file for metadata.

I split the gene matrix into two groups: E11.5 cells and E13.5 cells. When merging, I get the following warnings, and then eventually an error:

Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 6.6 GiB”
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 3.6 GiB”

Error: cannot allocate vector of size 5.1 Gb
Traceback:

1. merge(data_115, data_135, add.cell.ids = c("115", "135"))
2. merge(data_115, data_135, add.cell.ids = c("115", "135"))
3. merge.default(data_115, data_135, add.cell.ids = c("115", "135"))
4. merge(as.data.frame(x), as.data.frame(y), ...)
5. merge.data.frame(as.data.frame(x), as.data.frame(y), ...)
6. cbind(x[ij[, 1L], , drop = FALSE], y[ij[, 2L], , drop = FALSE])
7. x[ij[, 1L], , drop = FALSE]
8. `[.data.frame`(x, ij[, 1L], , drop = FALSE)

My memory usage also skyrockets to 400+ GB.

Source code:

library(Seurat)

data = readRDS("/project/moca/gene_count_cleaned_sampled_100k.RDS")
metadata = read.csv("/project/moca/cell_annotate.csv")
rownames(data) = gsub("\\.\\d+$", "", rownames(data))

metadata_subset115 = metadata[which(metadata$sample %in% colnames(data) & metadata$development_stage == 11.5), ]
metadata_subset135 = metadata[which(metadata$sample %in% colnames(data) & metadata$development_stage == 13.5), ]

data_115 = data[, which(colnames(data) %in% metadata_subset115$sample)]
data_seurat_115 = CreateSeuratObject(data_115, meta.data = metadata_subset115)

data_135 = data[, which(colnames(data) %in% metadata_subset135$sample)]
data_seurat_135 = CreateSeuratObject(data_135, meta.data = metadata_subset135)

merged_data = merge(data_115, data_135, add.cell.ids=c("115", "135"))
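A note on the call above: the traceback runs through merge.default() and merge.data.frame(), which suggests merge() received the raw count matrices (data_115, data_135) rather than the Seurat objects (data_seurat_115, data_seurat_135). That path coerces each matrix with as.data.frame(), which would explain the sparse->dense warnings. The sketch below is a minimal guard under that assumption; check_merge_inputs is a hypothetical helper, not part of Seurat, and the toy matrices only stand in for the real data.

```r
library(Matrix)

# Hypothetical helper (not part of Seurat): stop early unless every input
# is a Seurat object, instead of falling through to merge.default().
check_merge_inputs <- function(...) {
  inputs <- list(...)
  is_seurat <- vapply(inputs, function(obj) inherits(obj, "Seurat"), logical(1))
  if (!all(is_seurat)) {
    classes <- vapply(inputs, function(obj) class(obj)[1], character(1))
    stop("merge() would not dispatch to the Seurat method; got: ",
         paste(classes, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy sparse matrices standing in for data_115 and data_135.
toy_115 <- sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(5, 2), dims = c(3, 2))
toy_135 <- sparseMatrix(i = c(2, 3), j = c(1, 2), x = c(1, 4), dims = c(3, 2))

# Raw matrices are rejected before any dense allocation can happen.
msg <- tryCatch(check_merge_inputs(toy_115, toy_135),
                error = function(e) conditionMessage(e))
print(msg)

# Presumably intended call (needs Seurat and the objects created above):
# check_merge_inputs(data_seurat_115, data_seurat_135)
# merged_data <- merge(data_seurat_115, data_seurat_135,
#                      add.cell.ids = c("115", "135"))
```

If the Seurat objects are passed instead, merge() should dispatch to Seurat's own method and keep the counts sparse. Whether that also resolves the JoinLayers() slowdown is a separate question.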

sessionInfo():

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: MarIuX64 2.0 GNU/Linux

Matrix products: default
BLAS:   /pkg/R-4.4.0-0/lib/R/lib/libRblas.so 
LAPACK: /usr/lib/liblapack.so.3.10.1

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
 [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
[10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Seurat_5.1.0       SeuratObject_5.0.2 sp_2.1-4          

loaded via a namespace (and not attached):
  [1] deldir_2.0-4           pbapply_1.7-2          gridExtra_2.3         
  [4] rlang_1.1.4            magrittr_2.0.3         RcppAnnoy_0.0.22      
  [7] spatstat.geom_3.3-2    matrixStats_1.3.0      ggridges_0.5.6        
 [10] compiler_4.4.0         png_0.1-8              vctrs_0.6.5           
 [13] reshape2_1.4.4         stringr_1.5.1          pkgconfig_2.0.3       
 [16] crayon_1.5.3           fastmap_1.2.0          utf8_1.2.4            
 [19] promises_1.3.0         purrr_1.0.2            jsonlite_1.8.8        
 [22] goftest_1.2-3          later_1.3.2            uuid_1.1-1            
 [25] spatstat.utils_3.0-5   irlba_2.3.5.1          parallel_4.4.0        
 [28] cluster_2.1.6          R6_2.5.1               ica_1.0-3             
 [31] stringi_1.8.4          RColorBrewer_1.1-3     spatstat.data_3.1-2   
 [34] reticulate_1.38.0      spatstat.univar_3.0-0  parallelly_1.37.1     
 [37] lmtest_0.9-40          scattermore_1.2        Rcpp_1.0.12           
 [40] IRkernel_1.3.2         tensor_1.5             future.apply_1.11.2   
 [43] zoo_1.8-12             base64enc_0.1-3        sctransform_0.4.1     
 [46] httpuv_1.6.15          Matrix_1.7-0           splines_4.4.0         
 [49] igraph_2.0.3           tidyselect_1.2.1       abind_1.4-5           
 [52] spatstat.random_3.3-1  codetools_0.2-20       miniUI_0.1.1.1        
 [55] spatstat.explore_3.3-1 listenv_0.9.1          lattice_0.22-6        
 [58] tibble_3.2.1           plyr_1.8.9             shiny_1.8.1.1         
 [61] ROCR_1.0-11            evaluate_0.24.0        Rtsne_0.17            
 [64] future_1.33.2          fastDummies_1.7.3      survival_3.5-8        
 [67] polyclip_1.10-6        fitdistrplus_1.2-1     pillar_1.9.0          
 [70] KernSmooth_2.23-22     plotly_4.10.4          generics_0.1.3        
 [73] RcppHNSW_0.6.0         IRdisplay_1.1          ggplot2_3.5.1         
 [76] munsell_0.5.1          scales_1.3.0           globals_0.16.3        
 [79] xtable_1.8-4           glue_1.7.0             lazyeval_0.2.2        
 [82] tools_4.4.0            data.table_1.15.4      RSpectra_0.16-1       
 [85] pbdZMQ_0.3-10          RANN_2.6.1             leiden_0.4.3.1        
 [88] dotCall64_1.1-1        cowplot_1.1.3          grid_4.4.0            
 [91] tidyr_1.3.1            colorspace_2.1-0       nlme_3.1-164          
 [94] patchwork_1.2.0        repr_1.1.6             cli_3.6.3             
 [97] spatstat.sparse_3.1-0  spam_2.10-0            fansi_1.0.6           
[100] viridisLite_0.4.2      dplyr_1.1.4            uwot_0.2.2            
[103] gtable_0.3.5           digest_0.6.36          progressr_0.14.0      
[106] ggrepel_0.9.5          htmlwidgets_1.6.4      htmltools_0.5.8.1     
[109] lifecycle_1.0.4        httr_1.4.7             mime_0.12             
[112] MASS_7.3-60.2         

rsatija commented 3 months ago

Thank you for sending this, which is very helpful for us to debug.

Can you check whether the rownames of your metadata match the column names of your object? That is, run all(rownames(object@meta.data)==colnames(object)), assuming your object is called object.

This relates to https://github.com/satijalab/seurat/issues/9125

Let us know, and we will take a look early next week and get back to you ASAP.

joshuak94 commented 3 months ago

all(rownames(data_seurat_115@meta.data)==colnames(data_seurat_115))
all(rownames(data_seurat_135@meta.data)==colnames(data_seurat_135))

Both yield TRUE.

joshuak94 commented 2 months ago

Hi @rsatija, I was wondering if there was an update regarding this issue?

xlucpu commented 2 weeks ago

Same issue here, but for Xenium data. No idea why it happens or how to resolve it.

xenium.obj <- SCTransform(xenium.obj, assay = "Xenium")
Running SCTransform on assay: Xenium
Running SCTransform on layer: counts
vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.
Variance stabilizing transformation of count matrix of size 377 by 376392
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 376 genes, 5000 cells
Found 2 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 377 genes
Error in asMethod(object) :
  (converted from warning) sparse->dense coercion: allocating vector of size 1.1 GiB
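
For scale, the 1.1 GiB in that message is what a dense double-precision matrix of 377 by 376392 would occupy, so the coercion appears to cover the full residual matrix. The "(converted from warning)" prefix usually means options(warn = 2) is in effect, so the allocation warning is being promoted to an error rather than the allocation itself failing. A small sketch of both points, using only base R; dense_gib is a hypothetical helper name, and as.integer("x") is just a convenient way to trigger a warning:

```r
# Size in GiB of a dense double-precision matrix (8 bytes per value).
dense_gib <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 2^30
}

size <- dense_gib(377, 376392)
print(round(size, 1))

# With warn = 2, any warning is raised as an error whose message
# carries the "(converted from warning)" prefix.
old <- options(warn = 2)
msg <- tryCatch(as.integer("x"), error = function(e) conditionMessage(e))
options(old)  # restore the previous warning level
print(msg)
```

Checking getOption("warn") in the failing session would confirm whether this applies. If it returns 2, setting options(warn = 0) before SCTransform() might let the step continue, assuming enough memory is available for the dense matrix.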