stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/

Running out of memory in create_signac #309

Closed rargelaguet closed 4 years ago

rargelaguet commented 4 years ago

Hi Tim, I am trying to run CreateChromatinAssay on an ATAC matrix of dimensions (305187, 23838), which should take at most 305187 × 23838 × 8 / 1e9 ≈ 58 GB as a dense (non-sparse) matrix. However, I am running out of memory despite requesting 100 GB. Is this expected, or am I doing something wrong? Thanks!
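For reference, the memory estimate can be worked out as a rough sketch in R. It assumes that the third number in the code comment further down (108569464) is the count of non-zero entries, as in a MatrixMarket header, and it ignores per-object overhead. The byte sizes are the usual ones for R doubles (8 bytes) and integers (4 bytes).

```r
# Dimensions and non-zero count as quoted in the post (nnz is an assumption:
# taken to be the third number of the MatrixMarket-style header)
n_row <- 305187
n_col <- 23838
nnz   <- 108569464

# Dense double matrix: 8 bytes for every entry
dense_gb <- n_row * n_col * 8 / 1e9

# dgCMatrix (compressed sparse column): an 8-byte value and a 4-byte row
# index per non-zero, plus (n_col + 1) 4-byte column pointers
csparse_gb <- (nnz * (8 + 4) + (n_col + 1) * 4) / 1e9

# ngTMatrix (triplet pattern): a 4-byte row index and a 4-byte column index
# per non-zero, with no stored values
pattern_gb <- nnz * (4 + 4) / 1e9

print(round(c(dense = dense_gb, dgCMatrix = csparse_gb, ngTMatrix = pattern_gb), 2))
```

Under these assumptions the sparse representations need roughly 1 GB, so hitting a 100 GB limit suggests that something densifies or mishandles the matrix rather than the data simply being too large.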

Here is the formatted code:

# Load matrix, features and barcodes
# 305187 23838 108569464  ((305187*23838*8) / 1e9= 58GB in standard matrix format)
m <- Matrix::readMM(io$matrix)
barcodes <- fread(io$barcodes, header=F)[[1]]
features <- fread(io$features, header=F)[[1]]

# Define GRanges object using the features
granges <- StringToGRanges(features, sep = c("_", "_"))
granges <- granges[as.vector(seqnames(granges) %in% standardChromosomes(granges)),]

# Define Granges object with gene annotations from ENSEMBL
ensembl.annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
seqlevelsStyle(ensembl.annotations) <- 'UCSC'
genome(ensembl.annotations) <- "mm10"

# Create Seurat Chromatin Assay
chrom_assay <- CreateChromatinAssay(
  counts = m,
  # sep = c("_", "_"),
  ranges = granges,
  genome = 'mm10',
  fragments = NULL,
  min.cells = 0,
  min.features = 0,
  annotation = ensembl.annotations
)

and the sessionInfo():

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ForceAtlas2_0.1            igraph_1.2.6               EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.13.1           AnnotationFilter_1.13.0   
 [6] GenomicFeatures_1.41.3     AnnotationDbi_1.51.3       Biobase_2.49.1             GenomicRanges_1.41.6       GenomeInfoDb_1.25.11      
[11] IRanges_2.23.10            S4Vectors_0.27.14          BiocGenerics_0.35.4        Signac_1.0.0               Seurat_3.9.9.9003         
[16] Matrix_1.2-18              purrr_0.3.4                data.table_1.13.2          ggpubr_0.4.0               ggplot2_3.3.2             

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1              SnowballC_0.7.0             rtracklayer_1.49.5          GGally_2.0.0               
  [5] R.methodsS3_1.8.1           tidyr_1.1.2                 bit64_4.0.5                 knitr_1.30                 
  [9] irlba_2.3.3                 DelayedArray_0.15.16        R.utils_2.10.1              rpart_4.1-15               
 [13] RCurl_1.98-1.2              generics_0.0.2              callr_3.5.1                 cowplot_1.1.0              
 [17] usethis_1.6.3               RSQLite_2.2.1               RANN_2.6.1                  future_1.19.1              
 [21] bit_4.0.4                   spatstat.data_1.4-3         xml2_1.3.2                  httpuv_1.5.4               
 [25] SummarizedExperiment_1.19.9 assertthat_0.2.1            xfun_0.18                   hms_0.5.3                  
 [29] promises_1.1.1              fansi_0.4.1                 progress_1.2.2              dbplyr_1.4.4               
 [33] readxl_1.3.1                DBI_1.1.0                   htmlwidgets_1.5.2           reshape_0.8.8              
 [37] ellipsis_0.3.1              corrplot_0.84               dplyr_1.0.2                 backports_1.1.10           
 [41] biomaRt_2.45.9              deldir_0.1-29               MatrixGenerics_1.1.8        vctrs_0.3.4                
 [45] remotes_2.2.0               ROCR_1.0-11                 abind_1.4-5                 withr_2.3.0                
 [49] BSgenome_1.57.7             checkmate_2.0.0             sctransform_0.3.1           GenomicAlignments_1.25.3   
 [53] prettyunits_1.1.1           goftest_1.2-2               cluster_2.1.0               lazyeval_0.2.2             
 [57] crayon_1.3.4                pkgconfig_2.0.3             pkgload_1.1.0               nlme_3.1-149               
 [61] ProtGenerics_1.21.0         devtools_2.3.2              nnet_7.3-14                 rlang_0.4.8                
 [65] globals_0.13.1              lifecycle_0.2.0             miniUI_0.1.1.1              BiocFileCache_1.13.1       
 [69] rsvd_1.0.3                  dichromat_2.0-0             rprojroot_1.3-2             cellranger_1.1.0           
 [73] polyclip_1.10-0             matrixStats_0.57.0          lmtest_0.9-38               graph_1.67.1               
 [77] ggseqlogo_0.1               carData_3.0-4               zoo_1.8-8                   base64enc_0.1-3            
 [81] processx_3.4.4              ggridges_0.5.2              GlobalOptions_0.1.2         pheatmap_1.0.12            
 [85] png_0.1-7                   viridisLite_0.3.0           rjson_0.2.20                bitops_1.0-6               
 [89] R.oo_1.24.0                 KernSmooth_2.23-17          Biostrings_2.57.2           blob_1.2.1                 
 [93] shape_1.4.5                 stringr_1.4.0               qvalue_2.21.0               jpeg_0.1-8.1               
 [97] rstatix_0.6.0               ggsignif_0.6.0              scales_1.1.1                memoise_1.1.0              
[101] magrittr_1.5                plyr_1.8.6                  ica_1.0-2                   zlibbioc_1.35.0            
[105] compiler_4.0.3              RColorBrewer_1.1-2          clue_0.3-57                 fitdistrplus_1.1-1         
[109] cli_2.1.0                   Rsamtools_2.5.3             XVector_0.29.3              listenv_0.8.0              
[113] ps_1.4.0                    patchwork_1.0.1             pbapply_1.4-3               htmlTable_2.1.0            
[117] Formula_1.2-3               MASS_7.3-53                 mgcv_1.8-33                 tidyselect_1.1.0           
[121] stringi_1.5.3               forcats_0.5.0               askpass_1.1                 latticeExtra_0.6-29        
[125] ggrepel_0.8.2               grid_4.0.3                  VariantAnnotation_1.35.4    fastmatch_1.1-0            
[129] tools_4.0.3                 future.apply_1.6.0          rio_0.5.16                  circlize_0.4.10            
[133] rstudioapi_0.11             foreign_0.8-80              lsa_0.73.2                  gridExtra_2.3              
[137] Rtsne_0.15                  digest_0.6.27               BiocManager_1.30.10         shiny_1.5.0                
[141] Rcpp_1.0.5                  car_3.0-10                  broom_0.7.1                 later_1.1.0.1              
[145] RcppAnnoy_0.0.16            OrganismDbi_1.31.1          httr_1.4.2                  ggbio_1.37.1               
[149] biovizBase_1.37.0           ComplexHeatmap_2.5.6        colorspace_1.4-1            fs_1.5.0                   
[153] XML_3.99-0.5                tensor_1.5                  reticulate_1.16             splines_4.0.3              
[157] uwot_0.1.8.9001             RBGL_1.65.0                 RcppRoll_0.3.0              spatstat.utils_1.17-0      
[161] sessioninfo_1.1.1           plotly_4.9.2.1              xtable_1.8-4                jsonlite_1.7.1             
[165] spatstat_1.64-1             testthat_2.3.2              R6_2.4.1                    Hmisc_4.4-1                
[169] pillar_1.4.6                htmltools_0.5.0             mime_0.9                    glue_1.4.2                 
[173] fastmap_1.0.1               BiocParallel_1.23.3         codetools_0.2-16            pkgbuild_1.1.0             
[177] lattice_0.20-41             tibble_3.0.4                curl_4.3                    leiden_0.3.3               
[181] zip_2.1.1                   openxlsx_4.2.2              openssl_1.4.3               survival_3.2-7             
[185] desc_1.2.0                  munsell_0.5.0               GetoptLong_1.0.3            GenomeInfoDbData_1.2.4     
[189] haven_2.3.1                 reshape2_1.4.4              gtable_0.3.0    

rargelaguet commented 4 years ago

In case it is relevant, I get the following error when loading the data with Read10X:

# Error in as(object = list_of_data[[j]], Class = "dgCMatrix"): no method or default for coercing "ngTMatrix" to "dgCMatrix"
# inputdata <- Read10X(paste0(io$basedir,"/data"), gene.column = 1)

so I have to load it manually:

m <- Matrix::readMM(io$matrix)
barcodes <- fread(io$barcodes, header=F)[[1]]
features <- fread(io$features, header=F)[[1]]

timoast commented 4 years ago

Hi Ricard, this might have something to do with the format of the matrix. What if you convert it to a dgCMatrix before creating the object? For example:

library(Matrix)
m <- readMM(io$matrix)
barcodes <- fread(io$barcodes, header=F)[[1]]
features <- fread(io$features, header=F)[[1]]
colnames(m) <- barcodes
rownames(m) <- features
m <- as(object = m, Class = "dgCMatrix")

rargelaguet commented 4 years ago

The format of the sparse matrix was indeed the problem, thanks!

ngTMatrix is a sparse pattern matrix (it stores only which entries are non-zero), not a numeric one. To convert it to dgCMatrix I used m <- m*1, because m <- as(object = m, Class = "dgCMatrix") does not work. I will close the issue.
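The conversion can be illustrated with a minimal sketch on a toy matrix. The tiny ngTMatrix below is made up for illustration and stands in for the result of readMM; the two-step coercion through "dMatrix" and "CsparseMatrix" is one alternative to m*1 that the Matrix package provides, not necessarily what was run in this thread.

```r
library(Matrix)

# Toy 3 x 2 pattern matrix in triplet form (hypothetical data);
# the i and j slots hold 0-based indices
m <- new("ngTMatrix",
         i = c(0L, 2L, 1L),
         j = c(0L, 0L, 1L),
         Dim = c(3L, 2L))
print(class(m))

# Option 1: the arithmetic trick from the thread, then make sure the
# result is stored in compressed column form
m_arith <- as(m * 1, "CsparseMatrix")

# Option 2: explicit coercion, first to numeric values, then to
# compressed column storage
m_coerce <- as(as(m, "dMatrix"), "CsparseMatrix")

print(class(m_arith))
print(class(m_coerce))
```

Both routes should yield a numeric sparse matrix in which every stored entry equals 1, which is the form suggested above for the counts argument of CreateChromatinAssay.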