Running out of memory in create_signac #309

Closed: rargelaguet closed this issue 4 years ago

rargelaguet commented 4 years ago

Hi Tim, I am trying to run CreateChromatinAssay on an ATAC matrix of dimensions (305187, 23838), which should take at most 305187 * 23838 * 8 / 1e9 = 58 GB in a non-sparse (dense) matrix format. However, I am running out of memory despite requesting 100 GB. Is this expected, or am I doing something wrong? Thanks!
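For reference, the estimate above can be checked with a few lines of R. This is only a rough sketch: it assumes 8 bytes per value for a dense double matrix, and it assumes that the figure 108569464 quoted in the code comment in this post is the number of non-zero entries. The sparse estimate assumes a compressed-column layout (an 8-byte value plus a 4-byte row index per non-zero entry, plus one 4-byte pointer per column) and ignores dimnames and object overhead.

```r
# Rough memory estimates for a 305187 x 23838 matrix (dimensions from the post).
n_rows <- 305187
n_cols <- 23838
n_nonzero <- 108569464  # ASSUMPTION: taken to be the number of non-zero entries

# Dense double matrix: 8 bytes per cell.
dense_gb <- n_rows * n_cols * 8 / 1e9

# Compressed sparse column layout with double values:
# 8 bytes (value) + 4 bytes (row index) per non-zero, plus 4 bytes per column pointer.
sparse_gb <- (n_nonzero * (8 + 4) + (n_cols + 1) * 4) / 1e9

cat(sprintf("dense:  %.1f GB\n", dense_gb))
cat(sprintf("sparse: %.2f GB\n", sparse_gb))
```

Under these assumptions the sparse representation should need only a small fraction of the dense figure, so a 100 GB allocation running out suggests that something is expanding the matrix or copying it repeatedly.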

Here is the formatted code:

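# Packages needed by the calls below
library(data.table)    # fread
library(GenomeInfoDb)  # standardChromosomes, seqlevelsStyle
library(Signac)
library(EnsDb.Mmusculus.v79)

# Load matrix, features and barcodes
# 305187 23838 108569464  ((305187*23838*8) / 1e9 = 58 GB in standard matrix format)
m <- Matrix::readMM(io$matrix)
barcodes <- fread(io$barcodes, header = FALSE)[[1]]
features <- fread(io$features, header = FALSE)[[1]]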

# Define GRanges object using the features
granges <- StringToGRanges(features, sep = c("_", "_"))
granges <- granges[as.vector(seqnames(granges) %in% standardChromosomes(granges)),]

# Define GRanges object with gene annotations from Ensembl
ensembl.annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
seqlevelsStyle(ensembl.annotations) <- 'UCSC'
genome(ensembl.annotations) <- "mm10"

# Create Seurat Chromatin Assay
chrom_assay <- CreateChromatinAssay(
  counts = m,
  # sep = c("_", "_"),
  ranges = granges,
  genome = 'mm10',
  fragments = NULL,
  min.cells = 0,
  min.features = 0,
  annotation = ensembl.annotations
)

and the sessionInfo():

rargelaguet commented 4 years ago

In case it is relevant, I get the following error when loading the data with Read10X:

# Error in as(object = list_of_data[[j]], Class = "dgCMatrix"): no method or default for coercing "ngTMatrix" to "dgCMatrix"
# inputdata <- Read10X(paste0(io$basedir,"/data"), gene.column = 1)

so I have to load it manually:

m <- Matrix::readMM(io$matrix)
barcodes <- fread(io$barcodes, header = FALSE)[[1]]
features <- fread(io$features, header = FALSE)[[1]]

timoast commented 4 years ago

Hi Ricard, this might have something to do with the format of the matrix. What if you convert to a dgCMatrix before creating the object? eg:

m <- Matrix::readMM(io$matrix)
barcodes <- fread(io$barcodes, header = FALSE)[[1]]
features <- fread(io$features, header = FALSE)[[1]]
colnames(m) <- barcodes
rownames(m) <- features
m <- as(object = m, Class = "dgCMatrix")

rargelaguet commented 4 years ago

The format of the sparse matrix was indeed the problem, thanks!

ngTMatrix is a sparse pattern matrix (it records only which entries are non-zero), not a numeric one. To convert it to a dgCMatrix I used m <- m*1, because m <- as(object = m, Class = "dgCMatrix") does not work. I will close the issue.
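The workaround can be illustrated on a toy matrix. This is a minimal sketch, not the exact code run on the real data: the 3 x 2 matrix is made up, and the final coercion to "CsparseMatrix" is an extra step added here so that the result is a column-compressed numeric matrix whatever class the multiplication returns.

```r
library(Matrix)

# Toy ngTMatrix: a triplet-format pattern matrix that records only the
# positions of its non-zero entries (slot indices are 0-based).
m <- new("ngTMatrix",
         i = c(0L, 1L, 2L),
         j = c(0L, 1L, 0L),
         Dim = c(3L, 2L))
class(m)

# Multiplying by 1 turns the pattern entries into numeric values.
m_num <- m * 1

# Coerce to compressed sparse column storage; for a general numeric
# sparse matrix this yields a dgCMatrix.
m_csc <- as(m_num, "CsparseMatrix")
class(m_csc)
```

Checking class(m) straight after readMM is a cheap way to spot this situation before passing the matrix to CreateChromatinAssay.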