quon-titative-biology / scBFA

Binary Factor Analysis: a dimensionality reduction tool for noisy, high throughput single cell genomic data

Memory issue #2

Open lzj1769 opened 4 years ago

lzj1769 commented 4 years ago

Hi,

I got this error message when running scBFA on our dataset:

Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: scBFA ... getGeneExpr -> as.matrix -> as.matrix.Matrix -> as -> asMethod
Execution halted

Can you help me with this?

Best, Zhijian

gquon commented 4 years ago

Hi, can you give more details on the size of the dataset (number of cells and number of rows)? Any chance you could provide an anonymized version of your dataset? (You could delete the row and column names if you want.)

lzj1769 commented 4 years ago

Hi,

I uploaded my data here: https://drive.google.com/file/d/1ESa7pcD6_uzZkrgWUeWxGelOtD2Kfsj-/view?usp=sharing

Below is the code I used to run scBFA:

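library(Seurat)
library(scBFA)

# Load the count matrix (features x cells) and wrap it in a Seurat object
counts <- readRDS("scATAC.Rds")
x <- CreateSeuratObject(counts = counts)
# Fit scBFA with 30 factors using the "CG" optimization method
bfa_model <- scBFA(scData = x, numFactors = 30, method = "CG")
# Transpose the embedding so that columns correspond to cells, then save it
zz <- as.matrix(t(bfa_model$ZZ))
colnames(zz) <- colnames(counts)
write.table(zz, file = "./scBFA.txt", quote = FALSE, sep = "\t")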

Thanks, Li

RuoxinLi commented 4 years ago

Hi,

We are still working on the memory issue, as the number of sites in your raw data (>200,000) is quite large. We are now testing a mini-batch optimization procedure; it might take us 2-3 more weeks to get things working properly. It's worth noting that analyses like this typically include a feature selection step (analogous to how highly variable genes are selected in scRNA-seq analyses before visualization). We've noticed that many of the sites are open in very few cells (<1%). Filtering to keep only sites that are open in at least 2-3% of cells will help a lot with memory.
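The filtering step suggested above could be sketched roughly as follows. This is a minimal illustration, not part of scBFA: the helper `filter_sites` and the toy matrix are hypothetical, and it assumes the counts are a sparse matrix with sites as rows and cells as columns, with a site counted as "open" in a cell when its count is greater than zero.

```r
library(Matrix)

# Hypothetical helper (not an scBFA function): keep only the sites (rows)
# that are open, i.e. have a nonzero count, in at least `min_frac` of cells.
filter_sites <- function(counts, min_frac = 0.02) {
  frac_open <- Matrix::rowSums(counts > 0) / ncol(counts)
  counts[frac_open >= min_frac, , drop = FALSE]
}

# Toy sparse matrix standing in for the real sites-by-cells data
set.seed(1)
toy <- Matrix::rsparsematrix(1000, 200, density = 0.01,
                             rand.x = function(n) rep(1, n))

filtered <- filter_sites(toy, min_frac = 0.02)

# Number of entries a dense copy of the filtered matrix would need;
# computed as a double to avoid integer overflow on large inputs.
dense_entries <- as.numeric(nrow(filtered)) * ncol(filtered)
```

With the real data, the filtered matrix would then be passed to `CreateSeuratObject(counts = filtered)` and on to `scBFA` as in the code posted earlier in this thread. Comparing `dense_entries` against `.Machine$integer.max` gives a rough indication of whether a dense conversion such as the `as.matrix` call in the error trace is likely to fit; the 2% threshold is only the starting point mentioned above and may need tuning.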

Best