xinhe-lab / GSFA

R package that performs sparse factor analysis and differential gene expression discovery simultaneously on single-cell CRISPR screening data.
https://xinhe-lab.github.io/GSFA/
MIT License

Error with fit_gsfa_multivar due to large matrix size: ARMA_64BIT_WORD issue #5

Closed by YusukeTakeshima 1 year ago

YusukeTakeshima commented 1 year ago

Thank you for providing such a valuable package.

I am eager to utilize it with my own data, but I am encountering an error when executing 'fit_gsfa_multivar'. I suspect that the large number of cells in my dataset might be causing the issue.

Here are the specifics:

$ dim(scaled.gene_exp)
[1] 93281 13025
$ dim(G_mat)
[1] 93281   109
$ iter_num
[1] 20
$ fit0 <- fit_gsfa_multivar(Y = scaled.gene_exp, G = G_mat, 
                          K = 50, init.method = "svd",
                          prior_w_s = 50, prior_w_r = 0.2,
                          prior_beta_s = 20, prior_beta_r = 0.2,
                          niter = iter_num, used_niter = iter_num/2,
                          verbose = T, return_samples = T)
Initializing Z and W with SVD.
Error: Mat::init(): requested size is too large; suggest to enable ARMA_64BIT_WORD

I understand from the error that due to the large number of cells, a 32-bit representation might not suffice, and a 64-bit representation is required. I am currently analyzing the data without limiting it to highly variable genes and excluding only low-expressed genes. However, even when I limit the number of genes, I encounter the same error, leading me to believe this isn't the core issue.

Believing that the lack of support for ARMA_64BIT_WORD might be the problem, I tried:

- Modifying the compile options of Armadillo to enable ARMA_64BIT_WORD.
- Modifying the compile options of RcppArmadillo to enable ARMA_64BIT_WORD.

Despite these changes and reinstalling GSFA, I still face the same error. Would you happen to have any suggestions or solutions to this problem? I understand you might be busy, but any guidance would be greatly appreciated.

Thank you in advance.

LifanLiang commented 1 year ago

Thank you for your feedback. I suspect the error occurs because there is not enough system memory to initialize a large matrix during the procedure. Please try running this function on a subset of cells first, perhaps just 1000 cells. If GSFA runs on a small-scale dataset, then it is most likely a memory issue.
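The subsetting test suggested above could be set up along these lines. This is a minimal sketch, not part of GSFA: `subset_cells` is a hypothetical helper, the toy matrices only stand in for `scaled.gene_exp` and `G_mat`, and it assumes that rows of both matrices are cells in the same order, as the dimensions in the original post suggest.

```r
# Hypothetical helper (not part of GSFA): draw the same random subset of
# cells (rows) from the expression matrix Y and the perturbation matrix G.
subset_cells <- function(Y, G, n_cells, seed = 1) {
  stopifnot(nrow(Y) == nrow(G), n_cells <= nrow(Y))
  set.seed(seed)
  idx <- sort(sample.int(nrow(Y), n_cells))
  list(Y = Y[idx, , drop = FALSE], G = G[idx, , drop = FALSE])
}

# Toy stand-ins for scaled.gene_exp (cells x genes) and G_mat (cells x perturbations).
Y_toy <- matrix(rnorm(5000 * 20), nrow = 5000)
G_toy <- matrix(rbinom(5000 * 3, 1, 0.1), nrow = 5000)

sub <- subset_cells(Y_toy, G_toy, n_cells = 1000)
dim(sub$Y)
dim(sub$G)

# With the real data, the call from the original post would then become
# (not run here, since it needs GSFA and the poster's matrices):
# sub  <- subset_cells(scaled.gene_exp, G_mat, n_cells = 1000)
# fit0 <- fit_gsfa_multivar(Y = sub$Y, G = sub$G, K = 50, init.method = "svd",
#                           prior_w_s = 50, prior_w_r = 0.2,
#                           prior_beta_s = 20, prior_beta_r = 0.2,
#                           niter = iter_num, used_niter = iter_num / 2,
#                           verbose = TRUE, return_samples = TRUE)
```

After subsetting, it may be worth checking `colSums(sub$G)`, since a small random sample can leave some perturbations with very few or no cells.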

YusukeTakeshima commented 1 year ago

Thank you for your reply. I reduced the number of cells from 90,000 to about 50,000 and the function worked fine. I was wondering if it could be expanded by changing the Armadillo settings, but I understand that it is a memory size issue. Thank you for your time.
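Both readings discussed in this thread, an index-width limit and a memory limit, can be sanity-checked with simple arithmetic. The sketch below rests on two assumptions that the thread does not confirm: that without ARMA_64BIT_WORD the element count of a single matrix is capped at the largest 32-bit unsigned integer, and that some intermediate in the SVD initialization is a cells × cells matrix. The name `n_elem` is illustrative only.

```r
# Assumed cap on the number of elements in one matrix with 32-bit indexing.
max_elem_32bit <- 2^32 - 1

# Element counts computed as doubles (R integers would overflow at this size).
n_elem <- function(n_rows, n_cols) as.numeric(n_rows) * as.numeric(n_cols)

n_cells <- 93281   # from dim(scaled.gene_exp)
n_genes <- 13025

# The cells x genes input itself stays under the assumed cap.
n_elem(n_cells, n_genes) > max_elem_32bit

# A hypothetical cells x cells intermediate exceeds it at 93,281 cells ...
n_elem(n_cells, n_cells) > max_elem_32bit

# ... but not at about 50,000 cells.
n_elem(50000, 50000) > max_elem_32bit

# Memory for dense double-precision cells x cells matrices, in GiB.
n_elem(n_cells, n_cells) * 8 / 2^30
n_elem(50000, 50000) * 8 / 2^30
```

Under these assumptions, a cells × cells matrix would cross the 32-bit cap at a little over 65,000 cells and would need roughly 65 GiB at 93,281 cells, compared with under 19 GiB at 50,000 cells. That is consistent with the run succeeding at about 50,000 cells and with the error persisting when only the number of genes was reduced, but it does not establish which matrix GSFA actually allocates.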