r3fang / SnapATAC

Analysis Pipeline for Single Cell ATAC-seq
GNU General Public License v3.0

Memory Issue with small bins / merge individual snap objects in R? #40

Open jogiles opened 5 years ago

jogiles commented 5 years ago

Hello!

This is a really great approach to scATAC. Any help would be greatly appreciated!

I have several biologically similar samples. The ability to resolve the individual samples in UMAP and t-SNE improves when using smaller bins (10 kb, 5 kb, 500 bp). However, 500 bp bins cannot separate the most similar samples, so we would like to use an even smaller bin size, but we are running into memory issues (I think). Even with 1.5 TB of memory, I got the following error while running addBmatToSnap:

Epoch: reading cell-bin count matrix session ...
Error in .rbind2Csp(x, y) :
Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 92
Calls: addBmatToSnap ... rbind -> rbind2 -> rbind2 -> rbind2sparse -> .rbind2Csp
Execution halted

1) Do you think this is a memory issue? Or something else?

2) Would it be easier/possible to make individual snap objects in R then merge them?

Thank you! Josephine

jogiles commented 5 years ago

Update: I was able to create snap objects in R for the individual samples and add the bins (addBmatToSnap) with a bin size of 50. Is there a way to combine these so the samples can be analyzed together?

r3fang commented 5 years ago

Hi, you can combine different snap objects using the snapRbind function. Sorry for the delay.
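For more than two samples, a pairwise bind can be folded over a list. This is a minimal sketch, assuming snapRbind takes two snap objects and returns the combined one. `bind_all` is a hypothetical helper, and small sparse matrices with `rbind` stand in for snap objects so the snippet runs without SnapATAC:

```r
library(Matrix)

# Hypothetical helper: fold a two-argument bind function over a list.
bind_all <- function(objs, bind2) Reduce(bind2, objs)

# Stand-ins for per-sample objects: three small sparse cell-by-bin matrices.
set.seed(1)
mats <- lapply(c(3, 4, 5), function(n) rsparsematrix(n, 10, density = 0.2))

# Row-bind all samples; cells are rows, bins are columns.
combined <- bind_all(mats, rbind)
dim(combined)

# With SnapATAC loaded, the same pattern would be (not run here):
# x.sp <- bind_all(list(x1.sp, x2.sp, x3.sp), snapRbind)
```

Folding does not reduce the size of the final matrix, so it only helps with bookkeeping, not with size limits.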


r3fang commented 5 years ago

Let me know if it works or not

jogiles commented 5 years ago

I was able to combine 5 (2 at a time) using snapRbind, but got the same error message when trying to add the 6th sample. I have 8 total.

Error in .rbind2Csp(x, y) : Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 92

Any help would be greatly appreciated. Thanks!

r3fang commented 5 years ago

This is because the matrix is too large. With a bin size of 50, the whole matrix for 8 samples is about 80,000 (cells) x 50,000,000 (bins), so I am not surprised that you are running out of memory. My advice is not to use such a small bin size.
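The size estimate can be reproduced with a few lines of arithmetic. This is a rough sketch: the genome size of about 3.1 Gb is an assumption that does not come from this thread, and it gives a bin count of the same order of magnitude as the figure quoted above.

```r
# Assumptions: ~3.1 Gb human genome (not from the thread);
# 80,000 cells and a 50 bp bin size as quoted above.
genome_size <- 3.1e9
bin_size    <- 50
n_cells     <- 80000

n_bins  <- ceiling(genome_size / bin_size)  # columns of the cell-by-bin matrix
n_total <- n_cells * n_bins                 # entries if stored densely (double arithmetic)

# Sparse matrices in the Matrix package use 32-bit integer indices.
int_max <- .Machine$integer.max

# Fraction of entries that may be non-zero before the count exceeds int_max.
max_density <- int_max / n_total

cat(sprintf("bins: %.3g, entries: %.3g, max density: %.3g\n",
            n_bins, n_total, max_density))
```

Under these assumptions the combined matrix could hold, on average, only about 26,800 non-zero bins per cell before the 32-bit index limit is reached, regardless of how much RAM is installed.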

jogiles commented 5 years ago

More detail: I did this interactively on an LSF cluster with 1.5 TB of memory. I didn't get kicked out of my session, so I don't think I maxed out the available memory.

r3fang commented 5 years ago
Error in .rbind2Csp(x, y) :
Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 92

This is an error reported by the R package Matrix when you combine two large matrices. Also, an LSF cluster with 1.5 TB of memory does not necessarily mean that each node/processor has 1.5 TB of memory or that R has access to the total memory.
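One likely reason Matrix raises this error even when RAM is available is that its compressed sparse format stores indices as 32-bit integers, so a single matrix cannot hold more than `.Machine$integer.max` entries. That is an interpretation of the error, not something confirmed in this thread. A minimal sketch of a pre-check before binding, where `can_rbind` is a hypothetical helper:

```r
library(Matrix)

# Hypothetical helper: TRUE if row-binding x and y keeps the number of
# non-zero entries within the 32-bit integer range used by Matrix.
can_rbind <- function(x, y) {
  # Sum as doubles so the check itself cannot overflow.
  total_nnz <- as.numeric(nnzero(x)) + as.numeric(nnzero(y))
  total_nnz <= .Machine$integer.max
}

# Toy matrices standing in for two cell-by-bin matrices with the same bins.
set.seed(1)
a <- rsparsematrix(100, 50, density = 0.1)
b <- rsparsematrix(200, 50, density = 0.1)

if (can_rbind(a, b)) {
  ab <- rbind(a, b)
} else {
  stop("combined matrix would exceed the sparse index limit")
}
```

A check like this fails fast with a clear message instead of a Cholmod error deep inside `rbind`.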

mahmoudibrahim commented 5 years ago

thank you for this really helpful tool.

I also ran into a similar memory issue with "addBmatToSnap" using small bin sizes. would you consider adding an option to restrict reading the matrix to bins from a specific chromosome?

this would reduce memory requirements drastically by at least enabling some form of chromosome-wise analysis to filter low-quality cells, remove sparsely captured bins and so on.

r3fang commented 5 years ago

Hello, I am also aware of this issue. The easiest way to solve this problem is to create the snap object from a customized matrix. For example:

# M is the cell-by-bin matrix (rows = cells, columns = bins)
# bin.gr is a GRanges object that contains the coordinates of the bins from a specific chromosome
# barcodes is a character vector that contains the unique barcode of each cell
library(SnapATAC)
x.sp = newSnap();
x.sp@bmat = M;
x.sp@feature = bin.gr;
x.sp@barcode = barcodes;

Does this help? -Rongxin
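One way such a chromosome-restricted matrix M could be assembled is from a table of non-zero counts. This is only a sketch under stated assumptions: the `counts` table, its column names, and the barcodes are made-up toy data, and how the counts are obtained from a snap file is not covered here.

```r
library(Matrix)

# Toy input (hypothetical): one row per non-zero (cell, bin) pair.
counts <- data.frame(
  barcode = c("AAAC", "AAAC", "AAAG", "AAAT", "AAAT"),
  chrom   = c("chr1", "chr2", "chr1", "chr1", "chr2"),
  start   = c(0, 50, 50, 0, 0),
  count   = c(1, 2, 1, 3, 1),
  stringsAsFactors = FALSE
)

# Keep only the bins on the chromosome of interest.
keep <- counts[counts$chrom == "chr1", ]

barcodes   <- sort(unique(counts$barcode))  # all cells, even those empty on chr1
bin_starts <- sort(unique(keep$start))

# Cell-by-bin sparse matrix restricted to chr1.
M <- sparseMatrix(
  i = match(keep$barcode, barcodes),
  j = match(keep$start, bin_starts),
  x = keep$count,
  dims = c(length(barcodes), length(bin_starts)),
  dimnames = list(barcodes, paste0("chr1:", bin_starts))
)
```

M and barcodes could then be assigned to the bmat and barcode slots as in the snippet above; a matching GRanges object for the feature slot is not built here.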

r3fang commented 5 years ago

Another way to solve this problem is to create multiple snap files instead of one. For instance, you can split the BAM or BED file by chromosome and create a separate snap file for each chromosome. Or you can split the file by barcode and create multiple snap files, each containing a small set of barcodes.
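The per-chromosome grouping can be illustrated on a small BED-like table. This is a toy sketch with made-up fragments and hypothetical output file names; real BAM or BED files are usually far too large for this and would be split with command-line tools before creating the snap files.

```r
# Toy BED-like fragment table (hypothetical data): chrom, start, end, barcode.
frags <- data.frame(
  chrom   = c("chr1", "chr1", "chr2", "chr2", "chrX"),
  start   = c(100, 500, 200, 900, 50),
  end     = c(250, 640, 330, 1010, 190),
  barcode = c("AAAC", "AAAG", "AAAC", "AAAT", "AAAG"),
  stringsAsFactors = FALSE
)

out_dir  <- tempdir()
by_chrom <- split(frags, frags$chrom)  # one data frame per chromosome

# Write one tab-separated file per chromosome and collect the paths.
out_files <- vapply(names(by_chrom), function(chr) {
  path <- file.path(out_dir, paste0("fragments.", chr, ".bed"))
  write.table(by_chrom[[chr]], path, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  path
}, character(1))
```

Splitting by barcode works the same way, with `split(frags, frags$barcode)` or with barcodes grouped into batches first.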

mahmoudibrahim commented 5 years ago

hi

thanks for the suggestions, I will split by chromosome when I'm creating the snap file.

best, mahmoud
