single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

Huge 10X scRNA-seq mouse data #70

Open alexfernandes8a opened 1 year ago

alexfernandes8a commented 1 year ago

Hi! I have a huge 10X scRNA-seq mouse dataset (~60 GB BAM file, ~50K cells from 12 mice) that I am trying to run through cellSNP-lite. I compiled cellSNP-lite in an HPC environment and am running it there in Mode 2a. The problem is that no matter how much RAM I use, I keep getting the message "Combined max depth is above 1M. Potential memory hog!", and it has been running for 11 days already. I know it is a lot of data, so I am wondering what the best approach would be in this scenario. Perhaps splitting the cell barcodes file? Any help is highly appreciated! Thank you so very much.

hxj5 commented 1 year ago

Hi, Mode 2a is more suitable for small datasets. For large datasets, you may try Mode 2b + Mode 1a. Mode 2a does joint calling and genotyping, but it is substantially slower than first calling SNPs in a bulk manner with Mode 2b and then genotyping the cells with Mode 1a. To speed things up, you may try the --minMAF 0.1 --minCOUNT 100 options in both modes.
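
For reference, the two-step approach described above could look roughly like the sketch below. All file and directory names are hypothetical placeholders, and the thread count is arbitrary. It also assumes that Mode 2b writes its candidate SNPs to `cellSNP.base.vcf` in the output directory (the name would differ if `--gzip` is used), so please check the cellsnp-lite manual for your installed version before relying on it. By default the script only prints the two commands; set `RUN=1` to actually execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs and outputs; replace with your own paths.
BAM="${BAM:-possorted_genome_bam.bam}"
BARCODES="${BARCODES:-barcodes.tsv}"
OUT_2B="${OUT_2B:-cellsnp_mode2b}"
OUT_1A="${OUT_1A:-cellsnp_mode1a}"
NPROC="${NPROC:-8}"

# Step 1 (Mode 2b): call candidate SNPs treating the BAM as one bulk sample.
# No barcode file is given and cell tags are ignored (--cellTAG None).
mode2b_cmd=(cellsnp-lite -s "$BAM" -O "$OUT_2B" -p "$NPROC"
            --minMAF 0.1 --minCOUNT 100 --cellTAG None)

# Step 2 (Mode 1a): genotype each cell, restricted to the SNPs from step 1.
# Assumption: step 1 wrote its SNP list to $OUT_2B/cellSNP.base.vcf.
mode1a_cmd=(cellsnp-lite -s "$BAM" -b "$BARCODES" -O "$OUT_1A"
            -R "$OUT_2B/cellSNP.base.vcf" -p "$NPROC"
            --minMAF 0.1 --minCOUNT 100)

if [[ "${RUN:-0}" == "1" ]]; then
    "${mode2b_cmd[@]}"
    "${mode1a_cmd[@]}"
else
    # Dry run: show the commands without executing them.
    echo "${mode2b_cmd[*]}"
    echo "${mode1a_cmd[*]}"
fi
```

The idea is that the expensive whole-chromosome pileup happens only once at the bulk level, and the per-cell step then only visits the positions that passed the `--minMAF` and `--minCOUNT` filters.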