single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

Combined max depth is above 1M. Potential memory hog! / Recommendations for maxDepth? #122

Closed: bbimber closed this issue 3 months ago

bbimber commented 3 months ago

Hello,

I'm processing 10x scRNA-seq data using mode 2a. We do not have any non-multiplexed / single-donor data for mode 2b. I am giving it what I think is a fairly vanilla command:

cellsnp-lite -s 1296-2-GEX.bam -b barcodes.csv -O ./cellsnp/ -p 4 --minMAF 0.1 --minCOUNT 100 --gzip --refseq GRCh38.p13_Ensembl.fasta

The job seems to be running, but it reports many warnings about high depth (which I think come from the pileup step itself):

15 Mar 2024 17:40:08,934 DEBUG:     [I::main] start time: 2024-03-15 17:40:08
15 Mar 2024 17:40:09,098 DEBUG:     [W::check_args] Max depth set to maximum value (2147483647)
15 Mar 2024 17:40:09,106 DEBUG:     [W::check_args] Max pileup set to maximum value (2147483647)
15 Mar 2024 17:40:09,109 DEBUG:     [I::main] global settings after checking:
15 Mar 2024 17:40:09,112 DEBUG:         num of input files = 1
15 Mar 2024 17:40:09,115 DEBUG:         out_dir = /home/exacloud/gscratch/prime-seq/workDir/9e8fcf22-c391-103c-8192-f8f3fc86be31/SequenceO.work/cellsnp
15 Mar 2024 17:40:09,118 DEBUG:         is_out_zip = 1, is_genotype = 0
15 Mar 2024 17:40:09,121 DEBUG:         is_target = 0, num_of_pos = 0
15 Mar 2024 17:40:09,124 DEBUG:         num_of_barcodes = 18650, num_of_samples = 0
15 Mar 2024 17:40:09,127 DEBUG:         refseq file = /home/exacloud/gscratch/prime-seq/cachedGenomes/129/129_Human_GRCh38.p13_Ensembl.fasta
15 Mar 2024 17:40:09,131 DEBUG:         22 chroms: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 
15 Mar 2024 17:40:09,134 DEBUG:         cell-tag = CB, umi-tag = UB
15 Mar 2024 17:40:09,138 DEBUG:         nthreads = 4, tp_max_open = 131072
15 Mar 2024 17:40:09,141 DEBUG:         mthreads = 4, tp_errno = 0, tp_ntry = 0
15 Mar 2024 17:40:09,148 DEBUG:         min_count = 100, min_maf = 0.10, doublet_gl = 0
15 Mar 2024 17:40:09,151 DEBUG:         min_len = 30, min_mapq = 20
15 Mar 2024 17:40:09,158 DEBUG:         rflag_filter = 772, rflag_require = 0
15 Mar 2024 17:40:09,162 DEBUG:         max_depth = 2147483647, max_pileup = 2147483647, no_orphan = 1
15 Mar 2024 17:40:09,167 DEBUG:     [I::main] mode 2a: pileup 22 whole chromosomes in 18650 single cells.
15 Mar 2024 17:40:09,750 DEBUG:     [W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
15 Mar 2024 17:40:09,753 DEBUG:     [I::csp_pileup_core][Thread-0] processing chrom 1 ...
15 Mar 2024 17:40:09,761 DEBUG:     [W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
15 Mar 2024 17:40:09,764 DEBUG:     [I::csp_pileup_core][Thread-1] processing chrom 2 ...
15 Mar 2024 17:40:09,767 DEBUG:     [W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
15 Mar 2024 17:40:09,771 DEBUG:     [I::csp_pileup_core][Thread-2] processing chrom 3 ...
15 Mar 2024 17:40:09,779 DEBUG:     [W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
15 Mar 2024 17:40:09,787 DEBUG:     [I::csp_pileup_core][Thread-3] processing chrom 4 ...
18 Mar 2024 06:05:40,322 DEBUG:     [I::csp_pileup_core][Thread-3] has pileup-ed in total 1559 SNPs for chrom 4
18 Mar 2024 06:05:40,616 DEBUG:     [W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
18 Mar 2024 06:05:40,624 DEBUG:     [I::csp_pileup_core][Thread-4] processing chrom 5 ...
19 Mar 2024 06:18:01,343 DEBUG:     [I::csp_pileup_core][Thread-2] has pileup-ed in total 2865 SNPs for chrom 3
19 Mar 2024 06:18:01,663 DEBUG:     [W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
19 Mar 2024 06:18:01,667 DEBUG:     [I::csp_pileup_core][Thread-5] processing chrom 6 ...
20 Mar 2024 13:20:09,779 DEBUG:     [I::csp_pileup_core][Thread-1] has pileup-ed in total 3808 SNPs for chrom 2
20 Mar 2024 13:20:10,340 DEBUG:     [W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
20 Mar 2024 13:20:10,411 DEBUG:     [I::csp_pileup_core][Thread-6] processing chrom 7 ...
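The repeated warning in the log above reads most easily as a threshold check on the configured depth ceiling. A minimal sketch, assuming cellsnp-lite mirrors the check in samtools mpileup, which warns when the max depth multiplied by the number of input files exceeds 1<<20 (1,048,576). Under that assumption, max_depth = 2147483647 (INT_MAX, as shown in the log) with one BAM is far above the threshold, so the warning is printed for every chromosome regardless of actual coverage. The helper name `warns_memory_hog` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the ASSUMED warning
# condition holds. Assumption: cellsnp-lite, like samtools mpileup, warns when
#   max_depth * number_of_input_files > 1 << 20   (1,048,576)
warns_memory_hog() {
    max_depth=$1
    n_files=$2
    [ $((max_depth * n_files)) -gt $((1 << 20)) ]
}

# Values from the log: max_depth = 2147483647 (INT_MAX), one input BAM.
if warns_memory_hog 2147483647 1; then
    echo "max_depth=2147483647, 1 BAM: warning expected"
fi

# A hypothetical cap of 100000 with one BAM keeps the product below the threshold.
if ! warns_memory_hog 100000 1; then
    echo "max_depth=100000, 1 BAM: no warning expected"
fi
```

Under this assumption the warning reflects the configured ceiling rather than any depth actually observed in the BAM.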

I have a couple of questions:

Thanks for any help.

hxj5 commented 3 months ago

Hi, thanks for the feedback.

  1. If your downstream task is donor deconvolution (e.g., with vireo), then cellsnp-lite mode 1a is recommended. You can download the required SNP list from this folder. Please select the genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz file (here I assume the genome version is hg38).
  2. As for the "potential memory hog" warnings, you can ignore them: in most cases memory usage is quite limited in mode 2b. The default value of --maxDepth is the highest possible value (currently INT_MAX). We have little experience choosing a reasonable value, which would be sample/BAM-specific, so we suggest keeping the default in this case.
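A minimal sketch of the mode 1a run recommended above, reusing the BAM, barcode file and filter thresholds from the original command. It assumes the 1000 Genomes VCF named above has been downloaded to the working directory; `./cellsnp_mode1a` is a hypothetical output directory. `-R` supplies the candidate SNPs, and `--refseq` is dropped on the assumption that REF/ALT alleles are taken from the VCF.

```shell
#!/bin/sh
# Sketch of a mode 1a run: genotype a given SNP list in droplet-based cells.
# Assumptions: the VCF has been downloaded locally; OUT_DIR is hypothetical.
BAM=1296-2-GEX.bam
BARCODES=barcodes.csv
SNP_VCF=genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz
OUT_DIR=./cellsnp_mode1a

# -R gives the candidate SNP list; thresholds are copied from the original
# command and are not tuned recommendations.
cellsnp-lite -s "$BAM" -b "$BARCODES" -O "$OUT_DIR" -R "$SNP_VCF" \
    -p 4 --minMAF 0.1 --minCOUNT 100 --gzip
```

Because only the listed positions are piled up, this run should cover far fewer sites than a whole-chromosome pileup; the --minMAF and --minCOUNT values may need adjusting for the sample.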
bbimber commented 3 months ago

ok, thank you