single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

has pileup-ed in total 0 SNPs #44

haigdjambazian closed this issue 2 years ago

haigdjambazian commented 2 years ago

I ran my dataset with two settings:

cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -p $PROC --minCOUNT 10
cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -p $PROC --minCOUNT 2 --countORPHAN

In both cases I get 0 SNPs piled up.

This is the log so far for the test with countORPHAN above:

[I::main] start time: 2022-05-31 08:28:36
[W::check_args] Max depth set to maximum value (2147483647)
[I::main] mode 2a: pileup 20 whole chromosomes in 737280 single cells.
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-4] processing chrom 5 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-11] processing chrom 12 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-2] processing chrom 3 ...
[I::csp_pileup_core][Thread-3] processing chrom 4 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-1] processing chrom 2 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-7] processing chrom 8 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-0] processing chrom 1 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-8] processing chrom 9 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-9] processing chrom 10 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-13] processing chrom 14 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-10] processing chrom 11 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-14] processing chrom 15 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-5] processing chrom 6 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-12] processing chrom 13 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-15] processing chrom 16 ...
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-6] processing chrom 7 ...
[I::csp_pileup_core][Thread-8] has pileup-ed in total 0 SNPs for chrom 9
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-16] processing chrom 17 ...
[I::csp_pileup_core][Thread-0] has pileup-ed in total 0 SNPs for chrom 1
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-17] processing chrom 18 ...
[I::csp_pileup_core][Thread-1] has pileup-ed in total 0 SNPs for chrom 2
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-18] processing chrom 19 ...
[I::csp_pileup_core][Thread-5] has pileup-ed in total 0 SNPs for chrom 6
[W::csp_pileup_core] Combined max depth is above 1M. Potential memory hog!
[I::csp_pileup_core][Thread-19] processing chrom 20 ...
[I::csp_pileup_core][Thread-9] has pileup-ed in total 0 SNPs for chrom 10

Any help is appreciated.

hxj5 commented 2 years ago

Hi, is your data 10x scDNA-seq or scATAC-seq? If so, you may add --UMItag None (#26).

haigdjambazian commented 2 years ago

It is 10x scDNA-seq! I'll add that option, Thanks!

haigdjambazian commented 2 years ago

I am still getting all "0 SNPs" for the jobs even with minCOUNT down to 1.

The command I used is:

cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -p $PROC --minCOUNT 1 --countORPHAN --UMItag None

If it helps, the genome is Plasmodiophora brassicae (size: 24 Mb/haploid), and I expect a few hundred cells with coverage between 0.1X and 1X. cellranger-dna (cnv) did not accept the reference, but longranger (align) did, so longranger was used to make the input BAM with processed barcodes (the 737K-crdna-v1.txt barcodes were placed in the longranger installation).

hxj5 commented 2 years ago

Thanks for the details. Does the longranger BAM file contain the tag for cell barcodes (the default value for --cellTAG is CB)? If yes, do the cell barcodes in the BAM file have the same length and format as the barcodes in the input list (-b)?

haigdjambazian commented 2 years ago

Good point! The barcode tag name is BX with that pipeline. I ran with --cellTAG BX --UMItag None but still got 0 SNPs. The barcode list I supplied is cellranger-dna-1.1.0/cellranger-dna-cs/1.1.0/lib/python/barcodes/737K-crdna-v1.txt, which has 16-base barcodes. In the BAM, the barcode is written like this: BX:Z:ACATCAGCATTCATCT-1. Is this an issue?
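For readers hitting the same symptom, one way to test the question above is to compare the barcode tag values in a few alignment records with the entries of the barcode list. The following is only a rough sketch: it assumes the records are available as tab-separated SAM text lines (for example, as printed by `samtools view`), and the helper names `tag_value` and `exact_matches` are made up for illustration; they are not part of cellsnp-lite. The example record is also invented, apart from the BX value quoted in this thread.

```python
def tag_value(sam_line, tag="BX"):
    """Return the value of a string-typed optional tag (TAG:Z:value), or None."""
    fields = sam_line.rstrip("\n").split("\t")
    prefix = tag + ":Z:"
    # A SAM record has 11 mandatory columns; optional tags start at column 12.
    for field in fields[11:]:
        if field.startswith(prefix):
            return field[len(prefix):]
    return None


def exact_matches(sam_lines, barcode_list, tag="BX"):
    """Return the barcodes that appear both in the records and in the list."""
    wanted = set(barcode_list)
    seen = {tag_value(line, tag) for line in sam_lines}
    seen.discard(None)  # records without the tag contribute nothing
    return seen & wanted


# Hypothetical record; only the BX value is taken from the thread.
record = "read1\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:ACATCAGCATTCATCT-1"

without_suffix = exact_matches([record], ["ACATCAGCATTCATCT"])
with_suffix = exact_matches([record], ["ACATCAGCATTCATCT-1"])
```

Here `without_suffix` is empty while `with_suffix` contains the barcode, which is the kind of difference worth looking for when every chromosome reports 0 SNPs.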

hxj5 commented 2 years ago

Yes, suffixes (e.g., -1, -2, ...) should be added to the barcodes in the input list (-b), as cellsnp performs an exact match on the barcodes.
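A minimal sketch of that fix, assuming the barcode list is a plain text file with one barcode per line and that the BAM uses the -1 suffix seen above (other datasets may need a different suffix). The function names `add_suffix` and `rewrite_barcode_file` are hypothetical helpers, not cellsnp-lite features.

```python
def add_suffix(barcodes, suffix="-1"):
    """Append the suffix to every barcode that does not already end with it."""
    return [bc if bc.endswith(suffix) else bc + suffix for bc in barcodes]


def rewrite_barcode_file(src_path, dst_path, suffix="-1"):
    """Read one barcode per line from src_path and write suffixed ones to dst_path."""
    with open(src_path) as src:
        # Skip blank lines and strip surrounding whitespace.
        barcodes = [line.strip() for line in src if line.strip()]
    with open(dst_path, "w") as dst:
        dst.write("\n".join(add_suffix(barcodes, suffix)) + "\n")
```

The rewritten file would then be passed to -b in place of the original list, so that its entries match tag values such as ACATCAGCATTCATCT-1 exactly.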

haigdjambazian commented 2 years ago

Just to let you know that this worked in the end with that last change. Thanks!