Closed jiehuichen closed 11 months ago
Hi, yes, you can extract the reads by samtools and then view them in IGV.
To extract reads covering a SNP and output to a BAM file (assuming the SNP's position is chr1:100000):
samtools view -h -b "input_BAM" chr1:100000 > "output_BAM"
samtoos index "output_BAM"
If you only want to extract SNP reads in specific cell (assuming cell barcode is XXX-1 and cell tag is CB):
samtools view -h -b -d CB:XXX-1 "input_BAM" chr1:100000 > "output_BAM"
samtoos index "output_BAM"
Then you can load the output BAM file above into IGV to view the reads.
Thanks Xianjie.
Since the bam file of my project is different from that of 10x, when I tried take codes you shared above to extract SNP reads in specific cell, there will be some warnings:
[bam_header_read] EOF marker is absent. The input is probably truncated. [bam_header_read] invalid BAM binary header (this is not a BAM file).
So I just simplified the codes as:
samtools view $input_BAM | less -S | grep "Cell_BarcodeXXX" > SNP1.txt
Then, I checked the sequence of WT and "Mut" in SNP1 region,
cat SNP1.txt | grep "GCGCCTGAC" |wc -l
cat SNP1.txt | grep "GCGCTTGAC" |wc -l
Compare the output with the information in "cellSNP.cells.vcf" by cellSNP-liste, e.g. "1/0:1:11:0:39,33,380:0,10,0,1,0", . I found some of them were not consistent, especially for those cells with less reads count in "cellSNP.cells.vcf".
By the way, during the SNP calling, I dit not use the parameter “--minMAF 0.1 --minCOUNT 20”.
According to your understanding, is it possible that some SNPs with low read count are unreliable/false positive?
If yes, should I always add the filtering “--minMAF 0.1 --minCOUNT 20” when using the cellsnp-lite to avoid some false positive SNPs?
Many thanks.
Hi, the inconsistency of read counts between samtools and cellsnp-lite is probably due to the different filtering settings of the two tools, e.g., by default, cellsnp-lite will filter some low-quality reads (please check --exclFLAG
option) while samtools do not. To make the filtering settings the same, you can use -F
option in samtools view.
Yes, generally speaking, adding "--minMAF 0.1 --minCOUNT 20" is recommended to reduce false positives.
Got you, many thanks.
Hi Xianjie,
Thanks for your great efforts on the cellsnp-lite.
Recently, I met an issue about checking the reads cover an SNP.
For "1/0:1:11:0:39,33,380:0,10,0,1,0", which means this cell is a heterozygote, how to check the real reads cover this SNP? Can we view these reads in IGV or just grab all the reads related to this cell's SNP by samtools?
Best,