google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

dealing with chimeric alignments #863

Closed alisamatisse closed 3 months ago

alisamatisse commented 3 months ago

hello,

sorry if this question is dumb or the answer is obvious. is there a plausible explanation of how deepvariant treats chimeric reads? I ran deepvariant on .bam files both before and after filtering (samtools view -h -F 2048), and the number of variants found is exactly the same in both cases. just to check that I am not doing something weird, I also used bcftools, and there the number of called variants did change after filtering. is deepvariant not considering chimeric reads?

thank you for creating and supporting deepvariant, alisa

alisamatisse commented 3 months ago

ah, forgot to mention, I filtered chimeric reads because I had around 60% of them in every tested sample, was curious how much it affects my results (using samplix enrichment + pacbio hifi)
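
For readers unfamiliar with the filter mentioned above: in the SAM specification, flag bit 2048 (0x800) marks a supplementary alignment, and `samtools view -F 2048` drops every record that has that bit set. Here is a minimal Python sketch of the same bitmask test. The `keep_read` helper and the example reads are hypothetical and only illustrate the flag logic, not how samtools is implemented.

```python
# SAM flag bit for a supplementary alignment, as defined in the SAM specification.
SUPPLEMENTARY = 0x800  # decimal 2048


def keep_read(flag: int, exclude_mask: int = SUPPLEMENTARY) -> bool:
    """Mimic `samtools view -F <mask>`: keep a record only if no mask bit is set."""
    return (flag & exclude_mask) == 0


# Hypothetical (read name, SAM flag) pairs, made up for illustration.
reads = [
    ("read1", 0),     # primary alignment, forward strand
    ("read1", 2048),  # supplementary part of the same chimeric read
    ("read2", 16),    # primary alignment, reverse strand
    ("read2", 2064),  # supplementary (2048) + reverse strand (16)
]

kept = [(name, flag) for name, flag in reads if keep_read(flag)]
print(kept)
```

Note that the primary alignment of a chimeric read does not carry the 2048 bit, so it survives this filter; only the supplementary records are removed.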

akolesnikov commented 3 months ago

Hi @alisamatisse,

I'm not sure what samtools view -h -F 2048 does. DeepVariant does not do any special processing for chimeric reads. Which reads are used from the input BAM is controlled by the following flags:

Variants are created for all positions where at least two reads support an alt allele.
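
As a toy illustration of the "at least two reads" rule stated above (not DeepVariant's actual code), the sketch below counts alt-supporting reads per position and keeps only the alleles that reach the threshold. The function name, the input tuple layout, and the example observations are all assumptions made for this example.

```python
from collections import Counter

MIN_ALT_READS = 2  # threshold taken from the statement above


def candidate_positions(observations, min_alt_reads=MIN_ALT_READS):
    """Return (position, allele) pairs supported by enough alt reads.

    `observations` is a hypothetical iterable of (position, allele, is_alt)
    tuples, one per read base overlapping a position.
    """
    support = Counter(
        (pos, allele) for pos, allele, is_alt in observations if is_alt
    )
    return sorted(key for key, n in support.items() if n >= min_alt_reads)


# Made-up observations: two reads support T at position 100,
# only one read supports G at position 250.
obs = [
    (100, "T", True),
    (100, "T", True),
    (100, "A", False),  # reference base, not counted
    (250, "G", True),
]

candidates = candidate_positions(obs)
print(candidates)
```

Under this simplified rule, removing reads changes the candidate list only if it pushes an allele's support below the threshold, or lifts it above when reads are added.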

alisamatisse commented 3 months ago

Hi @akolesnikov, very helpful answer, I appreciate it. 🫡