google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

How does DeepVariant create the VCF files? #846

Status: Closed (pioneer-pi closed 2 months ago)

pioneer-pi commented 3 months ago

Hello, I would like to know how you create the VCF file.

  1. How does DeepVariant produce the QUAL value of each variant?
  2. I see that you use p_error when calculating GQ. What is p_error, and how do you get it?
  3. I see that the FILTER field has values such as PASS and RefCall. How is that value decided?
  4. What is the meaning of PL?

An example VCF record: chr20 61098 . C T 48.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:49:34:17,17:0.5:48,0,66

Thank you!!!

kishwarshafin commented 3 months ago

@pioneer-pi, please see how DeepVariant works: the CNN outputs a probability vector, which we use to determine the genotype likelihoods.

  1. QUAL is the minimum probability of the genotype. Please see here for details.
  2. GQ and QUAL are synonymous; one is just rounded. Please see how this method determines both QUAL and GQ.
  3. RefCall is assigned if the QUAL is below a set threshold, which is 3. See here for details.
  4. PL holds the genotype likelihoods. See the code for details. It is also described in the VCF header:
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled genotype likelihoods rounded to the closest integer">
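
To make the Phred relationships in this thread concrete, here is a minimal Python sketch that works backwards from the example record in the question. It is not DeepVariant's code: the helper names (`phred`, `pl_to_probs`, `parse_sample`) and the cap of 99 are assumptions made for illustration. It assumes p_error is the probability that the called genotype is wrong (1 minus the probability of the called genotype), and that the error term behind QUAL is the probability that the site is actually hom-ref, which is the usual VCF convention.

```python
import math


def phred(p_error: float, cap: float = 99.0) -> float:
    """Phred-scale an error probability: -10 * log10(p_error), capped at `cap`.

    The cap of 99 is an assumption for this sketch.
    """
    if p_error <= 0.0:
        return cap
    return min(-10.0 * math.log10(p_error), cap)


def pl_to_probs(pl: list[int]) -> list[float]:
    """Turn Phred-scaled likelihoods into probabilities that sum to 1."""
    likelihoods = [10.0 ** (-value / 10.0) for value in pl]
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


def parse_sample(record: str) -> dict[str, str]:
    """Pair the FORMAT keys of a single-sample VCF line with the sample values."""
    fields = record.split()
    return dict(zip(fields[8].split(":"), fields[9].split(":")))


# The record quoted in the question above.
record = "chr20 61098 . C T 48.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:49:34:17,17:0.5:48,0,66"
sample = parse_sample(record)
pl = [int(x) for x in sample["PL"].split(",")]

# For a biallelic site the PL order is 0/0, 0/1, 1/1.
probs = pl_to_probs(pl)

# The called genotype is the one whose PL is 0 (here 0/1).
called = pl.index(0)

# p_error: total probability of every genotype other than the called one.
p_error = sum(p for i, p in enumerate(probs) if i != called)

print("genotype probabilities:", probs)
print("GQ-like value:", round(phred(p_error), 2))
print("QUAL-like value:", round(phred(probs[0]), 2))
```

Because the PL values in the record are already rounded to integers, the recovered numbers only approximate the reported ones: both land within about one Phred unit of GQ=49 and QUAL=48.9. The pipeline itself starts from the unrounded probability vector, so it does not have this loss.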