ultimatesource / denovogear

A program to detect denovo-variants using next-generation sequencing data.
http://www.nature.com/nmeth/journal/v10/n10/full/nmeth.2611.html
GNU General Public License v3.0

GATK vcf as input for dng-dnm #297

Closed bthiruv closed 5 years ago

bthiruv commented 5 years ago

We are comparing the output of dng-dnm when given samtools mpileup output versus a GATK joint-genotyped vcf as input. Some variants that were experimentally validated as de novo get a small pp_dnm score when the GATK vcf is the input, while the same variants get a high pp_dnm when the mpileup bcf is the input.

Input vcf (sample order is mother, father, child) - chr1 111152316 . A G 804.97 PASS AC=2;AF=0.250;AN=8;BaseQRankSum=-3.420e-01;ClippingRankSum=0.00;DP=134;ExcessHet=3.6798;FS=0.000;MLEAC=2;MLEAF=0.250;MQ=60.00;MQRankSum=0.00;QD=13.88;ReadPosRankSum=0.662;SOR=0.650;VQSLOD=21.25;culprit=MQ GT:AD:DP:GQ:PGT:PID:PL 0/0:37,0:37:99:.:.:0,101,1093 0/0:38,0:38:33:.:.:0,33,495 0/1:16,10:26:99:0|1:111152316_A_G:388,0,974

dng-dnm output using GATK vcf as input - chr1 111152316 . A G 0 PASS RD_MOM=37;RD_DAD=38;MQ_MOM=.;MQ_DAD=.;SNPcode=2;code=9 NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ AG/AA/AG:0.923387:1.19684e-07:AG/AA/AA:0.0766131:9.9301e-09:26:.

dng-dnm output using mpileup bcf as input - chr1 111152316 . A G 0 PASS RD_MOM=37;RD_DAD=35;MQ_MOM=60;MQ_DAD=60;SNPcode=1;code=6; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ AA/AA/AA:1.74581e-06:7.88776e-15:AG/AA/AA:0.999998:9.9301e-09:27:60

Can you please help us understand the output?

reedacartwright commented 5 years ago

The difference is due to gatk and mpileup generating different vcfs at the site. What do those files look like at this site?


bthiruv commented 5 years ago

The GATK vcf - Input vcf (sample order is mother, father, child) - chr1 111152316 . A G 804.97 PASS AC=2;AF=0.250;AN=8;BaseQRankSum=-3.420e-01;ClippingRankSum=0.00;DP=134;ExcessHet=3.6798;FS=0.000;MLEAC=2;MLEAF=0.250;MQ=60.00;MQRankSum=0.00;QD=13.88;ReadPosRankSum=0.662;SOR=0.650;VQSLOD=21.25;culprit=MQ GT:AD:DP:GQ:PGT:PID:PL 0/0:37,0:37:99:.:.:0,101,1093 0/0:38,0:38:33:.:.:0,33,495 0/1:16,10:26:99:0|1:111152316_A_G:388,0,974

I don't have the bcf, but judging from the dng-dnm output, the read depth (DP) is close enough.

reedacartwright commented 5 years ago

dng-dnm utilizes PL values. If you are getting different results, then GATK and mpileup are likely giving you different PL values. That is where I would look first.
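As a starting point for that comparison, here is a minimal sketch that pulls the PL field out of a VCF data line and converts it to likelihoods relative to the best genotype, using the standard phred relation 10^(-PL/10). The function names and the sample labels are hypothetical helpers for illustration, not part of denovogear, and this is not how dng-dnm itself computes pp_dnm. The record is the GATK line posted above; running the same functions on the matching mpileup line would show where the two inputs disagree.

```python
# Hypothetical helpers for comparing PL values between two VCF inputs.
# Not part of denovogear; a sketch under the assumptions stated above.

# GATK record from this thread (whitespace-separated here; real VCFs use tabs,
# and str.split() with no argument handles both).
RECORD = (
    "chr1 111152316 . A G 804.97 PASS "
    "AC=2;AF=0.250;AN=8;BaseQRankSum=-3.420e-01;ClippingRankSum=0.00;DP=134;"
    "ExcessHet=3.6798;FS=0.000;MLEAC=2;MLEAF=0.250;MQ=60.00;MQRankSum=0.00;"
    "QD=13.88;ReadPosRankSum=0.662;SOR=0.650;VQSLOD=21.25;culprit=MQ "
    "GT:AD:DP:GQ:PGT:PID:PL "
    "0/0:37,0:37:99:.:.:0,101,1093 "
    "0/0:38,0:38:33:.:.:0,33,495 "
    "0/1:16,10:26:99:0|1:111152316_A_G:388,0,974"
)


def parse_pl(vcf_line, sample_names):
    """Return {sample_name: [PL values]} for one VCF data line."""
    fields = vcf_line.split()
    format_keys = fields[8].split(":")   # column 9 is FORMAT
    pl_index = format_keys.index("PL")   # raises ValueError if PL is absent
    pls = {}
    for name, sample in zip(sample_names, fields[9:]):
        raw = sample.split(":")[pl_index]
        pls[name] = [int(value) for value in raw.split(",")]
    return pls


def pl_to_relative_likelihoods(pl):
    """Convert phred-scaled PLs to likelihoods relative to the best genotype."""
    return [10 ** (-value / 10) for value in pl]


# Sample order in the posted file is mother, father, child.
pls = parse_pl(RECORD, ["mother", "father", "child"])
for name, pl in pls.items():
    print(name, pl, pl_to_relative_likelihoods(pl))
```

For a biallelic site the three PL entries are ordered ref/ref, ref/alt, alt/alt, so the second entry is the relative likelihood of a heterozygous genotype for that sample.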


bthiruv commented 5 years ago

The PL values for the site from GATK are: parents (genotype 0/0) 0,101,1093 and 0,33,495; child (genotype 0/1) 388,0,974.

AD for parents is 37,0 and 38,0 and for the child is 16,10.

Would this be a low confidence site?
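One rough way to put numbers on that question, sketched under the assumption that GQ follows the usual GATK convention (the gap between the lowest and second-lowest PL, capped at 99). The helper names are hypothetical and the values are the PL and AD entries quoted above; this only summarizes the caller's confidence per sample and says nothing about how dng-dnm weighs the site.

```python
# Hypothetical helpers; a sketch, not part of GATK or denovogear.

def gq_from_pl(pl, cap=99):
    """Genotype quality as the gap between the two smallest PLs, capped."""
    ordered = sorted(pl)
    return min(ordered[1] - ordered[0], cap)


def alt_fraction(ad):
    """Fraction of reads supporting non-reference alleles, or None at zero depth."""
    depth = sum(ad)
    if depth == 0:
        return None
    return sum(ad[1:]) / depth


site = {
    "mother": {"PL": [0, 101, 1093], "AD": [37, 0]},
    "father": {"PL": [0, 33, 495], "AD": [38, 0]},
    "child": {"PL": [388, 0, 974], "AD": [16, 10]},
}

for name, values in site.items():
    print(name, "GQ =", gq_from_pl(values["PL"]),
          "alt fraction =", alt_fraction(values["AD"]))
```

With these inputs the computed GQ values are 99, 33, and 99, which agree with the GQ column in the GATK record; the father's hom-ref call is the least certain of the three even though none of his 38 reads carry the alternate allele.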

bthiruv commented 5 years ago

The SNPcode and code values are different for the GATK vcf and the mpileup bcf as input. I am not sure how to interpret these codes.

reedacartwright commented 5 years ago

Can you try the develop branch to see if it does the same thing?


bthiruv commented 5 years ago

Sure, will try that. I will also generate the mpileup vcf.