Closed bthiruv closed 5 years ago
The difference is due to GATK and mpileup generating different VCF records at this site. What do the two files look like at this position?
On Wed, Feb 20, 2019, 21:20 bthiruv notifications@github.com wrote:
We are comparing the output of dng-dnm using samtools mpileup output and a GATK joint-genotyped vcf as input. Some variants that were experimentally validated as de novo get a small pp_dnm score when the GATK vcf is the input, while the same variants get a high pp_dnm when the mpileup bcf is the input.
Input vcf (sample order is mother, father, child) - chr1 111152316 . A G 804.97 PASS AC=2;AF=0.250;AN=8;BaseQRankSum=-3.420e-01;ClippingRankSum=0.00;DP=134;ExcessHet=3.6798;FS=0.000;MLEAC=2;MLEAF=0.250;MQ=60.00;MQRankSum=0.00;QD=13.88;ReadPosRankSum=0.662;SOR=0.650;VQSLOD=21.25;culprit=MQ GT:AD:DP:GQ:PGT:PID:PL 0/0:37,0:37:99:.:.:0,101,1093 0/0:38,0:38:33:.:.:0,33,495 0/1:16,10:26:99:0|1:111152316_A_G:388,0,974
dng-dnm output using GATK vcf as input - chr1 111152316 . A G 0 PASS RD_MOM=37;RD_DAD=38;MQ_MOM=.;MQ_DAD=.;SNPcode=2;code=9 NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ AG/AA/AG:0.923387:1.19684e-07:AG/AA/AA:0.0766131:9.9301e-09:26:.
dng-dnm output using mpileup bcf as input - chr1 111152316 . A G 0 PASS RD_MOM=37;RD_DAD=35;MQ_MOM=60;MQ_DAD=60;SNPcode=1;code=6; NULL_CONFIG(child/mom/dad):PP_NULL:ML_NULL:DNM_CONFIG(child/mom/dad):PP_DNM:ML_DNM:RD:MQ AA/AA/AA:1.74581e-06:7.88776e-15:AG/AA/AA:0.999998:9.9301e-09:27:60
Can you please help us understand the output?
The GATK vcf - Input vcf (sample order is mother, father, child) - chr1 111152316 . A G 804.97 PASS AC=2;AF=0.250;AN=8;BaseQRankSum=-3.420e-01;ClippingRankSum=0.00;DP=134;ExcessHet=3.6798;FS=0.000;MLEAC=2;MLEAF=0.250;MQ=60.00;MQRankSum=0.00;QD=13.88;ReadPosRankSum=0.662;SOR=0.650;VQSLOD=21.25;culprit=MQ GT:AD:DP:GQ:PGT:PID:PL 0/0:37,0:37:99:.:.:0,101,1093 0/0:38,0:38:33:.:.:0,33,495 0/1:16,10:26:99:0|1:111152316_A_G:388,0,974
I don't have the bcf, but from the dng-dnm output, the DP is close enough.
dng-dnm uses the PL values. If you are getting different results, then GATK and mpileup are likely giving you different PL values at this site. That is where I would look first.
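To compare PL values between the two call sets, it may help to pull them out of the per-sample columns first. The snippet below is a minimal sketch that does this for the GATK record quoted in this thread; `extract_pl` is a hypothetical helper written for illustration, not part of dng-dnm, and the sample columns are copied by hand because the mpileup bcf was not posted here.

```python
# FORMAT and sample columns of the GATK record quoted in this thread
# (sample order: mother, father, child).
FORMAT = "GT:AD:DP:GQ:PGT:PID:PL"
SAMPLES = {
    "mother": "0/0:37,0:37:99:.:.:0,101,1093",
    "father": "0/0:38,0:38:33:.:.:0,33,495",
    "child": "0/1:16,10:26:99:0|1:111152316_A_G:388,0,974",
}


def extract_pl(format_field, sample_field):
    """Return the PL values of one sample as a list of ints.

    Hypothetical helper: pairs each FORMAT key with the matching
    colon-separated value of the sample column, then splits PL on commas.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    fields = dict(zip(keys, values))
    return [int(x) for x in fields["PL"].split(",")]


pls = {name: extract_pl(FORMAT, column) for name, column in SAMPLES.items()}
for name, pl in pls.items():
    print(name, pl)
```

Running the same extraction on the matching mpileup record would allow a direct per-sample comparison of the two PL triplets.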
The PL values at this site from GATK are: for the parents (genotype 0/0), 0,101,1093 and 0,33,495; for the child (genotype 0/1), 388,0,974.
The AD values are 37,0 and 38,0 for the parents and 16,10 for the child.
Would this be a low confidence site?
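One way to read these numbers: in the VCF specification, PL is a phred-scaled genotype likelihood, so a value p corresponds to a relative likelihood of 10^(-p/10). The sketch below converts the three GATK PL triplets into normalized likelihoods; `pl_to_likelihoods` is a hypothetical name used only for illustration. The father's heterozygous PL of 33 is far weaker evidence against 0/1 than the mother's 101. Whether that is what lowers PP_DNM for the GATK input is an assumption that this thread does not confirm.

```python
def pl_to_likelihoods(pl):
    """Convert phred-scaled PL values into likelihoods that sum to 1.

    Hypothetical helper: each PL value p maps to 10 ** (-p / 10),
    and the results are normalized across the listed genotypes.
    """
    raw = [10 ** (-p / 10) for p in pl]
    total = sum(raw)
    return [r / total for r in raw]


# PL triplets from the GATK record (genotype order: 0/0, 0/1, 1/1).
mother = pl_to_likelihoods([0, 101, 1093])
father = pl_to_likelihoods([0, 33, 495])
child = pl_to_likelihoods([388, 0, 974])

# Normalized likelihood of the heterozygous genotype in each sample.
print("mother 0/1:", mother[1])
print("father 0/1:", father[1])
print("child  0/1:", child[1])
```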
The SNPcode and code values differ between the GATK vcf and mpileup bcf runs. I am not sure how to interpret these codes.
Can you try the develop branch to see if it does the same thing?
Sure, I will try that, and I will also generate the mpileup vcf.