Closed Spork0527 closed 8 months ago
Figured out what happened here. The AD field has to be in the format Reference_Allele_Counts,Alternative_Allele_Counts, as in GATK-format VCFs. For example:
20 10001019 . T G 364.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.699;ClippingRankSum=0.00;DP=34;ExcessHet=3.0103;FS=3.064;MLEAC=1;MLEAF=0.500;MQ=42.48;MQRankSum=-3.219e+00;QD=11.05;ReadPosRankSum=-6.450e-01;SOR=0.537 GT:AD:DP:GQ:PL 0/1:18,15:33:99:393,0,480
20 10001298 . T A 884.77 . AC=2;AF=1.00;AN=2;DP=30;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=29.49;SOR=1.765 GT:AD:DP:GQ:PL 1/1:0,30:30:89:913,89,0
20 10001436 . A AAGGCT 1222.73 . AC=2;AF=1.00;AN=2;DP=29;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=25.36;SOR=0.836 GT:AD:DP:GQ:PL 1/1:0,28:28:84:1260,84,0
20 10001474 . C T 843.77 . AC=2;AF=1.00;AN=2;DP=27;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=31.25;SOR=1.302 GT:AD:DP:GQ:PL 1/1:0,27:27:81:872,81,0

VCFs produced by some other programs, such as VarScan, instead carry several separate colon-delimited fields, including DP (read depth), RD (reference depth), and AD (alternative depth). The AD field in a GATK VCF is the combination of RD and AD from a VarScan VCF, written as RD,AD.
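For anyone hitting the same problem, here is a minimal Python sketch of the reformatting described above: it rewrites AD in each sample column as RD,AD. The function name `merge_rd_ad` and the example record are made up for illustration, and the sketch assumes a tab-separated VCF whose FORMAT column contains both RD and AD. It is not part of Dsuite, and I have not checked whether the -p option needs anything beyond the AD layout.

```python
def merge_rd_ad(line: str) -> str:
    """Rewrite AD in every sample column as 'RD,AD' (ref count, alt count).

    Hypothetical helper: assumes tab-separated VCF data lines in which
    the FORMAT column (9th column) lists both RD and AD.
    """
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 10:
        return line  # no FORMAT/sample columns to rewrite
    keys = cols[8].split(":")
    if "RD" not in keys or "AD" not in keys:
        return line  # nothing to merge
    rd_i, ad_i = keys.index("RD"), keys.index("AD")
    for i in range(9, len(cols)):
        vals = cols[i].split(":")
        if len(vals) <= max(rd_i, ad_i):
            continue  # truncated or missing sample entry, leave as-is
        if "," in vals[ad_i]:
            continue  # AD already looks like ref,alt
        vals[ad_i] = f"{vals[rd_i]},{vals[ad_i]}"
        cols[i] = ":".join(vals)
    return "\t".join(cols) + newline


# Made-up VarScan-style record (simplified FORMAT, not real output)
example = "20\t10001019\t.\tT\tG\t.\tPASS\t.\tGT:DP:RD:AD\t0/1:33:18:15\n"
print(merge_rd_ad(example).rstrip("\n").split("\t")[-1])
```

The RD field is left in place, so the original information is not lost. The ##FORMAT header line for AD would probably also need its Number attribute updated to match the two-value layout; the sketch does not touch headers.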
Hi @Spork0527 ... thanks for the update and for posting the solution. Indeed, I wrote this extension using GATK VCFs as test data ;).
Milan
Hi, I have pool-sequencing VCF data generated by VarScan that does not seem to be compatible with the -p option in Dsuite Dtrios. I get an error message about the AD field not being found, or something like that. Would you mind sending me the format of the pool-sequencing data that the -p option was built on, or explaining how it handles the AD field? I may be able to reformat my dataset a little to make it compatible.