samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

GTcheck discrepancy between discordance score and the number of matching genotypes #2210

Closed leetde closed 2 weeks ago

leetde commented 3 weeks ago

Hello,

I am trying to run gtcheck to determine whether two samples (21616 and 21796check) are derived from the same patient. I first created a filtered, merged vcf.gz file for these two samples (derived from RNAseq data) and then ran "bcftools gtcheck 21616_21796check_samples.vcf.gz".

It produced this output:

DCv2 [2]Query Sample [3]Genotyped Sample [4]Discordance [5]Average -log P(HWE) [6]Number of sites compared [6]Number of matching genotypes

DCv2 1281-1-21616-RNA-8 1281-1-21796-check-RNA-7 1.747794e+05 6.838604e-02 20861 20861

As you can see, there is a discrepancy between the high discordance score and the fact that all genotypes are reported as matching across all compared sites. Why might this be?

Here is the first part of the merged VCF file using the less command:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1281-1-21796-check-RNA-7 1281-1-21616-RNA-8

16 17058 rs181746417 A G 350.39 PASS AC=1;AF=0.25;AN=4;BaseQRankSum=1.326;DB;DP=23;ExcessHet=2.1085;FS=0;MLEAC=4;MLEAF=0.25;MQ=60;MQRankSum=0;QD=31.3;ReadPosRankSum=0.429;SOR=1.002 GT:AD:DP:GQ:PL 0/1:1,1:2:39:39,0,39 0/0:2,0:2:6:0,6,90
16 47354 rs8466 A G 230.34 PASS AC=0;AF=0.188;AN=4;BaseQRankSum=-4.137;DB;DP=70;ExcessHet=0;FS=0;MLEAC=2;MLEAF=0.125;MQ=60;MQRankSum=0;QD=23.03;ReadPosRankSum=-1.416;SOR=0.452 GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,358 0/0:13,0:13:39:0,39,567
16 53923 rs2562145 C G 1115.68 PASS AC=2;AF=0.438;AN=4;BaseQRankSum=0;DB;DP=138;ExcessHet=0.218;FS=4.217;MLEAC=7;MLEAF=0.438;MQ=60;MQRankSum=0;QD=28.61;ReadPosRankSum=1.633;SOR=0.069 GT:AD:DP:GQ:PL 1/1:0,8:8:24:358,24,0 0/0:56,0:56:99:0,169,2476
16 57162 rs10266 C G 264.72 PASS AC=0;AF=0.125;AN=4;BaseQRankSum=0.673;DB;DP=88;ExcessHet=0.2996;FS=1.537;MLEAC=2;MLEAF=0.125;MQ=60;MQRankSum=0;QD=24.07;ReadPosRankSum=0.261;SOR=0.992 GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,125 0/0:28,0:28:84:0,84,1170
16 57275 rs1045001 T G 1183.53 PASS AC=2;AF=0.688;AN=4;BaseQRankSum=0;DB;DP=63;ExcessHet=0;FS=3.825;MLEAC=10;MLEAF=0.625;MQ=60;MQRankSum=0;QD=35.18;ReadPosRankSum=0.612;SOR=1.436 GT:AD:DP:GQ:PL 1/1:0,6:6:18:269,18,0 0/0:22,0:22:66:0,66,985
16 79655 rs710081 C T 2061.48 PASS AC=1;AF=0.125;AN=4;BaseQRankSum=0.845;DB;DP=290;ExcessHet=0.2996;FS=20.285;MLEAC=2;MLEAF=0.125;MQ=60;MQRankSum=0;QD=24.84;ReadPosRankSum=2.321;SOR=0.166 GT:AD:DP:GQ:PL 0/0:33,0:33:99:0,99,1477 0/1:17,30:47:99:1201,0,581
16 84442 rs1061435 C A 5914.05 PASS AC=3;AF=0.813;AN=4;BaseQRankSum=1.462;DB;DP=166;ExcessHet=0.9691;FS=1.977;MLEAC=13;MLEAF=0.813;MQ=60;MQRankSum=0;QD=29.74;ReadPosRankSum=-0.197;SOR=0.987 GT:AD:DP:GQ:PL 1/1:0,11:11:33:494,33,0 0/1:3,2:5:75:75,0,120

pd3 commented 2 weeks ago

There was a bug that caused the number of matching genotypes to be reported incorrectly: it was always the same as the number of compared genotypes. This is fixed in the latest GitHub version; please try that.
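The reported count can also be sanity-checked independently of gtcheck. The following is a minimal Python sketch, not part of BCFtools: the helper name `count_matching_genotypes` is hypothetical, it assumes a two-sample VCF with whitespace-separated columns, and it compares only the hard GT calls (ignoring phasing), which is not necessarily how gtcheck scores discordance. The sample records are the seven sites quoted in the question, with the INFO column abbreviated to "." for brevity.

```python
def count_matching_genotypes(vcf_lines):
    """Return (sites_compared, sites_matching) for a two-sample VCF.

    Assumption: columns are whitespace-separated and the two samples
    are in columns 10 and 11. Sites with a missing allele are skipped.
    """
    compared = 0
    matching = 0
    for line in vcf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        gt_index = fields[8].split(":").index("GT")
        genotypes = []
        for sample in fields[9:11]:
            gt = sample.split(":")[gt_index]
            # Treat phased (|) and unphased (/) calls alike; sort so 0/1 == 1/0.
            genotypes.append(sorted(gt.replace("|", "/").split("/")))
        if any("." in alleles for alleles in genotypes):
            continue  # missing genotype, not comparable
        compared += 1
        if genotypes[0] == genotypes[1]:
            matching += 1
    return compared, matching


# The seven records from the question, INFO abbreviated to "." for brevity.
RECORDS = [
    "16 17058 rs181746417 A G 350.39 PASS . GT:AD:DP:GQ:PL 0/1:1,1:2:39:39,0,39 0/0:2,0:2:6:0,6,90",
    "16 47354 rs8466 A G 230.34 PASS . GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,358 0/0:13,0:13:39:0,39,567",
    "16 53923 rs2562145 C G 1115.68 PASS . GT:AD:DP:GQ:PL 1/1:0,8:8:24:358,24,0 0/0:56,0:56:99:0,169,2476",
    "16 57162 rs10266 C G 264.72 PASS . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,125 0/0:28,0:28:84:0,84,1170",
    "16 57275 rs1045001 T G 1183.53 PASS . GT:AD:DP:GQ:PL 1/1:0,6:6:18:269,18,0 0/0:22,0:22:66:0,66,985",
    "16 79655 rs710081 C T 2061.48 PASS . GT:AD:DP:GQ:PL 0/0:33,0:33:99:0,99,1477 0/1:17,30:47:99:1201,0,581",
    "16 84442 rs1061435 C A 5914.05 PASS . GT:AD:DP:GQ:PL 1/1:0,11:11:33:494,33,0 0/1:3,2:5:75:75,0,120",
]

print(count_matching_genotypes(RECORDS))  # prints (7, 2)
```

Only two of these seven sites carry identical GT calls in both samples, which is consistent with the matching-genotypes column being wrong rather than the discordance score.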