This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
Hello,
I am trying to run gtcheck to determine whether two samples (21616 and 21796check) are derived from the same patient. I first created a filtered, merged vcf.gz file for these two samples (derived from RNA-seq data) and then ran "bcftools gtcheck 21616_21796check_samples.vcf.gz".
It output this:
DCv2 [2]Query Sample [3]Genotyped Sample [4]Discordance [5]Average -log P(HWE) [6]Number of sites compared [6]Number of matching genotypes
DCv2 1281-1-21616-RNA-8 1281-1-21796-check-RNA-7 1.747794e+05 6.838604e-02 20861 20861
As you can see, there is a discrepancy: the discordance score is high, yet all genotypes match across all compared sites. Why might this be?
Here is the first part of the merged VCF file, viewed with less:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1281-1-21796-check-RNA-7 1281-1-21616-RNA-8
16 17058 rs181746417 A G 350.39 PASS AC=1;AF=0.25;AN=4;BaseQRankSum=1.326;DB;DP=23;ExcessHet=2.1085;FS=0;MLEAC=4;MLEAF=0.25;MQ=60;MQRankSum=0;QD=31.3;ReadPosRankSum=0.429;SOR=1.002 GT:AD:DP:GQ:PL 0/1:1,1:2:39:39,0,39 0/0:2,0:2:6:0,6,90
16 47354 rs8466 A G 230.34 PASS AC=0;AF=0.188;AN=4;BaseQRankSum=-4.137;DB;DP=70;ExcessHet=0;FS=0;MLEAC=2;MLEAF=0.125;MQ=60;MQRankSum=0;QD=23.03;ReadPosRankSum=-1.416;SOR=0.452 GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,358 0/0:13,0:13:39:0,39,567
16 53923 rs2562145 C G 1115.68 PASS AC=2;AF=0.438;AN=4;BaseQRankSum=0;DB;DP=138;ExcessHet=0.218;FS=4.217;MLEAC=7;MLEAF=0.438;MQ=60;MQRankSum=0;QD=28.61;ReadPosRankSum=1.633;SOR=0.069 GT:AD:DP:GQ:PL 1/1:0,8:8:24:358,24,0 0/0:56,0:56:99:0,169,2476
16 57162 rs10266 C G 264.72 PASS AC=0;AF=0.125;AN=4;BaseQRankSum=0.673;DB;DP=88;ExcessHet=0.2996;FS=1.537;MLEAC=2;MLEAF=0.125;MQ=60;MQRankSum=0;QD=24.07;ReadPosRankSum=0.261;SOR=0.992 GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,125 0/0:28,0:28:84:0,84,1170
16 57275 rs1045001 T G 1183.53 PASS AC=2;AF=0.688;AN=4;BaseQRankSum=0;DB;DP=63;ExcessHet=0;FS=3.825;MLEAC=10;MLEAF=0.625;MQ=60;MQRankSum=0;QD=35.18;ReadPosRankSum=0.612;SOR=1.436 GT:AD:DP:GQ:PL 1/1:0,6:6:18:269,18,0 0/0:22,0:22:66:0,66,985
16 79655 rs710081 C T 2061.48 PASS AC=1;AF=0.125;AN=4;BaseQRankSum=0.845;DB;DP=290;ExcessHet=0.2996;FS=20.285;MLEAC=2;MLEAF=0.125;MQ=60;MQRankSum=0;QD=24.84;ReadPosRankSum=2.321;SOR=0.166 GT:AD:DP:GQ:PL 0/0:33,0:33:99:0,99,1477 0/1:17,30:47:99:1201,0,581
16 84442 rs1061435 C A 5914.05 PASS AC=3;AF=0.813;AN=4;BaseQRankSum=1.462;DB;DP=166;ExcessHet=0.9691;FS=1.977;MLEAC=13;MLEAF=0.813;MQ=60;MQRankSum=0;QD=29.74;ReadPosRankSum=-0.197;SOR=0.987 GT:AD:DP:GQ:PL 1/1:0,11:11:33:494,33,0 0/1:3,2:5:75:75,0,120
There was a bug that caused the number of matching genotypes to be reported incorrectly: it was always the same as the number of compared genotypes. This is fixed in the latest GitHub version; please try that.
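You can check the reported counts independently by tallying genotype agreement directly from the VCF. Below is a minimal sketch of a plain GT concordance count; it is not a reimplementation of gtcheck's discordance score. It makes these assumptions:
- The function names `gt_key` and `count_matches` are hypothetical.
- Records are whitespace-separated.
- The two samples sit in the 10th and 11th columns (0-based indices 9 and 10).
- GT, when present, is the first FORMAT sub-field, as the VCF specification requires.

The seven records are the ones from the excerpt above, with the INFO column replaced by "." for brevity.

```python
"""Sketch: count sites compared and matching genotypes for two VCF samples.

Plain GT concordance only; this is NOT bcftools gtcheck's discordance score.
Names (gt_key, count_matches, EXCERPT) are hypothetical, for illustration.
"""


def gt_key(sample_field):
    """Return a sample's genotype as a sorted allele tuple, or None if missing."""
    gt = sample_field.split(":", 1)[0]  # per the VCF spec, GT comes first
    alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if "." in alleles:
        return None  # missing call: the site is not compared
    return tuple(sorted(alleles))  # so 1/0 and 0/1 compare equal


def count_matches(vcf_lines, col_a=9, col_b=10):
    """Return (sites compared, matching genotypes) for two sample columns."""
    compared = matching = 0
    for line in vcf_lines:
        # Skip blank lines and header lines (with or without the leading '#').
        if not line.strip() or line.startswith(("#", "CHROM")):
            continue
        fields = line.split()  # VCF fields contain no whitespace
        a, b = gt_key(fields[col_a]), gt_key(fields[col_b])
        if a is None or b is None:
            continue
        compared += 1
        if a == b:
            matching += 1
    return compared, matching


# Records from the excerpt (INFO replaced by "."); columns 9 and 10 are
# 1281-1-21796-check-RNA-7 and 1281-1-21616-RNA-8 respectively.
EXCERPT = [
    "16 17058 rs181746417 A G 350.39 PASS . GT:AD:DP:GQ:PL 0/1:1,1:2:39:39,0,39 0/0:2,0:2:6:0,6,90",
    "16 47354 rs8466 A G 230.34 PASS . GT:AD:DP:GQ:PL 0/0:8,0:8:24:0,24,358 0/0:13,0:13:39:0,39,567",
    "16 53923 rs2562145 C G 1115.68 PASS . GT:AD:DP:GQ:PL 1/1:0,8:8:24:358,24,0 0/0:56,0:56:99:0,169,2476",
    "16 57162 rs10266 C G 264.72 PASS . GT:AD:DP:GQ:PL 0/0:3,0:3:9:0,9,125 0/0:28,0:28:84:0,84,1170",
    "16 57275 rs1045001 T G 1183.53 PASS . GT:AD:DP:GQ:PL 1/1:0,6:6:18:269,18,0 0/0:22,0:22:66:0,66,985",
    "16 79655 rs710081 C T 2061.48 PASS . GT:AD:DP:GQ:PL 0/0:33,0:33:99:0,99,1477 0/1:17,30:47:99:1201,0,581",
    "16 84442 rs1061435 C A 5914.05 PASS . GT:AD:DP:GQ:PL 1/1:0,11:11:33:494,33,0 0/1:3,2:5:75:75,0,120",
]

if __name__ == "__main__":
    print(count_matches(EXCERPT))  # (7, 2)
```

On this excerpt the sketch reports 7 sites compared but only 2 matching genotypes (rs8466 and rs10266). A "Number of matching genotypes" equal to "Number of sites compared" therefore cannot be right for these samples.

To run the sketch on the whole file, pass `gzip.open("21616_21796check_samples.vcf.gz", "rt")` instead of the list, adding `import gzip` at the top. A bgzipped VCF is valid multi-member gzip, so Python's `gzip` module can read it.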