tgen / snpSniffer

Tool for checking genotype concordance between multiple assays
MIT License

Homo Ref Not added to INI #13

Closed PedalheadPHX closed 4 years ago

PedalheadPHX commented 4 years ago

The homozygous reference calls in the snpSniffer.vcf are not being added to the database. This is likely a format difference between samtools and bcftools that the Java app is not handling correctly. @sfacista @bryce-turner

PedalheadPHX commented 4 years ago

Sample  MMRF_1371_2_BM_CD138pos_T1_KHWGS.bwa.bam.snpSniffer  MMRF_2331_2_BM_CD138pos_T5_KHWGS.bwa.bam.snpSniffer
X:119453040  TT  TT
X:47585586   CC  CC
X:79171491   NN  CC
1:109923844  TC  CC
1:110947279  NN  NN
1:111352865  NN  CT
1:113973095  GA  AA
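
To illustrate why the missing calls matter, here is a minimal sketch of a concordance count over the seven rows above. It is not the snpSniffer implementation: the class and method names are hypothetical, and skipping any site with an NN call and ignoring allele order are both assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch; not taken from the snpSniffer source.
final class ConcordanceSketch {
    /** Sorts the two allele letters so that "TC" and "CT" compare equal. */
    static String normalize(String genotype) {
        char[] alleles = genotype.toCharArray();
        Arrays.sort(alleles);
        return new String(alleles);
    }

    /** Returns {matching sites, comparable sites}; sites with an NN call are skipped (assumption). */
    static int[] compare(String[] sampleA, String[] sampleB) {
        int matches = 0;
        int comparable = 0;
        for (int i = 0; i < sampleA.length; i++) {
            if (sampleA[i].equals("NN") || sampleB[i].equals("NN")) {
                continue; // no call in at least one sample
            }
            comparable++;
            if (normalize(sampleA[i]).equals(normalize(sampleB[i]))) {
                matches++;
            }
        }
        return new int[] {matches, comparable};
    }

    public static void main(String[] args) {
        // Genotypes copied from the seven rows above, in the same order.
        String[] a = {"TT", "CC", "NN", "TC", "NN", "NN", "GA"};
        String[] b = {"TT", "CC", "CC", "CC", "NN", "CT", "AA"};
        int[] result = compare(a, b);
        System.out.println(result[0] + " of " + result[1] + " comparable sites match");
    }
}
```

Under these assumptions only four of the seven sites can be compared, so every homozygous reference call that is dropped as NN shrinks the evidence available for the comparison.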

PedalheadPHX commented 4 years ago

1 109923844 . T C 0/1:255,0,255
1 110947279 . A . 0/0:0
1 111352865 . C . 0/0:0

PedalheadPHX commented 4 years ago

Just tested with the OLD jar file and still see the same issue, so some difference in the VCF format is causing the problem when records are added to the database.

PedalheadPHX commented 4 years ago

So there is a noticeable difference in how AA/homoRef genotypes are encoded:

Old_Genotype.vcf
1 111489901 . A . 265 . DP=84;AF1=0;AC1=0;DP4=48,30,0,0;MQ=60;FQ=-262 PL 0

New_Genotype.vcf
1 110947279 . A . 244.996 . DP=85;SGB=-0.379885;RPB=1;MQB=1;MQSB=0.989423;BQB=1;MQ0F=0;AF1=0;AC1=0;DP4=50,27,0,1;MQ=60;FQ=-241.988;PV4=0.358974,0.017788,1,0.178514 GT:PL 0/0:0
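
One way to make a parser robust to this kind of difference is to look the genotype up by its FORMAT key rather than by position or by a fixed string. The sketch below is only an illustration of that idea; `GenotypeField` and `extractGt` are hypothetical names and are not part of the snpSniffer code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not taken from the snpSniffer source.
final class GenotypeField {
    /** Returns the GT value from a sample column, or null when FORMAT has no GT key. */
    static String extractGt(String format, String sample) {
        List<String> keys = Arrays.asList(format.split(":"));
        int index = keys.indexOf("GT");
        if (index < 0) {
            return null; // e.g. the old records above, whose FORMAT is just "PL"
        }
        String[] values = sample.split(":");
        return index < values.length ? values[index] : null;
    }

    public static void main(String[] args) {
        System.out.println(extractGt("GT:PL", "0/0:0")); // FORMAT and sample from New_Genotype.vcf
        System.out.println(extractGt("PL", "0"));        // FORMAT and sample from Old_Genotype.vcf
    }
}
```

With the two records shown above, the new record yields `0/0`, while the old record has no GT key at all, so the lone `0` there is the PL value.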

PedalheadPHX commented 4 years ago

@sfacista I'm guessing the code is expecting just a single "0" in the GT column, whereas the new file adheres to today's specification and writes 0/0. I'm not sure "0" alone was ever acceptable.

sfacista commented 4 years ago

AddFile.java was changed on line 106 so that the homozygous reference is expected as "0/0". Other major revisions were documented. Pull request created: https://github.com/tgen/snpSniffer/pull/14/files

Thank you for doing all the work of describing the issue. I tested all the methods, but nothing on live data. Please let me know if anything fails.
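
For readers who have not opened the pull request, the kind of check involved might look like the sketch below. This is not the code from AddFile.java: the class and method names are hypothetical, alleles are assumed to be single bases, only unphased genotypes are handled, and accepting the legacy "0" alongside "0/0" is an assumption based on the guess earlier in this thread.

```java
// Hypothetical sketch; the real change lives in AddFile.java and is not reproduced here.
final class GenotypeCaller {
    /**
     * Builds a two-letter genotype from single-base REF/ALT alleles and a sample column.
     * Returns "NN" for anything it does not recognise.
     */
    static String call(String ref, String alt, String sample) {
        // The VCF specification requires GT to be the first sample field when it is present.
        String first = sample.split(":")[0];
        switch (first) {
            case "0/0": // homozygous reference as written in the new files
            case "0":   // assumption: legacy single-value encoding from the old files
                return ref + ref;
            case "0/1":
                return ref + alt;
            case "1/1":
                return alt + alt;
            default:
                return "NN";
        }
    }

    public static void main(String[] args) {
        System.out.println(call("T", "C", "0/1:255,0,255")); // 1:109923844
        System.out.println(call("A", ".", "0/0:0"));         // 1:110947279
        System.out.println(call("A", ".", "0"));             // old-style record
    }
}
```

The three calls in `main` return TC, AA and AA, so the position 1:110947279 that appears as NN in the comparison earlier in the thread would be stored as AA under this logic.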

PedalheadPHX commented 4 years ago

@sfacista Yes, fixing the addition of 0/0 genotypes to the database will resolve this issue. I did notice that 7-10 of the b38 human positions are never called, so we might want to update the target positions, but the main problem is resolved.