samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation at http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Old QCALL-0.1.19 format details #759

Closed: danimfernandes closed this issue 6 years ago

danimfernandes commented 6 years ago

Hi, I am trying to write some code to convert VCF to QCALL format, because I need QCALL files as input for a specific piece of software. As a reference I am using a few old QCALL and VCF files from low-coverage samples. However, I found something I don't understand: maybe it is a bug, or maybe I am just missing something.

So, consider the following line from a QCALL file, with the ten genotype likelihoods labelled in the order given by the manual:

                                 AA  AC  AG  AT  CC  CT  CG  GG  GT  TT
    chrY  22749198  G  2  37  0  74  74   6  74  74   6  74   0   6  74

My understanding is that every genotype NOT containing a G should have a likelihood of 74, every heterozygous genotype containing a G should have 6, and the homozygous GG should have 0. However, if I follow the manual's description of the file and its order of genotype likelihoods (ftp://ftp.sanger.ac.uk/pub/1000genomes/lsq/QCALL/QCALLManual.pdf), this line gives 6 instead of 74 for L_CT. The line should instead be:

                                 AA  AC  AG  AT  CC  CT  CG  GG  GT  TT
    chrY  22749198  G  2  37  0  74  74   6  74  74  74   6   0   6  74

The same happens when the reference is T, where I get a 6 for L_CG:

                                 AA  AC  AG  AT  CC  CT  CG  GG  GT  TT
    chrY  22745051  T  2  37  0  74  74  74   6  74  74   6  74   6   0
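The reasoning above can be turned into a small check. This is a minimal Python sketch, assuming the layout shown in these lines (chromosome, position, reference base, three further columns, then ten likelihoods). It labels the likelihoods under the manual's ordering and under the alternative ordering with CG before CT, then tests whether genotypes carrying the same number of reference alleles share a single value. That heuristic only makes sense at sites where every read appears to match the reference, as in these examples; the function and constant names are hypothetical.

```python
# Candidate orderings of the ten diploid genotypes.
MANUAL_ORDER = ["AA", "AC", "AG", "AT", "CC", "CT", "CG", "GG", "GT", "TT"]
ALT_ORDER = ["AA", "AC", "AG", "AT", "CC", "CG", "CT", "GG", "GT", "TT"]


def parse_qcall_line(line):
    """Return (ref_base, likelihoods) from a '|'- or space-separated line.

    Assumed layout (hypothetical parser): chrom, pos, ref, three further
    columns, then ten genotype likelihoods.
    """
    fields = line.replace("|", " ").split()
    return fields[2], [int(x) for x in fields[6:16]]


def is_consistent(line, order):
    """Heuristic: at a site where every read matches the reference, the
    likelihood should depend only on how many reference alleles the
    genotype carries. Returns True if that holds under `order`."""
    ref, likelihoods = parse_qcall_line(line)
    values_by_ref_count = {}
    for genotype, value in zip(order, likelihoods):
        values_by_ref_count.setdefault(genotype.count(ref), set()).add(value)
    return all(len(values) == 1 for values in values_by_ref_count.values())


g_line = "chrY | 22749198 | G | 2 | 37 | 0 | 74 | 74 | 6 | 74 | 74 | 6 | 74 | 0 | 6 | 74"
t_line = "chrY | 22745051 | T | 2 | 37 | 0 | 74 | 74 | 74 | 6 | 74 | 74 | 6 | 74 | 6 | 0"

for name, order in [("manual", MANUAL_ORDER), ("CG before CT", ALT_ORDER)]:
    print(name, is_consistent(g_line, order), is_consistent(t_line, order))
# manual False False
# CG before CT True True
```

Both example lines only fit the expected pattern once CG is read before CT.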

This does not happen for A or C:

                                 AA  AC  AG  AT  CC  CT  CG  GG  GT  TT
    chrY  21900849  A  1  37  0   0   3   3   3  27  27  27  27  27  27
    chrY  21767959  C  4  37  0 135  12 135 135   0  12  12 135 135 135
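Why A and C references hide the problem follows from counting reference alleles in the two disputed slots: with reference A, neither CG nor CT contains an A; with reference C, both contain exactly one C. Either way the two slots should hold equal values, so swapping them is invisible. With G or T, exactly one of the two genotypes carries the reference base. A short sketch of that argument, with hypothetical helper names and the same assumed column layout as above:

```python
def swap_is_visible(ref):
    """True if CG and CT carry a different number of copies of `ref`,
    so that swapping their slots would change the expected pattern."""
    return "CG".count(ref) != "CT".count(ref)


def slots_5_and_6(line):
    """Likelihoods in the 6th and 7th genotype slots (CT/CG per the
    manual, CG/CT in the alternative ordering). Assumes the ten
    likelihoods start at the 7th column."""
    fields = line.replace("|", " ").split()
    likelihoods = [int(x) for x in fields[6:16]]
    return likelihoods[5], likelihoods[6]


for ref in "ACGT":
    print(ref, swap_is_visible(ref))  # A False, C False, G True, T True

a_line = "chrY | 21900849 | A | 1 | 37 | 0 | 0 | 3 | 3 | 3 | 27 | 27 | 27 | 27 | 27 | 27"
c_line = "chrY | 21767959 | C | 4 | 37 | 0 | 135 | 12 | 135 | 135 | 0 | 12 | 12 | 135 | 135 | 135"
print(slots_5_and_6(a_line))  # (27, 27)
print(slots_5_and_6(c_line))  # (12, 12)
```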

Is this a bug, or am I misunderstanding something? Could the manual be wrong? That seems possible given the ACGT order of the combinations: in the manual's list, CT comes before CG. Thanks!
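To make the ordering point concrete: enumerating the unordered base pairs over A, C, G, T in lexicographic order (here with Python's `itertools.combinations_with_replacement`) places CG before CT. The sketch below builds that list and a closed-form index for any genotype; it only illustrates the lexicographic ordering and does not by itself show how the QCALL files were generated. `LEX_ORDER` and `genotype_index` are hypothetical names.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"

# Unordered diploid genotypes (b1, b2) with b1 <= b2 in ACGT order.
LEX_ORDER = ["".join(pair) for pair in combinations_with_replacement(BASES, 2)]


def genotype_index(b1, b2):
    """0-based position of the unordered genotype {b1, b2} in LEX_ORDER."""
    i, j = sorted((BASES.index(b1), BASES.index(b2)))
    # Rows for first base 0..i-1 contribute (4 - r) genotypes each,
    # i.e. 4*i - i*(i-1)/2 entries before row i; then offset within row i.
    return 4 * i - i * (i - 1) // 2 + (j - i)


print(LEX_ORDER)
print(genotype_index("C", "G"), genotype_index("T", "C"))  # 5 6
```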

pd3 commented 6 years ago

Hi, I stumbled upon this a long time ago: the documented ordering differed from what the program actually does. As I remember, the manual was wrong and the program uses the common-sense ordering, so I'd expect: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT
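Since the original goal is VCF-to-QCALL conversion, here is a minimal sketch of mapping a VCF `PL` vector onto the ten slots in the order pd3 describes. It assumes diploid calls, single-base REF/ALT alleles (drop a `.` ALT before calling it), and the VCF specification's PL ordering, where genotype j/k with j <= k sits at index k(k+1)/2 + j. Genotypes involving bases absent from REF/ALT have no PL in the VCF, so they are filled with a caller-chosen `missing_value`. That placeholder, the function name, and the assumption that PL values are on a scale comparable to the old QCALL likelihoods are all hypothetical and worth checking against the existing files.

```python
from itertools import combinations_with_replacement

# Genotype order as described in pd3's comment (CG before CT).
QCALL_ORDER = ["".join(p) for p in combinations_with_replacement("ACGT", 2)]


def vcf_pl_to_qcall(ref, alts, pl, missing_value):
    """Map a diploid VCF PL vector onto the ten QCALL genotype slots.

    ref, alts: single-base alleles (SNPs only in this sketch).
    pl: PL values in VCF order (genotype j/k, j <= k, at k*(k+1)//2 + j).
    missing_value: hypothetical placeholder for genotypes whose bases
        are not among the VCF alleles.
    """
    alleles = [ref] + list(alts)
    if len(pl) != len(alleles) * (len(alleles) + 1) // 2:
        raise ValueError("PL length does not match the number of alleles")
    # PL keyed by the unordered genotype, spelled in ACGT order (e.g. "CG").
    pl_by_genotype = {}
    for k in range(len(alleles)):
        for j in range(k + 1):
            bases = sorted(alleles[j] + alleles[k], key="ACGT".index)
            pl_by_genotype["".join(bases)] = pl[k * (k + 1) // 2 + j]
    return [pl_by_genotype.get(g, missing_value) for g in QCALL_ORDER]


print(vcf_pl_to_qcall("G", ["T"], [0, 6, 74], 99))
# [99, 99, 99, 99, 99, 99, 99, 0, 6, 74]
```

Keying by the sorted genotype string keeps the mapping independent of whether the reference happens to sort before or after the ALT allele.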

danimfernandes commented 6 years ago

Thanks very much, pd3, that answers my question.