szpiech / garlic

Genomic Autozygosity Regions Likelihood Inference and Classification
GNU General Public License v3.0

Confusion with tgls file #8

Open erickfigue opened 1 month ago

erickfigue commented 1 month ago

Hi, I'm Erick,

I've been using Garlic with only tped and tfam files, and I'm now trying to include a tgls file, but the format of the likelihood values is not clear to me. I'm using Phred-scaled values (PL), which I extract from the VCF with bcftools. In the example tgls file there is only one value per individual, which I assume is the GL for the called genotype. In my case I get three PL values per individual from my VCF file (one for each possible genotype), and Garlic accepts them with no error. My question is whether the program picks the correct value from the three scores in the input, or whether I need to extract only the PL score for the called genotype. Thanks in advance.

szpiech commented 1 month ago

Hi Erick,

Well, I've been trying to remember what I was thinking when I implemented this (quite a few years ago). I recall working from a VCF spec document, but you're right, of course: the PL entry in a VCF file has three values. The best advice I can give at the moment is to use GQ instead of PL. If you don't have GQ values called in your data, the value to use should be the second-highest of the three PL values (that is, the middle one). In practice, our lab has only ever used GQ.
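For anyone without GQ in their data, the fallback described above can be sketched in a few lines of Python. This is only an illustration, not code from Garlic: the function name `gq_fallback_from_pl` is hypothetical, it assumes a biallelic site with exactly three comma-separated PL values, and it returns `None` when the field is missing. How the result should be written into a tgls file is not covered here.

```python
from typing import Optional


def gq_fallback_from_pl(pl_field: str) -> Optional[int]:
    """Return the second-highest of three PL values, or None if unavailable.

    Hypothetical helper: assumes a biallelic site, so the PL field holds
    exactly three comma-separated integers such as "0,30,255".
    """
    parts = pl_field.strip().split(",")
    # Missing data in a VCF is written as "."; treat any gap as unusable.
    if len(parts) != 3 or any(p in (".", "") for p in parts):
        return None
    values = sorted(int(p) for p in parts)
    # With three values, the second-highest is the middle one after sorting.
    return values[1]


if __name__ == "__main__":
    for pl in ["0,30,255", "45,0,60", ".,.,."]:
        print(pl, "->", gq_fallback_from_pl(pl))
```

Since the lowest PL is normally 0 for the called genotype, the middle value is the Phred-scaled gap to the next most likely genotype, which is why it can stand in for GQ.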

-Zachary

erickfigue commented 1 month ago

Understood. Thank you.