srubinacci / imputation-ukb-ref-panel

Genotype imputation pipelines for the UK Biobank Research Analysis Platform
https://srubinacci.gitbook.io/uk-biobank-imputation-pipelines/
MIT License

How to interpret quality score outputs from low-pass WGS imputation on UKB #2

Open mpoe827 opened 4 months ago

mpoe827 commented 4 months ago

Thank you so much for this incredible imputation tool! After following the pipeline steps for low-pass WGS on UK Biobank data, I am wondering how to interpret the quality score returned in the BCF file. The INFO scores all appear to be 0.99 or 1; is this an expected result, and how are these INFO scores calculated?

srubinacci commented 1 month ago

Hi, INFO scores are meaningful if you have a relatively large number of imputed individuals (indicatively, at least in the hundreds). If imputation has been run in batches, you can combine the VCF files and recompute the INFO score on the whole cohort (e.g. using bcftools +impute-info). I cannot go into the details of the INFO score here, but there is a vast literature available. Regarding the calculation, you can find a derivation in Marchini and Howie 2010, Supplementary S3.

Simone
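
For readers who want a feel for the calculation, here is a minimal Python sketch of an IMPUTE-style INFO measure computed from per-sample genotype probabilities at a single biallelic variant. It is an illustration under stated assumptions, not the pipeline's code: the function name `impute_info` and the input layout (one `(P(0/0), P(0/1), P(1/1))` tuple per individual) are hypothetical, returning 1.0 for a monomorphic site is a convention assumed here, and the exact formula used by a given tool may differ in details. The statistic compares the posterior variance of each dosage with the variance expected from the estimated allele frequency, so it depends on the whole set of individuals passed in, which is why it is usually recomputed on the merged cohort rather than per batch.

```python
from typing import Sequence, Tuple

GenotypeProbs = Tuple[float, float, float]  # (P(0/0), P(0/1), P(1/1))


def impute_info(gps: Sequence[GenotypeProbs]) -> float:
    """IMPUTE-style INFO score for one biallelic variant (illustrative sketch)."""
    n = len(gps)
    if n == 0:
        raise ValueError("need at least one individual")

    # e_i: expected alt-allele dosage; f_i: expected squared dosage.
    e = [p1 + 2.0 * p2 for _, p1, p2 in gps]
    f = [p1 + 4.0 * p2 for _, p1, p2 in gps]

    # Estimated alt-allele frequency across the individuals supplied.
    theta = sum(e) / (2.0 * n)
    if theta <= 0.0 or theta >= 1.0:
        # Monomorphic site: the ratio is undefined; 1.0 is an assumed convention.
        return 1.0

    # Total posterior variance of the dosages, relative to the binomial
    # variance expected under Hardy-Weinberg at frequency theta.
    posterior_var = sum(fi - ei * ei for ei, fi in zip(e, f))
    return 1.0 - posterior_var / (2.0 * n * theta * (1.0 - theta))


# Confidently called genotypes carry no posterior uncertainty.
certain = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
# Probabilities equal to the Hardy-Weinberg prior at frequency 0.5 carry no information.
uninformative = [(0.25, 0.5, 0.25)] * 10

info_certain = impute_info(certain)
info_uninformative = impute_info(uninformative)
```

In this toy input, `info_certain` is 1.0 and `info_uninformative` is 0.0, the two ends of the usual range. With only a handful of individuals the frequency estimate `theta` is itself noisy, which is one reason scores from small batches are hard to interpret.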