rwdavies / QUILT

QUILT: Low coverage whole genome sequence imputation with large reference panels
https://www.nature.com/articles/s41588-021-00877-0
GNU General Public License v3.0

Does adding --posfile and --phasefile affect imputation speed, accuracy, etc.? #44

Open bbdragon1 opened 3 days ago

bbdragon1 commented 3 days ago

Hi, my reference panel has already been phased, so I'm wondering whether providing --posfile and --phasefile during the imputation step has a significant impact on accuracy and speed. I noticed that the tutorial doesn't use these files, but they are mentioned in QUILT_usage.md, which has caused some confusion.

Additionally, I noticed accuracy information in the generated log file, like the following: Final imputation accuracy for sample NA12878ONT r2:0.909, PSE:0.2%, disc:6.3% (all SNPs). How is this accuracy calculated, given that I don't seem to have provided a gold standard file? Apologies if these questions are too basic; I'm new to this and appreciate your help.

Zilong-Li commented 3 days ago

Hey @bbdragon1,

Enabling --phasefile and --posfile will reduce the speed a bit, but not significantly. These options let QUILT calculate the imputation accuracy by taking the phased genotypes in the phasefile as the 'truth' set. For the statistics in the log, r2 is the squared Pearson correlation, PSE is the phasing switch error (# switches / # total SNPs), and disc is the genotype discordance rate.

Best, Zilong.
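To make the three log metrics concrete, here is a minimal Python sketch. It is not QUILT's code. The function names (`r2`, `switch_error`, `discordance`) and the toy data are hypothetical: the "truth" haplotypes stand in for the phased genotypes from --phasefile, and the dosages and imputed haplotypes stand in for QUILT's output. Following the description above, PSE divides the number of switches by the total number of SNPs, and only sites that are heterozygous in both truth and imputation can contribute a switch. QUILT's actual implementation may select SNPs or choose denominators differently.

```python
from typing import Sequence, Tuple

Hap = Tuple[int, int]  # (allele on haplotype 1, allele on haplotype 2), 0 = ref, 1 = alt


def r2(dosage: Sequence[float], truth: Sequence[int]) -> float:
    """Squared Pearson correlation between imputed dosages (0..2) and true genotypes."""
    n = len(dosage)
    mx, my = sum(dosage) / n, sum(truth) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, truth))
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined without variance
    return sxy * sxy / (sxx * syy)


def switch_error(imputed: Sequence[Hap], truth: Sequence[Hap]) -> float:
    """Phase switches / total SNPs (assumed denominator, per the description above)."""
    switches, prev = 0, None
    for (a1, a2), (t1, t2) in zip(imputed, truth):
        if a1 == a2 or t1 == t2:
            continue  # homozygous sites carry no phase information
        if (a1, a2) == (t1, t2):
            orient = 0  # same orientation as truth
        elif (a1, a2) == (t2, t1):
            orient = 1  # flipped orientation
        else:
            continue  # alleles do not match (e.g. multi-allelic); skip
        if prev is not None and orient != prev:
            switches += 1  # orientation changed since the last usable het site
        prev = orient
    return switches / len(truth)


def discordance(called: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of SNPs whose hard-called genotype differs from the truth genotype."""
    return sum(c != t for c, t in zip(called, truth)) / len(truth)


# Hypothetical toy data for six SNPs.
truth_haps = [(0, 1), (1, 0), (0, 0), (1, 1), (0, 1), (1, 0)]
imputed_haps = [(0, 1), (1, 0), (0, 1), (1, 1), (1, 0), (0, 1)]  # switch at SNP 5, wrong call at SNP 3
dosages = [0.9, 1.1, 0.6, 1.8, 1.0, 0.9]
truth_gt = [a + b for a, b in truth_haps]
called_gt = [a + b for a, b in imputed_haps]

print(f"r2:{r2(dosages, truth_gt):.3f}, "
      f"PSE:{100 * switch_error(imputed_haps, truth_haps):.1f}%, "
      f"disc:{100 * discordance(called_gt, truth_gt):.1f}%")
# r2:0.883, PSE:16.7%, disc:16.7%
```

The three metrics capture different failure modes: r2 rewards well-calibrated dosages, disc only looks at hard calls, and PSE measures whether correctly called heterozygous sites are placed on the right haplotypes.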