rwdavies / QUILT

Impute specific variants using QUILT #25

Open biozzq opened 1 year ago

biozzq commented 1 year ago

Hi,

Since my focus is on specific regions and variants on the genome, I'm wondering if I can remove non-targeted positions from the reference panel before imputation to reduce the running time. However, I'm concerned about whether this would significantly affect the accuracy of the imputation.

Additionally, I'm trying to determine the best approach for using QUILT with a window size of 2Mb. Would running QUILT by sample produce higher imputation accuracy, or should I run QUILT with all samples together? I'd appreciate any insights on which strategy is more effective.

Best wishes, Zheng zhuqing

rwdavies commented 1 year ago

Hi,

The more sites you put into the imputation, the better the imputation will be, although the relationship between the number of sites and imputation quality is not linear. Informally, more sites facilitate more accurate imputation because there is more information available to resolve which of the reference haplotypes an individual carries. The simplest thing you can do is remove variants that are in high LD with each other, using e.g. PLINK: clump, then remove every site that has an r2 of 1 with another site that is retained. This will slightly affect accuracy (the retained sites will be less robust to sequencing error), but it would be the cheapest way to make imputation faster without meaningfully sacrificing accuracy.
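To make the pruning idea concrete, here is a minimal Python sketch of "drop every site that has r2 of 1 with a retained site". It is not PLINK's clumping procedure and not part of QUILT; the function names, the `window` and `protected` parameters, and the toy haplotypes are all assumptions made for illustration. The `protected` set is one hypothetical way to make sure the specific target variants are never removed.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two 0/1 allele vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic site: r2 is undefined, treat as no LD
    return cov * cov / (var_a * var_b)


def prune_perfect_ld(sites, window=1000, protected=frozenset(), tol=1e-9):
    """Greedy pruning sketch (hypothetical helper, not a QUILT or PLINK API).

    sites: list of (position, alleles) tuples sorted by position, where
    alleles is a list of 0/1 calls across the reference haplotypes.
    A site is dropped if it has r2 of 1 (within tol) with an already
    retained site at most `window` bp upstream, unless it is protected.
    """
    kept = []
    for pos, alleles in sites:
        redundant = any(
            pos - kept_pos <= window
            and r_squared(alleles, kept_alleles) >= 1 - tol
            for kept_pos, kept_alleles in kept
        )
        if pos in protected or not redundant:
            kept.append((pos, alleles))
    return kept


# Toy reference panel: 4 sites typed on 6 haplotypes (made-up data).
toy_sites = [
    (100, [0, 0, 1, 1, 0, 1]),
    (150, [0, 0, 1, 1, 0, 1]),  # identical to site 100
    (220, [1, 1, 0, 0, 1, 0]),  # complement of site 100, so r2 is also 1
    (300, [0, 1, 1, 0, 0, 1]),  # only partially correlated with site 100
]

pruned_positions = [pos for pos, _ in prune_perfect_ld(toy_sites)]
protected_positions = [
    pos for pos, _ in prune_perfect_ld(toy_sites, protected={150})
]
```

On the toy data, sites 150 and 220 carry no information beyond site 100 and are dropped, unless one of them is listed in `protected`. On a real panel this quadratic scan would be slow, so a dedicated tool such as PLINK remains the practical choice.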

QUILT imputes all samples independently. So you can structure your workflow however you'd like given the constraints of your computational environment.
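Because each sample is imputed independently, how the samples are grouped into jobs is a scheduling decision rather than an accuracy decision. A small sketch of splitting a sample list into batches follows; the helper name, the file names, and the batch size are hypothetical and would be chosen to fit your cluster's memory and time limits.

```python
def make_batches(bam_paths, batch_size):
    """Split a list of per-sample BAM paths into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        bam_paths[start:start + batch_size]
        for start in range(0, len(bam_paths), batch_size)
    ]


# Hypothetical file names, for illustration only.
all_bams = [f"sample_{i:03d}.bam" for i in range(1, 8)]
batches = make_batches(all_bams, batch_size=3)

# Each batch could then be written to its own list file and submitted
# as a separate job for the same 2Mb window.
for index, batch in enumerate(batches, start=1):
    print(f"batch {index}: {', '.join(batch)}")
```

With a batch size of 1 this reduces to running by sample, and with a batch size equal to the number of samples it reduces to running everything together.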

Best, Robbie