MetaMinimac2 is an efficient tool to combine genotype data imputed against multiple reference panels.
Prerequisite: cget>=0.1, cmake>=3.2
git clone https://github.com/yukt/MetaMinimac2.git
cd MetaMinimac2
bash install.sh
The first step is to impute pre-phased target haplotypes against reference panels separately using Minimac4 with --meta
option on, which will generate a .empiricalDose.vcf.gz
file (which is required for MetaMinimac2) in addition to the .dose.vcf.gz
file. Please see http://genome.sph.umich.edu/wiki/Minimac4 for detailed documentation for Minimac4.
minimac4 --refHaps refPanelA.m3vcf \
--haps targetStudy.vcf \
--prefix PanelA.imputed \
--meta
minimac4 --refHaps refPanelB.m3vcf \
--haps targetStudy.vcf \
--prefix PanelB.imputed \
--meta
The second step is to integrate the imputed results using MetaMinimac2.
MetaMinimac2 -i PanelA.imputed:PanelB.imputed -o A_B.meta.testrun
-i, --input <prefix1:prefix2 ...> (Required) Colon-separated prefixes of input data to meta-impute
-o, --output <prefix> (Required) Output prefix [MetaMinimac.Output.Prefix]
-f, --format <string> Comma-separated output FORMAT tags [GT,DS,HDS]
-p, --skipPhasingCheck OFF by default. If ON, program will skip phasing consistency check before analysis.
-s, --skipInfo OFF by default. If ON, the INFO fields are removed from the output file
-n, --nobgzip OFF by default. If ON, output files will NOT be bgzipped
-w, --weight OFF by default. If ON, weights will be saved in [MetaMinimac.Output.Prefix].metaWeights(.gz)
-l, --log OFF by default. If ON, log will be written to $prefix.logfile
-h, --help OFF by default. If ON, detailed documentation on options and usage will be displayed
The meta-imputed result will be saved in [MetaMinimac.Output.Prefix].metaDose.vcf.gz
.
When --weight
is ON, the weights for meta-imputation will be saved in [MetaMinimac.Output.Prefix].metaWeights.gz
. The weight file is also in VCF format, which is good for individual filtering by vcftools or bcftools. In the format field, WT1
stands for the weight on reference panel 1 (which is related to input files with prefix1), WT2
stands for the weight on reference panel 2, etc.
Michigan Imputation Server and TOPMed Imputation Server offer the option to generate .empiricalDose.vcf.gz
file for the convenience of downstream meta-imputation using MetaMinimac2.
Imputation steps:
Genotype Imputation (Meta-Imputation Option)
in the Run
tab;Generate Meta-imputation file
before Submit Job
;.dose.vcf.gz
and .empiricalDose.vcf.gz
files.The reference panels used for meta-imputation must be on the same build. For example, for meta-imputation using 1000G and TOPMed, user should impute using 1000G GRCh38 instead of the original GRCh37 build.
If the phasing is different between input imputed files, the resulting meta dosage (which is supposed to be a weighted average of imputed dosages on the same haplotype) will be messed up. Therefore, we highly recommend that users always keep --skipPhasingCheck
OFF to avoid any risk.
The best practice is to do the phasing first and use the same pre-phased vcf file for imputation against different reference panels.
An alternative is to use .empiricalDose.vcf.gz
file from imputation using one reference panel as input vcf file for imputation using other reference panels.
For example, when the phasing step is performed by Michigan Imputation Server where the phased vcf file is not provided as output, user can use the output .empiricalDose.vcf.gz
file for imputation using other reference panels.
Please note that .empiricalDose.vcf.gz
file contains overlapping sites between the genotype array and the reference panel in use, and does not include the sites existing in the genotype array only, so the results generated by imputation using empiricalDose.vcf could be different from that using a pre-phased vcf including all genotyped sites.