statgen / Minimac4


Reference Panel download #66

Open Shrishtee-kandoi opened 8 months ago

Shrishtee-kandoi commented 8 months ago

Hi Minimac4 Team,

I am currently facing challenges while attempting to run "minimac4" to impute my genotype files on our High-Performance Cluster. I have encountered a couple of issues that I believe may require your expertise to resolve.

Reference Panel Download: I attempted to download the reference panel from the link provided on the Minimac4 wiki page. However, although I can connect to the FTP server, the download does not start. I would appreciate guidance on the correct procedure or any troubleshooting steps.

Target Study VCF File: I want to confirm whether the "targetStudy.vcf" file mentioned in the documentation refers to the VCF file generated from Plink files using the following commands:

plink --bfile YOURFILE --keep-allele-order --freq --out YOURFILE.output --allow-no-sex
plink --bfile YOURFILE --recode vcf --out YOURFILE.output_file --keep-allele-order
vcf-sort YOURFILE.output_file.vcf | bgzip -c > pre_impute_YOURFILE.vcf.gz

Is this the correct process for creating the target VCF file for Minimac4 imputation?

Imputation Code: For imputation, I am using the following code:

minimac4 --refHaps refPanel.m3vcf \
         --haps targetStudy.vcf \
         --prefix testRun \
         --cpus 5

Is "refPanel.m3vcf" downloadable from the link above? And can I substitute "targetStudy.vcf" with "pre_impute_YOURFILE.vcf.gz" in this command?

Your assistance in resolving these issues would be immensely valuable, and I appreciate your time and support in advance.

Thank you!

jonathonl commented 8 months ago

That wiki is legacy documentation. Use the readme in this repo instead.

You can use the https protocol instead of ftp to download reference panel: https://share.sph.umich.edu/minimac4/panels/.

You must index your target VCF as well (tabix -p vcf pre_impute_YOURFILE.vcf.gz).
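As a minimal sketch of the indexing step, the helper below checks whether a `.tbi` file already sits next to the target VCF and only calls `tabix -p vcf` when it is missing. The function names `needs_index` and `index_if_needed` are hypothetical helpers written for this example; they are not part of Minimac4 or htslib.

```shell
#!/bin/sh
# Sketch only: index a bgzipped target VCF if no tabix index exists yet.
# needs_index and index_if_needed are hypothetical helper names.

needs_index() {
    vcf=$1
    # tabix -p vcf writes its index to <file>.tbi
    [ ! -e "$vcf.tbi" ]
}

index_if_needed() {
    if needs_index "$1"; then
        tabix -p vcf "$1"
    fi
}
```

For the file discussed in this thread, the call would be `index_if_needed pre_impute_YOURFILE.vcf.gz`.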

The minimac4 command you reference will work with the correct reference panel, but is deprecated. See the readme for commands to use with the latest version. You will be using an *.msav reference panel instead of an *.m3vcf.gz.
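For comparison with the deprecated `--refHaps`/`--haps` form, here is a rough sketch of the newer positional invocation. It uses only the argument order and the `-o` option that appear in the commands quoted later in this thread; the wrapper name `impute` and the `DRY_RUN` switch are assumptions added for illustration, so check the readme for the authoritative syntax.

```shell
#!/bin/sh
# Sketch only: print (DRY_RUN set) or run the newer positional command.
# impute and DRY_RUN are hypothetical conveniences, not Minimac4 features.

impute() {
    ref=$1      # *.msav reference panel
    target=$2   # bgzipped, tabix-indexed target VCF
    out=$3      # output file, e.g. imputed.vcf.gz
    if [ -n "${DRY_RUN:-}" ]; then
        echo "minimac4 $ref $target -o $out"
    else
        minimac4 "$ref" "$target" -o "$out"
    fi
}
```

Setting `DRY_RUN=1` before calling `impute` prints the command instead of running it, which is a cheap way to check file names before submitting a cluster job.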

Shrishtee-kandoi commented 8 months ago

Thanks for getting back! I was able to download the reference panels.

I am now following the command lines exactly as outlined in the readme file:

minimac4 1000g_phase3_v5.chr14.with_parameter_estimates.msav pre_impute_YOURFILE.vcf.gz > imputed_YOURFILE.sav

However, it seems to be disregarding the parameters, and I'm receiving the following warnings:

WARNING - 
Problems encountered parsing command line:

Command line parameter 1000g_phase3_v5.chr14.with_parameter_estimates.msav (#1) ignored
Command line parameter pre_impute_upenn_ucla_mssm_impute_chr14.output_file.vcf.gz (#2) ignored

The same issue persists when using the command: minimac4 1000g_phase3_v5.chr14.with_parameter_estimates.msav pre_impute_upenn_ucla_mssm_impute_chr14.output_file.vcf.gz -o imputed.vcf.gz

Am I required to include additional flags, or is there something else I might be overlooking?

jonathonl commented 8 months ago

I think you are using an old version of minimac4. See the latest at https://github.com/statgen/Minimac4/releases.

Shrishtee-kandoi commented 7 months ago

Thanks! I was waiting for our cluster to update the module. It works now!! I also have a last question: Does minimac4 provide QC results and information on excluded SNPs similar to that of the Imputation server?

jonathonl commented 7 months ago

No, the Imputation Server uses its own routines for the QC preprocessing step (which includes variant and chunk exclusion). The only metrics that Minimac4 will provide are in the INFO fields of the imputed results (R2, ER2, AVG_CS). You can get a sites-only version of the results with the --sites option, which produces a VCF with these INFO fields but no genotype data. This file is also generated automatically when using the --prefix option.
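Because the sites-only file is an ordinary VCF, the R2 values can be pulled out with standard text tools. The sketch below lists sites whose INFO/R2 falls below a cutoff; `low_r2_sites` is a hypothetical helper, and the cutoff passed to it is the caller's choice rather than a value recommended by Minimac4.

```shell
#!/bin/sh
# Sketch only: print CHROM, POS and R2 for sites with R2 below a cutoff.
# Reads an uncompressed VCF on stdin; low_r2_sites is a hypothetical helper.

low_r2_sites() {
    awk -v min="$1" -F '\t' '
        /^#/ { next }                     # skip header lines
        {
            n = split($8, kv, ";")        # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^R2=/) {     # anchored, so ER2= does not match
                    r2 = substr(kv[i], 4) + 0
                    if (r2 < min)
                        printf "%s\t%s\t%s\n", $1, $2, substr(kv[i], 4)
                }
        }'
}
```

A compressed sites file could be piped in with something like `gzip -dc sites.vcf.gz | low_r2_sites 0.3`, where `sites.vcf.gz` is a placeholder name and 0.3 is only an example threshold.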

Shrishtee-kandoi commented 7 months ago

Awesome! Thank you.

Shrishtee-kandoi commented 5 months ago

Hi Jonathon,

I've updated my files to the hg38 build recently. The link you provided before (https://share.sph.umich.edu/minimac4/panels/) has reference files for 1000g_phase3_v5, which is for hg19. Can you guide me to the reference panel for hg38, specifically the one for 1000 Genomes Phase 3 (Hg38)?

Thank you!

jonathonl commented 5 months ago

We do not yet host a b38 panel. You would have to generate one on your own using a phased 1000g call set with minimac4 --compress-reference.
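A per-chromosome build along those lines might look like the sketch below. It assumes the compressed panel is written to standard output and redirected to a file, as the repository readme appears to show; the input and output file names, the `compress_panel` helper and the `DRY_RUN` switch are all placeholders invented for this example.

```shell
#!/bin/sh
# Sketch only: build one *.msav panel per chromosome from a phased VCF.
# File names are hypothetical; adjust them to the call set you download.

compress_panel() {
    chr=$1
    in="phased_1000g_b38.chr${chr}.vcf.gz"   # hypothetical phased input
    out="1000g_b38.chr${chr}.msav"           # hypothetical output name
    if [ -n "${DRY_RUN:-}" ]; then
        echo "minimac4 --compress-reference $in > $out"
    else
        minimac4 --compress-reference "$in" > "$out"
    fi
}
```

Calling `compress_panel 14` would build the chromosome 14 panel; wrapping the call in a loop over chromosome names covers the rest.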