Closed huww1998 closed 1 year ago
Can you clarify what you believe is wrong with the results you are getting? 3,752,110 variants seems reasonable for 1000g chromosome 1. Note that, in addition to imputing missing genotypes, Minimac also imputes variants that exist in the reference panel but not in the target VCF.
The ID column in the imputed results comes from the reference panel. The reference panel you are using isn't annotated with rsIDs but instead uses {chrom}:{pos} as the identifier.
Sorry. Until now, I thought the imputed VCF would only include variants in the target VCF. If Minimac also imputes variants that exist in the reference panel, the imputed VCF may be right. I also understand now why the rsIDs don't come from the target VCF. Thank you very much for your reply.
But I can't subset the same number of variants from the imputed VCF using a TXT file of chr/pos information from my target VCF (I used bcftools v1.7). If I want an imputed VCF that consists only of the variants that exist in the target VCF, am I on the right track?
You can achieve this by running `bcftools view -i "INFO/TYPED=1" imputed.vcf.gz -Oz -o imputed.typed_variants.vcf.gz`.
There can be multiple variant records for a given position, which is why filtering by chrom:position doesn't work.
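To illustrate why a position-only filter over-counts, here is a minimal sketch on made-up data. The file names, the records, and the simplified five-column layout (CHROM, POS, ID, REF, ALT) are all hypothetical; real inputs would be bgzipped VCFs with full headers. It compares matching on chrom:pos with matching on chrom:pos:ref:alt using plain awk.

```shell
# Toy tab-separated records: CHROM POS ID REF ALT (hypothetical data, not real variants).
workdir=$(mktemp -d)
printf '1\t1000\trs1\tA\tG\n1\t2000\trs2\tC\tT\n' > "$workdir/target.tsv"
printf '1\t1000\t1:1000\tA\tG\n1\t1000\t1:1000\tA\tT\n1\t1500\t1:1500\tG\tA\n1\t2000\t1:2000\tC\tT\n' > "$workdir/imputed.tsv"

# Position-only match: keeps every imputed record that shares CHROM and POS with a target record.
by_pos=$(awk -F'\t' 'NR==FNR { seen[$1 FS $2]; next } ($1 FS $2) in seen' \
  "$workdir/target.tsv" "$workdir/imputed.tsv" | wc -l | tr -d ' ')

# Allele-aware match: also requires REF and ALT to agree.
by_allele=$(awk -F'\t' 'NR==FNR { seen[$1 FS $2 FS $4 FS $5]; next } ($1 FS $2 FS $4 FS $5) in seen' \
  "$workdir/target.tsv" "$workdir/imputed.tsv" | wc -l | tr -d ' ')

# prints: position-only: 3, allele-aware: 2
echo "position-only: $by_pos, allele-aware: $by_allele"
```

In this toy case the second record at position 1000 (A>T) exists only in the imputed file, so the position-only filter returns three records for two target variants. The same effect, at a larger scale, would be consistent with getting more variants than the target VCF contains.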
Ok. Thank you very much for your guidance. It's very helpful for me.
[Screenshots: my target VCF; imputed VCF]
Before using Minimac v4.1.2, I phased my target VCF file with SHAPEIT2. I also used `--all-typed-sites`, but I don't think it works. The target VCF (chr 1 only) has 19,020 variants, but the imputed VCF unexpectedly has 3,752,110 variants. My reference panel is 1000 Genomes Phase 3, downloaded from 1000 Genomes Phase 3 (version 5). I used `1.1000g.Phase3.v5.With.Parameter.Estimates.m3vcf.gz` as the reference panel when I imputed chr 1. The commands are displayed below. I also tried to subset the imputed VCF file using the `chr:pos` information from my target VCF, but that returned 33,019 variants compared with 19,020 variants in the original target VCF. The result confuses me, and I don't know what the problem is. Another problem is that the rsIDs are missing from the imputed VCF file. Should I set `--sites`, `--min-r2`, or other parameters to solve these problems?