Closed: chlangley closed this 3 months ago
Hi, this means that HTSlib cannot retrieve genotypes for all samples at a given variant. You need the GT field to be defined for every single variant x sample combination.
Best,
Right. Of course, with a bit of filtering the input data is palatable and phasing follows.
BTW: what happened to the --duohmm option from ShapeIt2?
Thanks for ShapeIt4 - open and effective.
Cheers,
Chuck
Hi,
I am trying to phase a single sample using the 1000 genomes data as reference. I am getting this error despite the fact that I made sure that all variants have GT defined. Any advice?
To make sure that my VCF indeed includes GT for all variants, I ran the following command:
$ bcftools query -f '[%SAMPLE=%GT\n]' sample.vcf.gz | cut -f 2 -d \= | sort | uniq -c
274 0
17122 0/0
451375 0|1
3409075 0/1
23 0|2
317 0/2
1 0/3
14968 1
79343 1|0
1993944 1/1
11773 1|2
89868 1/2
8 2
5 2|0
2818 2|1
2 2/2
Thank you! Alon
I just wanted to update that I have now solved this issue. I realized that the entries with haploid GT values of `0`, `1`, or `2` are causing the problem. I got rid of these by running:
bcftools view -i 'GT~"/" || GT~"|"' input.bcf -O b -o input_fitered_haploid_GT.bcf
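To confirm that the filter above did what was intended, here is a minimal sketch that tallies genotypes by ploidy (the number of alleles in each GT string). The function name `gt_ploidy_counts` is made up for illustration and is not part of bcftools or ShapeIt4. It only assumes one GT string per line on standard input, such as the output of `bcftools query -f '[%GT\n]'`.

```shell
# Hypothetical helper (not part of bcftools or ShapeIt4).
# Reads one GT string per line and prints "<ploidy> <count>" pairs,
# where ploidy is the number of alleles separated by "/" or "|".
gt_ploidy_counts() {
  awk '{ n = split($0, alleles, "[/|]"); count[n]++ }
       END { for (p in count) print p, count[p] }' | sort -n
}

# Example usage (assumes the filtered file from the command above exists):
#   bcftools query -f '[%GT\n]' input_fitered_haploid_GT.bcf | gt_ploidy_counts
```

If the output still contains a line starting with `1`, some haploid calls remain in the file.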
Also adding, in case someone else ever bumps into similar issues: in my case there was no missing data of the form `./1`, `./.`, `0|.`, etc., but if there were such values, then in addition to the step above I would have done this:
bcftools view -e 'GT="."' input_fitered_haploid_GT.bcf -O b -o input_fitered_missing_GT.bcf
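A quick way to see whether any genotypes with a missing allele are present, before or after that step, is to count GT strings that contain a `.`. This is only a sketch: `count_missing_gt` is a hypothetical name, and it assumes one GT string per line on standard input.

```shell
# Hypothetical helper (not part of bcftools).
# Reads one GT string per line and prints how many contain a missing allele (".").
count_missing_gt() {
  awk '/\./ { n++ } END { print n + 0 }'
}

# Example usage (assumes the output file from the command above exists):
#   bcftools query -f '[%GT\n]' input_fitered_missing_GT.bcf | count_missing_gt
```

A result of `0` means that no genotype in the file has a missing allele.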
Hello: I am just getting started using ShapeIt4 on WGS VCFs for families. Right away there is an error at initialization:
"shapeit4: src/io/genotype_reader2.cpp:50: void genotype_reader::readGenotypes0(std::__cxx11::string): Assertion `ngt_main == 2 * n_main_samples' failed."
I imagine it may be a VCF format issue.
Thanks for any help. Chuck
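The assertion in the message (`ngt_main == 2 * n_main_samples`) reads as a check that two alleles were returned per sample, so one plausible way to track down the offending lines is to list the records whose GT does not have exactly two alleles. The sketch below assumes tab-separated CHROM, POS, SAMPLE and GT columns on standard input. The function name `non_diploid_records` and the file name `input.vcf.gz` are placeholders, not anything from ShapeIt4 or bcftools.

```shell
# Hypothetical helper (not part of bcftools or ShapeIt4).
# Expects tab-separated CHROM, POS, SAMPLE, GT on each line and prints
# only the lines whose GT does not contain exactly two alleles.
non_diploid_records() {
  awk -F '\t' '{ n = split($4, alleles, "[/|]"); if (n != 2) print }'
}

# Example usage (input.vcf.gz is a placeholder file name):
#   bcftools query -f '[%CHROM\t%POS\t%SAMPLE\t%GT\n]' input.vcf.gz | non_diploid_records | head
```

Any line it prints points to a variant and sample pair that is worth inspecting before phasing.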