odelaneau / shapeit4

Segmented HAPlotype Estimation and Imputation Tool
MIT License

Assertion `ngt_main == 2 * n_main_samples' failed. #14

Closed by chlangley 3 months ago

chlangley commented 4 years ago

Hello: I am just getting started using ShapeIt4 on WGS VCFs for families. Right away there is an error at initialization:

"shapeit4: src/io/genotype_reader2.cpp:50: void genotype_reader::readGenotypes0(std::__cxx11::string): Assertion `ngt_main == 2 * n_main_samples' failed."

I imagine it may be a vcf format issue.
Thanks for any help. Chuck

odelaneau commented 4 years ago

Hi, This means that HTSlib cannot retrieve genotypes for all samples at a given variant. The GT field must be defined for every variant × sample combination.

Best,
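The assertion compares the number of genotype values read with twice the number of samples, which suggests that every sample is expected to carry exactly two alleles at every variant. As a rough way to locate offending records, the sketch below scans uncompressed VCF text and lists each sample genotype that is not a fully called diploid GT. The function name `find_non_diploid` is hypothetical, and the sketch assumes GT is the first FORMAT key, as the VCF specification requires when GT is present.

```shell
# Hypothetical helper: list sample genotypes that are not fully called diploid GTs.
# Reads uncompressed VCF text on stdin and prints CHROM, POS, sample name, GT.
find_non_diploid() {
  awk -F'\t' '
    /^##/     { next }                                  # skip meta-information lines
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
    {
      # Assumption: when GT is present it is the first FORMAT key.
      if ($9 !~ /^GT(:|$)/) { print $1 "\t" $2 "\t*\tno GT field"; next }
      for (i = 10; i <= NF; i++) {
        split($i, field, ":")                           # GT is the first subfield
        gt = field[1]
        n = split(gt, allele, "[/|]")                   # split alleles on / or |
        bad = (n != 2)                                  # haploid or polyploid call
        for (k = 1; k <= n; k++) if (allele[k] == ".") bad = 1   # missing allele
        if (bad) print $1 "\t" $2 "\t" name[i] "\t" gt
      }
    }'
}
```

A possible use is `bcftools view input.bcf | find_non_diploid | head`, since `bcftools view` writes VCF text to standard output by default. Empty output would indicate that every sample has a two-allele, non-missing GT at every record.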

chlangley commented 4 years ago

Right. Of course, with a bit of filtering the input data is palatable and phasing follows. BTW: what happened to the --duohmm option from ShapeIt2? Thanks for ShapeIt4 - open and effective.
Cheers, Chuck

ShaiberAlon commented 3 years ago

Hi,

I am trying to phase a single sample using the 1000 genomes data as reference. I am getting this error despite the fact that I made sure that all variants have GT defined. Any advice?

To make sure that my VCF indeed includes GT for all variants, I ran the following command:

$ bcftools query -f '[%SAMPLE=%GT\n]' sample.vcf.gz | cut -f 2 -d \=  | sort | uniq -c
    274 0
  17122 0/0
 451375 0|1
3409075 0/1
     23 0|2
    317 0/2
      1 0/3
  14968 1
  79343 1|0
1993944 1/1
  11773 1|2
  89868 1/2
      8 2
      5 2|0
   2818 2|1
      2 2/2

Thank you! Alon

ShaiberAlon commented 3 years ago

I just wanted to update that I have now solved this issue. I realized that the entries with haploid GT values (0, 1, or 2) were causing the problem. I got rid of these by running:

bcftools view -i 'GT~"/" || GT~"|"' input.bcf -O b -o input_filtered_haploid_GT.bcf

Adding this in case someone else runs into a similar issue: in my case there was no missing data of the form ./1, ./., 0|., etc. If there are such values, then in addition to the step above I would do this:

bcftools view -e 'GT="."' input_filtered_haploid_GT.bcf -O b -o input_filtered_missing_GT.bcf
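After filtering, it may be worth confirming that no haploid or missing calls remain before running the phasing step again. The following is a minimal sketch, not part of bcftools or ShapeIt4: a hypothetical shell function, `gt_shape_summary`, that tallies sample genotypes by shape from uncompressed VCF text. It assumes GT is the first FORMAT subfield.

```shell
# Hypothetical helper: count sample genotypes by shape in VCF text read from stdin.
gt_shape_summary() {
  awk -F'\t' '
    /^#/ { next }                                   # skip all header lines
    {
      for (i = 10; i <= NF; i++) {
        split($i, field, ":")                       # assumption: GT is the first subfield
        gt = field[1]
        n = split(gt, allele, "[/|]")               # number of alleles in the call
        if (gt ~ /\./)    shape = "missing"         # any missing allele, e.g. ./. or 0|.
        else if (n == 1)  shape = "haploid"         # e.g. 0, 1, 2
        else if (n == 2)  shape = "diploid"         # e.g. 0/1, 1|0
        else              shape = "other"
        count[shape]++
      }
    }
    END {
      printf "diploid=%d haploid=%d missing=%d other=%d\n",
             count["diploid"], count["haploid"], count["missing"], count["other"]
    }'
}
```

For example, `bcftools view filtered.bcf | gt_shape_summary` (with `filtered.bcf` standing in for whatever the filtered file is called) prints a single summary line. Nonzero `haploid` or `missing` counts would indicate that some genotypes still do not have two called alleles.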