odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License
56 stars 9 forks source link

Rare phasing error: `Assertion V.vec_full[vt]->type == VARTYPE_SCAF failed` #52

Open mvvugt opened 11 months ago

mvvugt commented 11 months ago

Hi,

I'm trying to phase rare variants, which goes well for the greatest part. I used the files and instructions you provided in the documentation and GitHub. As mentioned, the majority works very well but some rare phasing chunks fail, as I show in this example below. Do you have any idea why this might be?

[SHAPEIT5] Phase_rare (phase rare variants onto a haplotype scaffold)
  * Authors       : Simone RUBINACCI & Olivier DELANEAU, University of Lausanne
  * Contact       : simone.rubinacci@unil.ch & olivier.delaneau@gmail.com
  * Version       : 5.1.1 / commit = 990ed0d / release = 2023-05-08
  * Run date      : 30/06/2023 - 17:19:27

Files:
  * Unphased data : [chr22.exome_array.full.sorted.bcf]
  * Scaffold data : [chr22.exome_array.shapeit5.common_0.001.bcf]
  * Genetic Map   : [shapeit_maps/chr22.b38.gmap.gz]
  * Output data   : [chr22.exome_array.22_28090430_33113736.shapeit5.rares_0.001.bcf]
  * Output LOG    : [chr22.exome_array.22_28090430_33113736.shapeit5.rares_0.001.log]

Parameters:
  * Seed    : 15052011
  * Threads : 32 threads
  * PBWT    : [depth = 2,2 / modulo = 0.1 / mac = 2 / mdr = 0.1]
  * HMM     : [Ne = 15000 / Recombination rates given by genetic map]

Parsing specified genomic regions
  * Scaffold region  [chr22:27590423-33613738]
  * Input region  [chr22:28090430-33113736]

Reading genotype data
[E::bgzf_read_block] Invalid BGZF header at offset 706792049
[E::bgzf_read] Read block operation failed with error 6 after 384803 of 755137 bytes
  * BCF scanning (23.38s)
      + Variant sites: [#scaffold=2546 | #rare=5581 | #all=8127]
      + 4715 rare variants in buffer regions excluded
      + 322 variants found in both files [priority to scaffold]
  * HAP allocation [#scaffold=2546 / #samples=377567] (0.00s)
  * Genotype set allocation [#rare=5581 / #samples=377567] (0.00s)
[E::bgzf_read_block] Invalid BGZF header at offset 706792049
[E::bgzf_read] Read block operation failed with error 6 after 168546 of 755137 bytes
phase_rare_static: src/io/genotype_reader/genotype_reader_reading.cpp:50: void genotype_reader::readGenotypes(): Assertion `V.vec_full[vt]->type == VARTYPE_SCAF' failed.

The commands for common and rare phasing I use are as follows:

# phasing common variants
phase_common_static --input chr22.exome_array.full.sorted.bcf --map shapeit_maps/chr22.b38.gmap.gz --output chr22.exome_array.shapeit5.common_0.001.bcf --thread 64 --log chr22.exome_array.shapeit5.common_0.001.log --filter-maf 0.001 --region chr22

# phasing rare variants
phase_rare_static --input chr22.exome_array.full.sorted.bcf --scaffold chr22.exome_array.shapeit5.common_0.001.bcf --map shapeit_maps/chr22.b38.gmap.gz --output chr22.exome_array.22_28090430_33113736.shapeit5.rares_0.001.bcf --log chr22.exome_array.22_28090430_33113736.shapeit5.rares_0.001.log --scaffold-region chr22:27590423-33613738 --input-region chr22:28090430-33113736 --thread 32

Thanks in advance!

srubinacci commented 11 months ago

Hi,

As the problems seems to arise while reading genotype data, ([E::bgzf_read_block] Invalid BGZF header at offset 706792049) you might want to check that both files: -input chr22.exome_array.full.sorted.bcf --scaffold chr22.exome_array.shapeit5.common_0.001.bcf are not corrupted.

My first guess is that actually phase_common failed or somehow generated a bcf file that is not well formatted somehow. Let me know if that is the case

Simone

mvvugt commented 10 months ago

Dear Simone,

Thanks for the response!

I tested the input files by performing bcftools stats on them, and this worked as expected. Also, only a minority of the chunks fail with this error, so there are also chunks with the same input that do successfully complete the rare phasing.

I would therefore suggest the files are not corrupt, would you agree? Do you have any other idea? Thanks! Marion

mvvugt commented 9 months ago

Dear @srubinacci,

Thanks again for your response and great work. I was just wondering if you had any suggestions or updates on this issue?

Thanks in advance! Kind regards, Marion