odelaneau / shapeit4

Segmented HAPlotype Estimation and Imputation Tool
MIT License
90 stars 18 forks source link

shapeit fails with error message - Segmentation fault (core dumped) #45

Open kvn95ss opened 3 years ago

kvn95ss commented 3 years ago

I'm trying to phase a vcf with 983 exome samples.

The command I ran -

/mnt/exome/Softwares/shapeit4/bin/shapeit4.2 --input MOD_hg19.vcf.gz --map /mnt/exome/Softwares/shapeit4/maps/chr"$chr".b37.gmap.gz --region "$chr" --output chr"$chr".phased.vcf.gz

I am looping over all chromosomes, it seems like the error occurs for all chromosomes.

The error -

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : olivier.delaneau@gmail.com
  * Version       : 4.2.0
  * Run date      : 19/01/2021 - 17:25:51

Files:
  * Input VCF     : [MOD_hg19.vcf.gz]
  * Genetic Map   : [/mnt/exome/Softwares/shapeit4/maps/chr1.b37.gmap.gz]
  * Output VCF    : [chr1.phased.vcf.gz]

Parameters:
  * Seed    : 15052011
  * Threads : 10 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : AVX2 optimization active

Initialization:
  * VCF/BCF scanning [N=983 / L=175794 / Reg=1] (44.73s)
  * VCF/BCF parsing [Hom=53.5% / Het=3.2% / Mis=43.3%] (48.68s)
  * GMAP parsing [n=256895] (0.26s)
  * cM interpolation [s=16794 / i=159000] (0.03s)
  * Region length [249218770 bp / 286.3 cM]
  * HMM parameters [Ne=15000 / Error=0.0001 / #rare=45428]
  * PBWT indexing [l=3228] (0.01s)
  * HAP update (1.17s)
  * H2V transpose (0.23s)
  * PBWT phase sweep (12.61s)
  * Build genotype graphs [seg=1869070] (0.55s)

Burn-in iteration [1/5]
  * V2H transpose (0.36s)
  * PBWT selection (1.35s)
  * C2H transpose (0.14s)
  * HMM computations [K=185.689+/-115.639 / W=5.16Mb] (225.76s)
Segmentation fault (core dumped)

What could be the issue?

odelaneau commented 3 years ago

Hi,

My bad. I forgot to disable a mutex when using --thread 1 in the pre-release of v4.2.0. Bug is now fixed.

Alternatively, you can use --thread T with T>1 to make the previous version run.

Best,

Olivier.

kvn95ss commented 3 years ago

Hay,

In the command which I had posted, I had forgot to include that I had indeed used the argument --thread 10, as can be seen from the Threads count in parameters section.

I still get the same error. I get the same error even if I don't specify the threads. I tried different values like 4 and 16 but I still get the same error.

odelaneau commented 3 years ago

Okay, so the issue is more complex than what I thought.

Can u send me a small dataset reproducing the error on my email?

A particularity I saw in your data is the super high missing data rate ~43.3%. Would be good to know why you've got this.

kvn95ss commented 3 years ago

A particularity I saw in your data is the super high missing data rate ~43.3%. Would be good to know why you've got this.

We have ~983 samples from 5-6 different exome capture kits, ranging from NextEra to Agilent CREv2, so it could explain the high missing data rate.

I will get a subset of the data and try to share it with you.

kvn95ss commented 3 years ago

Hay, I've shared the chr 1 file with you, do let me know if anything else is needed.

freeseek commented 3 years ago

I have the same issue and I know at least another researcher experiencing the same segfault. I would be happy to test the code with my pipeline if given access to a development version of the code.

odelaneau commented 3 years ago

Hi,

Thanks for the data set you sent.

I tried multiple times with the last version v4.2, changing some parameters, and I cannot reproduce your segfault. The job completes each time I run it. I really do not understand what is going on ! What gcc version did you use to compile? I'll try with it. In my case, I used v7.5.0

On another note, I looked at your data and hereafter some comments I wanted to make:

Best,

Olivier.

Edit: I'll add static binaries available very soon on github for testing purposes.

kvn95ss commented 3 years ago

Thanks for the reply, I actually installed shapeit4 from Conda. I think I might try to compile it myself and check it once.

As for the comments on the data,

freeseek commented 3 years ago

I have the same problem with shapeit4.2 compiled with gcc version 10.2 using Ubuntu 20.10 and Ubuntu 21.04

You can reproduce the segfault with the HapMap data at this temporary URL:

https://www.dropbox.com/sh/35jja570f1n3bui/AACwSJGeZg-DDxvgFQdbGr9Ua?dl=0

The link contains also a dockerfile to recreate an environment that reproduces the error

$ shapeit4.2 --input input.bcf --reference ref.bcf --map genetic_map.txt --region chr6 --output output.bcf

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : olivier.delaneau@gmail.com
  * Version       : 4.2.0
  * Run date      : 02/03/2021 - 10:47:30

Files:
  * Input VCF     : [input.bcf]
  * Reference VCF : [ref.bcf]
  * Genetic Map   : [genetic_map.txt]
  * Output VCF    : [output.bcf]

Parameters:
  * Seed    : 15052011
  * Threads : 1 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : AVX2 optimization active

Initialization:
  * VCF/BCF scanning [Nm=40 / Nr=3202 / L=8806 / Reg=chr6] (28.80s)
  * VCF/BCF parsing [Hom=69.4% / Het=30.5% / Mis=0.1%] (30.46s)
  * GMAP parsing [n=224546] (0.18s)
  * cM interpolation [s=7575 / i=1231] (0.01s)
  * Region length [44334760 bp / 70.2 cM]
  * HMM parameters [Ne=15000 / Error=0.0001 / #rare=45]
  * PBWT indexing [l=1952] (0.00s)
  * HAP update (0.00s)
  * H2V transpose (0.04s)
  * PBWT phase sweep (0.60s)
  * Build genotype graphs [seg=35791] (0.01s)

Burn-in iteration [1/5]
  * V2H transpose (0.00s)
  * PBWT selection (0.26s)
  * C2H transpose (0.00s)
  * HMM computations [K=112.776+/-97.535 / W=3.34Mb] (1.00s)
munmap_chunk(): invalid pointer
Aborted (core dumped)

Running with valgrind shows the following:

$ valgrind shapeit4.2 --thread 2 --input input.bcf --reference ref.bcf --map genetic_map.txt --region chr6 --output output.bcf

==1639510== Memcheck, a memory error detector
==1639510== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1639510== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==1639510== Command: shapeit4.2 --thread 2 --input input.bcf --reference ref.bcf --map genetic_map.txt --region chr6 --output output.bcf
==1639510== 

SHAPEIT
  * Author        : Olivier DELANEAU, University of Lausanne
  * Contact       : olivier.delaneau@gmail.com
  * Version       : 4.2.0
  * Run date      : 02/03/2021 - 10:43:49

Files:
  * Input VCF     : [input.bcf]
  * Reference VCF : [ref.bcf]
  * Genetic Map   : [genetic_map.txt]
  * Output VCF    : [output.bcf]

Parameters:
  * Seed    : 15052011
  * Threads : 2 threads
  * MCMC    : 15 iterations [5b + 1p + 1b + 1p + 1b + 1p + 5m]
  * PBWT    : Depth of PBWT neighbours to condition on: 4
  * PBWT    : Store indexes at variants [MAC>=2 / MDR<=0.5 / Dist=0.02 cM]
  * HMM     : K is variable / min W is 2.50cM / Ne is 15000
  * HMM     : Recombination rates given by genetic map
  * HMM     : AVX2 optimization active

Initialization:
  * VCF/BCF scanning [Nm=40 / Nr=3202 / L=8806 / Reg=chr6] (705.77s)
  * VCF/BCF parsing [Hom=69.4% / Het=30.5% / Mis=0.1%] (875.84s)
  * GMAP parsing [n=224546] (5.77s)
  * cM interpolation [s=7575 / i=1231] (0.16s)
  * Region length [44334760 bp / 70.2 cM]
  * HMM parameters [Ne=15000 / Error=0.0001 / #rare=45]
  * PBWT indexing [l=1952] (0.02s)
  * HAP update (0.04s)
  * H2V transpose (0.57s)
  * PBWT phase sweep (19.03s)
  * Build genotype graphs [seg=35791] (0.25s)

Burn-in iteration [1/5]
  * V2H transpose (0.01s)
  * PBWT selection (11.50s)
  * C2H transpose (0.02s)
==1639510== Thread 3:[0%]
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x127A79: haplotype_segment_single::forward() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x1868E2: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x128CF3: haplotype_segment_single::forward() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x1868E2: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Use of uninitialised value of size 8
==1639510==    at 0x127DC0: haplotype_segment_single::forward() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x1868E2: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Use of uninitialised value of size 8
==1639510==    at 0x128027: haplotype_segment_single::forward() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x1868E2: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x12AA53: haplotype_segment_single::backward(std::vector<double, std::allocator<double> >&, std::vector<float, std::allocator<float> >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x186900: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Use of uninitialised value of size 8
==1639510==    at 0x129EC0: haplotype_segment_single::backward(std::vector<double, std::allocator<double> >&, std::vector<float, std::allocator<float> >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x186900: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x12AE0E: haplotype_segment_single::backward(std::vector<double, std::allocator<double> >&, std::vector<float, std::allocator<float> >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x186900: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x12AFEB: haplotype_segment_single::backward(std::vector<double, std::allocator<double> >&, std::vector<float, std::allocator<float> >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x186900: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Use of uninitialised value of size 8
==1639510==    at 0x12A197: haplotype_segment_single::backward(std::vector<double, std::allocator<double> >&, std::vector<float, std::allocator<float> >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x186900: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x129B2F: haplotype_segment_single::backward(std::vector<double, std::allocator<double> >&, std::vector<float, std::allocator<float> >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x186900: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Use of uninitialised value of size 8
==1639510==    at 0x12A273: haplotype_segment_single::backward(std::vector<double, std::allocator<double> >&, std::vector<float, std::allocator<float> >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x186900: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
==1639510== Thread 2:[2%]
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x12882C: haplotype_segment_single::forward() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x1868E2: phaser::phaseWindow(int, int) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18730A: phaseWindow_callback(void*) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x49D858F: start_thread (pthread_create.c:463)
==1639510==    by 0x4E57222: clone (clone.S:95)
==1639510== 
  * HMM computations [K=110.431+/-102.296 / W=3.26Mb] (157.69s)
==1639510== Thread 1:
==1639510== Use of uninitialised value of size 8
==1639510==    at 0x126FBF: std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510== 
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x126FCD: std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510== 
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x483DF75: operator delete(void*) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1639510==    by 0x126FD3: std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510== 
==1639510== Invalid free() / delete / delete[] / realloc()
==1639510==    at 0x483DFBF: operator delete(void*) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1639510==    by 0x126FD3: std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510==  Address 0x138e4a is in the Text segment of /home/genovese/bin/shapeit4.2
==1639510==    at 0x138E4A: haplotype_set::transposePBWTarrays() (in /home/genovese/bin/shapeit4.2)
==1639510== 
==1639510== Use of uninitialised value of size 8
==1639510==    at 0x126FDF: std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510== 
==1639510== Use of uninitialised value of size 8
==1639510==    at 0x4AADB7C: std::locale::~locale() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510== 
==1639510== Conditional jump or move depends on uninitialised value(s)
==1639510==    at 0x4AADB82: std::locale::~locale() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510== 
==1639510== Use of uninitialised value of size 8
==1639510==    at 0x4AADBA5: std::locale::~locale() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510== 
==1639510== Invalid read of size 4
==1639510==    at 0x4AADBA5: std::locale::~locale() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1639510== 
==1639510== 
==1639510== Process terminating with default action of signal 11 (SIGSEGV)
==1639510==  Access not within mapped region at address 0x0
==1639510==    at 0x4AADBA5: std::locale::~locale() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1639510==    by 0x11BA15: haplotype_set::mergeIBD2constraints() [clone .cold] (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x18A530: phaser::phase() (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x174E18: phaser::phase(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /home/genovese/bin/shapeit4.2)
==1639510==    by 0x123964: main (in /home/genovese/bin/shapeit4.2)
==1639510==  If you believe this happened as a result of a stack
==1639510==  overflow in your program's main thread (unlikely but
==1639510==  possible), you can try to increase the size of the
==1639510==  main thread stack using the --main-stacksize= flag.
==1639510==  The main thread stack size used in this run was 8388608.
==1639510== 
==1639510== HEAP SUMMARY:
==1639510==     in use at exit: 21,615,416 bytes in 18,154 blocks
==1639510==   total heap usage: 3,074,393 allocs, 3,056,240 frees, 6,894,923,963 bytes allocated
==1639510== 
==1639510== LEAK SUMMARY:
==1639510==    definitely lost: 0 bytes in 0 blocks
==1639510==    indirectly lost: 0 bytes in 0 blocks
==1639510==      possibly lost: 0 bytes in 0 blocks
==1639510==    still reachable: 21,615,416 bytes in 18,154 blocks
==1639510==         suppressed: 0 bytes in 0 blocks
==1639510== Rerun with --leak-check=full to see details of leaked memory
==1639510== 
==1639510== Use --track-origins=yes to see where uninitialised values come from
==1639510== For lists of detected and suppressed errors, rerun with: -s
==1639510== ERROR SUMMARY: 1137193 errors from 21 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
kvn95ss commented 3 years ago

@odelaneau sorry for my previous reply, I didn't install it using Conda, I compiled it in RHEL with gcc verison 8.3.1 20190311. So I don't think that's the cause of my error.

Pretty strange that I'm encountering the error for my dataset, I have tried at least 3-4 times but get same error.

cnettel commented 3 years ago

The GCC optimizer has become far more aggressive in recent releases (well, maybe roughly at GCC 7 or so), hence my pull request (#48). It can seem like a nuisance that a bool method exits without returning, but the optimizer assumes that this means that the codepath will never be used. I cannot claim of course that this is the only source of errors, but I can tell that I got crashes and memory corruption when the return type was bool with recent GCCs and proper operation with return type void.

freeseek commented 3 years ago

@cnettel your fix solved the issue for me. Thank you! The test case I presented now works and I was able to run multiple SHAPEIT4.2.0 tasks again without experiencing any segmentation faults over a large DNA microarray cohort. And the new version did cut phasing costs almost by a half. In case anybody is interested, I have used the following Dockerfile to run SHAPEIT4.2.0:

FROM ubuntu:21.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get -qqy update --fix-missing && \
    apt-get -qqy install --no-install-recommends \
                 wget \
                 g++ \
                 make \
                 libboost-iostreams-dev \
                 libboost-program-options-dev \
                 libhts-dev \
                 libbz2-dev \
                 libboost-iostreams1.74.0 \
                 libboost-program-options1.74.0 \
                 bcftools && \
    wget --no-check-certificate https://github.com/odelaneau/shapeit4/archive/v4.2.0.tar.gz && \
    tar xzf v4.2.0.tar.gz && \
    cd shapeit4-4.2.0 && \
    sed -i 's/^HTSLIB_INC=\$(HOME)\/Tools\/htslib-1.9$/HTSLIB_INC=-Ihtslib/' makefile && \
    sed -i 's/^HTSLIB_LIB=\$(HOME)\/Tools\/htslib-1.9\/libhts.a$/HTSLIB_LIB=-lhts/' makefile && \
    sed -i 's/^BOOST_LIB_IO=\/usr\/lib\/x86_64-linux-gnu\/libboost_iostreams.a$/BOOST_LIB_IO=-lboost_iostreams/' makefile && \
    sed -i 's/^BOOST_LIB_PO=\/usr\/lib\/x86_64-linux-gnu\/libboost_program_options.a$/BOOST_LIB_PO=-lboost_program_options/' makefile && \
    sed -i 's/^\tbool merge (const IBD2track \& rhs) {$/\tvoid merge (const IBD2track \& rhs) {/' src/containers/haplotype_set.h && \
    make && \
    mv bin/shapeit4.2 /usr/bin/ && \
    cd .. && \
    apt-get -qqy purge --auto-remove --option APT::AutoRemove::RecommendsImportant=false \
                 wget \
                 g++ \
                 make \
                 libboost-iostreams-dev \
                 libboost-program-options-dev \
                 libhts-dev \
                 libbz2-dev && \
    apt-get -qqy clean && \
    rm -rf v4.2.0.tar.gz \
           shapeit4-4.2.0 \
           /var/lib/apt/lists/*
odelaneau commented 3 years ago

Guys, thank you very much for your help, you saved me so much time!

So I compiled using g++9/g++10 and indeed got the segfault on kvn95ss's dataset.

Fixing the code given cnettel's suggestion solves the issue. Thanks!

I made a pre-release v4.2.1, I'd be grateful if some of you guys could double check that this fixes the previous issue. If yes, I'll push it as new release.

Of note, I also removed the other htslib related warnings.

Best,

kvn95ss commented 3 years ago

Thank you all for the help.

@odelaneau I'll compile the latest version and see if it sorts the issue. Will update in some time!

EDIT - I was able to compile with no issues, but for some reason it says it can't read the index file, even though I created one using GATK.

Initialization:
  * VCF/BCF scanning ...
ERROR: Problem opening index file for [MODGRCh38.vcf]

pbgzipping and indexing the vcf with tabix -p vcf gives me this error

Initialization:
  * VCF/BCF scanning ...
ERROR: No variants to be phased in [MOD_hg19.vcf.gz]
freeseek commented 3 years ago

I have tested the v4.2.1 pre-release on a small cohort and it worked fine. It does still say 4.2.0 in src/phaser/phaser_parameters.cpp though.

odelaneau commented 3 years ago

@freeseek I updated the tag, thanks.

@kvn95ss Error1: Seems that your file has not been BGZF compressed before indexing. Error2: Argument given to shapeit4 option --region is probably incorrect. Typical mistake is --region 20 instead of --region chr20.

kvn95ss commented 3 years ago

Hay @odelaneau

Thanks for that, I was able to sort out the errors. Now it runs on my data!

abedkurdi commented 2 years ago

Hi everyone, I am getting the same error ERROR: No variants to be phased in [input.vcf.gz]. Any thought about what could be causing this other than Error 1 and mentioned by @odelaneau?

Thank you.

jikhashkya commented 2 years ago

@abedkurdi @odelaneau Any solution with this? I had a chr21 panel that was separated into two mutually exclusive panels, where the unphased panel consists of only 5 samples whereas the other set is used as a reference panel, (both panels have been bgzipped and indexed) but I still get a similar error ERROR: No variants to be phased in files. The reference panel is phased and is of file type (.vcf.gz). Although I tried using a (.bcf.gz) file type as well, but I got the same error. Both the reference panel and unphased panel have all the variant sites in common so not sure what I'm missing, any help would be appreciated.

odelaneau commented 2 years ago

Try to merge using bcftools merge -m none. The underlying procedure is the same than for shapeit. Just to check what is going on.

jikhashkya commented 2 years ago

Hi @odelaneau, thank you for the quick reply. I was hoping to make use of the reference panel to see how the phasing accuracy would differ. I did run the bcftools merge -m none <ref-panel> <unphased-panel> and it worked fine. Did you want me to try and run ShapeIT on the merged panel that would have some unphased individuals and the rest phased individuals?

Npaffen commented 1 year ago

Hello @odelaneau, I compiled shapeit4-4.2.2 with g++ (Ubuntu 11.2.0-19ubuntu1) 11.2.0. I want to shape some .bcf files but I also recieve as the thread opener the following output:

SHAPEIT

Files:

Parameters:

Initialization:

My system info is the following: Linux version 5.15.0-50-generic (buildd@lcy02-amd64-086) (gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022

Is this still an open problem of shapeit4 or did I do something wrong?

odelaneau commented 1 year ago

Hi,

[Hom=2.7% / Het=2.1% / Mis=95.3%]

It crashes because you have 95.3% missing data!

Best,

O

Npaffen commented 1 year ago

Thanks for your quick reply! I will see what I can do to fix this issue. Just merged some lifted 23andme files with some CEU 1kg files. But I have no idea why so many missings appear.

Nozi-A commented 1 year ago

Thank you all for the help.

@odelaneau I'll compile the latest version and see if it sorts the issue. Will update in some time!

EDIT - I was able to compile with no issues, but for some reason it says it can't read the index file, even though I created one using GATK.

Initialization:
  * VCF/BCF scanning ...
ERROR: Problem opening index file for [MODGRCh38.vcf]

pbgzipping and indexing the vcf with tabix -p vcf gives me this error

Initialization:
  * VCF/BCF scanning ...
ERROR: No variants to be phased in [MOD_hg19.vcf.gz]

Hi everyone, I am also having the same problem in shapit4, I have the ARS-UCD1.2 reference panel, I did manage to convert it to vcf and index it, but I'm still getting the "no variants to be phased" error. any help, please?

Biowood commented 2 months ago

Hi @Nozi-A, I have a similar issue as you. I am using 1000G as a reference panel and the unphased.vcf files were called by GATK HaplotypeCaller. I try to merge my unphased.vcf file with 1000G panel by bcftools but still meet the error:

  • VCF/BCF scanning done (0.04s)
    • Variants [#sites=0 / region=22]

ERROR: No variants to be phased!

Hi @odelaneau, could you offer some suggestions to solve this problem?