odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License

SHAPEIT5_ligate Segmentation fault #12

Closed LouisLeNezet closed 1 year ago

LouisLeNezet commented 1 year ago

Hi! I'm currently developing a SHAPEIT5 module for nf-core, and I'm running into a segmentation fault when using SHAPEIT5_ligate. The problem is that this error arises randomly, and I don't understand why yet. You can find all the code here. The command is simply as follows:

SHAPEIT5_ligate \
    --input all_files.txt \
    --thread 2 \
    --output input.vcf.gz

where all_files.txt lists just two files generated by SHAPEIT5_phase_common.

Do you have any idea what kind of error could be behind this?

LouisLeNezet commented 1 year ago

I thought I had found the problem. The files had really long names, and shortening them made the error disappear. But I still encounter this problem from time to time...

LouisLeNezet commented 1 year ago

The error looks as follows:

    [SHAPEIT5] Ligate (ligate multiple output files into chromosome-wide files)
      * Authors       : Simone RUBINACCI & Olivier DELANEAU, University of Lausanne
      * Contact       : simone.rubinacci@unil.ch & olivier.delaneau@gmail.com
      * Version       : 5.0.1 / commit = 1.0.0 / release = 2023-02-10
      * Run date      : 17/02/2023 - 19:34:35

    Files:
      * Input LIST     : [all_files.txt]
      * Output VCF     : [1.vcf.gz]
      * Index output   : [NO]

    Parameters:
      * Seed           : [15052011]
      * #Threads       : [2]

    Read filenames in [all_files.txt]
      * #files = 2

    Ligating chunks
      * Creating file descriptor
      * #samples = 1

    Cnk 0 [chr21:0-1] [L=0]
    Buf 0 [chr21:16650034-16749830] [L_isec=1397 / L_tot=2042] [Avg #hets=5] [Switch rate=1] [Avg phaseQ=28.2288]
    .command.sh: line 8: 52633 Segmentation fault      SHAPEIT5_ligate --input all_files.txt --thread 2 --output 1.vcf.gz

srubinacci commented 1 year ago

Hi,

The log file looks very suspicious. The first chunk has zero(?) variants, and everything else is detected in the buffer. The intersection has an average of only 5 hets, which is odd too. I am very curious to see your files/chunks. Is that possible?
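For anyone who wants to catch this pattern automatically, here is a minimal sketch of a log check. It assumes the `Cnk` lines always follow the format shown in the log above (`Cnk <index> [<chrom>:<start>-<end>] [L=<count>]`) and that `L=0` means a chunk with no variants, which is only the tentative reading given in this comment. The function name `empty_chunks` is hypothetical and not part of SHAPEIT5.

```python
import re

# Matches lines such as "Cnk 0 [chr21:0-1] [L=0]" (format copied from the log above).
CNK_RE = re.compile(r"Cnk\s+(\d+)\s+\[([^:\]]+):(\d+)-(\d+)\]\s+\[L=(\d+)\]")


def empty_chunks(log_text):
    """Return the indices of chunks that the log reports with L=0."""
    return [
        int(match.group(1))
        for match in CNK_RE.finditer(log_text)
        if int(match.group(5)) == 0
    ]
```

Running it on the captured output of a ligate run and failing the pipeline step when the returned list is non-empty would flag the suspicious first chunk before the segmentation fault is reached.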

Thank you for reporting this and for your work with SHAPEIT/GLIMPSE!

LouisLeNezet commented 1 year ago

Hi, you can find the files in my GitHub repository on the branch imputation. The data are in data/test for the case that works and in data/test_error for the case that doesn't. The command I used in each folder is the following: SHAPEIT5_ligate --input all_files.txt --output test.vcf.gz. I used the SHAPEIT5 build from Bioconda. Maybe the problem comes from there? I have to admit that I didn't look into the details when writing the recipe. I just copy-pasted what I had done for GLIMPSE2, and as it worked nearly on the first try I didn't check any further. I just checked this morning and there is some trouble with the repo.

LouisLeNezet commented 1 year ago

Hi @srubinacci, I think I found the problem... The files listed in the input file need to be ordered, right? In my pipeline that wasn't always the case (which explains the randomness of the error). I also found that the paths can't contain a space; if one does, it generates the same error.
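Both conditions can be enforced when the list file is generated. Here is a minimal sketch, assuming the start position of each chunk is already known to the caller (for example from the region each chunk was phased on). The function name `ordered_list_text` and the file names in the usage note are hypothetical.

```python
def ordered_list_text(chunks):
    """Return the list-file contents: one chunk path per line, ordered by start.

    chunks: iterable of (path, start) pairs, all from the same chromosome.
    Raises ValueError if any path contains whitespace.
    """
    ordered = sorted(chunks, key=lambda chunk: chunk[1])
    for path, _start in ordered:
        if any(ch.isspace() for ch in path):
            raise ValueError(f"path contains whitespace: {path!r}")
    return "".join(f"{path}\n" for path, _start in ordered)
```

Writing the returned string to all_files.txt gives the same list on every run, whatever order the pipeline produced the chunks in. For instance, `ordered_list_text([("b.bcf", 16650034), ("a.bcf", 16000000)])` lists `a.bcf` before `b.bcf`.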

srubinacci commented 1 year ago

Hi, I see. Indeed, the data need to be ordered and indexed. Indexing allows random access to the file, but it also requires the records in the VCF/BCF file to be sorted by position.
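The sortedness requirement can be checked before indexing. Here is a minimal sketch, assuming the records are available as VCF text lines. The function name `is_sorted_vcf` is hypothetical, and the check is an approximation of what an indexer needs: records grouped by contig, with positions that never decrease within a contig.

```python
def is_sorted_vcf(lines):
    """Return True if records are grouped by contig and positions never decrease."""
    seen = set()                     # contigs already encountered
    current, last_pos = None, 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                 # skip blank lines and header lines
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        if chrom != current:
            if chrom in seen:
                return False         # contig reappears after another one
            seen.add(chrom)
            current, last_pos = chrom, pos
        elif pos < last_pos:
            return False             # position went backwards within a contig
        else:
            last_pos = pos
    return True
```

For a bgzip-compressed VCF, the lines can come from `gzip.open(path, "rt")` in the Python standard library. A BCF file is binary and would first need to be converted to text with a tool such as bcftools.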