vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

DIA-NN v1.8 SegFaults in Linux during `--relaxed-prot-inf` #285

Closed hguturu closed 2 years ago

hguturu commented 2 years ago

This is a follow-up to this comment: https://github.com/vdemichev/DiaNN/discussions/134#discussioncomment-2009867. Both previous discussion topics relate to spectral library generation, so this may be a different issue, since our segfault consistently occurs right after the "Assembling protein groups" output.

I may be experiencing a non-deterministic issue similar to https://github.com/vdemichev/DiaNN/discussions/134 and https://github.com/vdemichev/DiaNN/discussions/270. Initially, I thought it was a single-thread vs. multi-thread issue, since I only saw the bug when I ran the command without the --threads flag. However, I was also able to get the segfault when I ran the same command with --threads 30.

I am running DIA-NN with the following command:

/usr/diann/1.8/diann-1.8 --dir raw --fasta UP000005640_9606_combined.fasta --lib UP000005640_9606_combined.predicted.speclib --mass-acc-ms1 10 --mass-acc 10 --qvalue 0.01 --matrices --missed-cleavages 1 --met-excision --cut 'K*,R*' --smart-profiling --relaxed-prot-inf --threads 30

on Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-1025-aws x86_64). Output of uname -a:

Linux ip-10-82-192-45 5.11.0-1025-aws #27~20.04.1-Ubuntu SMP Fri Jan 7 13:09:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

I ran it under GDB to help track the crash down, with the following result:

(gdb) run  --dir raw --fasta UP000005640_9606_combined.fasta  --lib UP000005640_9606_combined.predicted.speclib  --mass-acc-ms1 10 --mass-acc 10 --qvalue 0.01 --matrices --missed-cleavages 1 --met-excision --cut 'K*,R*' --smart-profiling --relaxed-prot-inf --threads 30
*** lots of output skipped ***
[6:47] Quantification information saved to raw/15891.mzML.quant.

[6:47] Cross-run analysis
[6:47] Reading quantification information: 1 files
[6:47] Quantifying peptides
[6:47] Assembling protein groups
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "diann-1.8" received signal SIGSEGV, Segmentation fault.
0x0000555555508517 in void std::__introsort_loop<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, long, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}> >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}>, long, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}>) ()
(gdb) c
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

It's hard to debug further since the program is not open source.

Update: Added a strace log for another run using the same command (no gdb). diann_threads30_strace.txt

hguturu commented 2 years ago

The output for a single threaded run with backtrace since the output is more manageable:

Reading symbols from /usr/diann/1.8/diann-1.8...
(gdb) run  --dir raw --fasta UP000005640_9606_combined.fasta  --lib UP000005640_9606_combined.predicted.speclib  --mass-acc-ms1 10 --mass-acc 10 --qvalue 0.01 --matrices --missed-cleavages 1 --met-excision --cut 'K*,R*' --smart-profiling --relaxed-prot-inf
Starting program: /usr/diann/1.8/diann-1.8 --dir raw --fasta UP000005640_9606_combined.fasta  --lib UP000005640_9606_combined.predicted.speclib  --mass-acc-ms1 10 --mass-acc 10 --qvalue 0.01 --matrices --missed-cleavages 1 --met-excision --cut 'K*,R*' --smart-profiling --relaxed-prot-inf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
DIA-NN 1.8 (Data-Independent Acquisition by Neural Networks)
Compiled on Jun 28 2021 10:59:57
Current date and time: Thu Jan 20 20:36:37 2022
Logical CPU cores: 32
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Maximum number of missed cleavages set to 1
N-terminal methionine excision enabled
In silico digest will involve cuts at K*,R*
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
Mass accuracy will be fixed to 1e-05 (MS2) and 1e-05 (MS1)

1 files will be processed
[0:00] Loading spectral library UP000005640_9606_combined.predicted.speclib
[0:03] Library annotated with sequence database(s): UP000005640_9606_combined.fasta
[0:04] Spectral library loaded: 95221 protein isoforms, 196937 protein groups and 4296537 precursors in 1339172 elution groups.
[0:04] Loading protein annotations from FASTA UP000005640_9606_combined.fasta
[0:04] Annotating library proteins with information from the FASTA database
[0:05] Gene names missing for some isoforms
[0:05] Library contains 95221 proteins, and 20444 genes
[0:05] Initialising library

[0:07] File #1/1
[0:07] Loading run raw/15891.mzML
[New Thread 0x7fff8bd21700 (LWP 27259)]
[Thread 0x7fff8bd21700 (LWP 27259) exited]
[0:48] 3087882 library precursors are potentially detectable
[0:48] Processing...
[106:23] RT window set to 0.859588
[106:23] Peak width: 3.568
[106:23] Scan window radius set to 7
[106:24] Recommended MS1 mass accuracy setting: 25.3055 ppm
[122:25] Removing low confidence identifications
[122:25] Removing interfering precursors
[122:26] Training neural networks: 6875 targets, 3879 decoys
[New Thread 0x7fff8bd21700 (LWP 36562)]
[Thread 0x7fff8bd21700 (LWP 36562) exited]
[New Thread 0x7fff8bd21700 (LWP 36563)]
[Thread 0x7fff8bd21700 (LWP 36563) exited]
[122:30] Number of IDs at 0.01 FDR: 2671
[122:30] Calculating protein q-values
[122:30] Number of genes identified at 1% FDR: 430 (precursor-level), 348 (protein-level) (inference performed using proteotypic peptides only)
[122:30] Quantification
[122:31] Quantification information saved to raw/15891.mzML.quant.

[122:31] Cross-run analysis
[122:31] Reading quantification information: 1 files
[122:31] Quantifying peptides
[122:31] Assembling protein groups

Thread 1 "diann-1.8" received signal SIGSEGV, Segmentation fault.
0x0000555555508517 in void std::__introsort_loop<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, long, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}> >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}>, long, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}>) ()
(gdb) bt
#0  0x0000555555508517 in void std::__introsort_loop<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, long, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}> >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}>, long, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}>) ()
#1  0x00005555555085d7 in void std::__introsort_loop<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, long, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}> >(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}>, long, __gnu_cxx::__ops::_Iter_comp_iter<Library::infer_proteins()::{lambda(int, int)#1}>) ()
#2  0x0000555555539cf2 in Library::infer_proteins() ()
#3  0x000055555553a527 in Library::quantify_proteins(int, double, int) ()
#4  0x00005555554341a9 in main ()
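
The backtrace places the crash inside the introsort loop of std::sort, called from Library::infer_proteins() with a lambda comparator. The thread does not state the root cause, so the following is only a general illustration: one common reason for std::sort to read out of bounds is a comparator that is not a strict weak ordering (for example, one that uses <= instead of <), which makes the call undefined behavior. A minimal sketch of a sanity check for that property, assuming a hypothetical score table sorted by index (all names here are illustrative and not taken from DIA-NN):

```cpp
#include <cstddef>
#include <vector>

// Returns true if `comp` passes two necessary conditions of a strict
// weak ordering on `items`:
//   irreflexivity: comp(a, a) is false
//   asymmetry:     comp(a, b) and comp(b, a) are never both true
// Transitivity is NOT checked, so `true` does not prove validity.
template <typename T, typename Comp>
bool looks_like_strict_ordering(const std::vector<T>& items, Comp comp) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (comp(items[i], items[i])) return false;
        for (std::size_t j = i + 1; j < items.size(); ++j) {
            if (comp(items[i], items[j]) && comp(items[j], items[i])) return false;
        }
    }
    return true;
}

// Builds the index vector 0..n-1 (hypothetical stand-in for a
// vector<int> of indices that gets sorted by an external key).
inline std::vector<int> make_indices(std::size_t n) {
    std::vector<int> idx(n);
    for (std::size_t i = 0; i < n; ++i) idx[i] = static_cast<int>(i);
    return idx;
}

// Valid comparator: strict "less than" on the score.
inline bool check_less(const std::vector<int>& scores) {
    return looks_like_strict_ordering(
        make_indices(scores.size()),
        [&scores](int a, int b) { return scores[a] < scores[b]; });
}

// Invalid comparator: "less than or equal" is not irreflexive, so
// passing it to std::sort would be undefined behavior.
inline bool check_less_equal(const std::vector<int>& scores) {
    return looks_like_strict_ordering(
        make_indices(scores.size()),
        [&scores](int a, int b) { return scores[a] <= scores[b]; });
}
```

The check never calls std::sort with the suspect comparator, so it does not itself trigger undefined behavior. Because the outcome of undefined behavior depends on memory contents and layout, a bug of this kind could plausibly appear or disappear with unrelated settings, but whether that explains the behavior reported here is not known.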

hguturu commented 2 years ago

To further document the peculiarity of this bug: it is reproducible with the generated .quant file, which is probably the fastest way to debug, since you can jump straight to the "Assembling protein groups" step.

If the command is: /usr/diann/1.8/diann-1.8 --use-quant --threads $t --dir raw --fasta UP000005640_9606_combined.fasta --lib UP000005640_9606_combined.predicted.speclib --mass-acc-ms1 10 --mass-acc 10 --qvalue 0.01 --matrices --missed-cleavages 1 --met-excision --cut 'K*,R*' --smart-profiling --relaxed-prot-inf

It doesn't segfault when --threads is 3, 5, 6, or 7, but it does segfault for every other value I tested from 1 to 30. I don't know whether this is stable (i.e., whether it is always 3, 5, 6, and 7, or whether it changes with other factors such as the library or input).

vdemichev commented 2 years ago

Many thanks for reporting this; the problem has now been identified and fixed.

Best, Vadim

hguturu commented 2 years ago

Excellent, how can we get the patched binary?

Also, any ideas on why it showed thread-dependent behavior when the code was single-threaded?