soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

Segmentation Fault with hhblits customized database example #260

Open konstin opened 3 years ago

konstin commented 3 years ago

I tried to follow the example on building a customized database and got a segmentation fault.

Expected Behavior

hhblits does not segfault

Current Behavior

hhblits segfaults

Steps to Reproduce (for bugs)

# Installation
mkdir hhsuite
curl -L https://github.com/soedinglab/hh-suite/releases/download/v3.3.0/hhsuite-3.3.0-AVX2-Linux.tar.gz | tar xz -C hhsuite
wget http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/UniRef30_2020_06_hhsuite.tar.gz
tar xf UniRef30_2020_06_hhsuite.tar.gz
# Download our fasta file
wget ftp://orengoftp.biochem.ucl.ac.uk/cath/releases/all-releases/v4_2_0/non-redundant-data-sets/cath-dataset-nonredundant-S20-v4_2_0.fa
# Build a custom database
hhsuite/bin/ffindex_from_fasta -s cath_20_fas.ff{data,index} cath-dataset-nonredundant-S20-v4_2_0.fa
hhsuite/bin/hhblits_omp -i cath_20_fas -oa3m query.a3m -n 1 -d UniRef30_2020_06

HH-suite Output (for bugs)

- 19:28:50.953 INFO: Searching 25985124 column state sequences.
- 19:28:51.050 INFO: Thread 0   cath|4_2_0|12asA00/4-330
- 19:28:51.054 INFO: Thread 1   cath|4_2_0|132lA00/2-129
- 19:28:51.084 INFO: cath|4_2_0|12asA00/4-330 is in A2M, A3M or FASTA format
- 19:28:51.086 INFO: Iteration 1
- 19:28:51.095 INFO: cath|4_2_0|132lA00/2-129 is in A2M, A3M or FASTA format
- 19:28:51.096 INFO: Iteration 1
- 19:28:51.345 INFO: Prefiltering database
- 19:28:51.690 INFO: Prefiltering database
- 19:29:33.919 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 286410
- 19:29:35.511 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 723
- 19:29:35.511 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 723
- 19:29:35.511 INFO: Scoring 723 HMMs using HMM-HMM Viterbi alignment
- 19:29:35.667 INFO: Alternative alignment: 0
- 19:29:36.004 INFO: 723 alignments done
- 19:29:36.005 INFO: Alternative alignment: 1
- 19:29:36.330 INFO: 721 alignments done
- 19:29:36.331 INFO: Alternative alignment: 2
- 19:29:36.402 INFO: 72 alignments done
- 19:29:36.402 INFO: Alternative alignment: 3
- 19:29:36.445 INFO: 23 alignments done

Segmentation fault (core dumped)

Context

I wanted to search this fasta against itself using hhsuite, so I started by building a custom database.

Your Environment

Include as many relevant details about the environment you experienced the issue in.

konstin commented 3 years ago

Backtrace from gdb:

(gdb) bt full
#0  0x000000000044ca44 in HMM::Read(_IO_FILE*, int, int, float*, char*) ()
No symbol table info available.
#1  0x000000000046aedb in HHEntry::getTemplateHMM(_IO_FILE*, char*, Parameters&, char, float, int&, float*, float const (*) [20], float const (*) [20], HMM*) ()
No symbol table info available.
#2  0x000000000046b917 in HHDatabaseEntry::getTemplateHMM(Parameters&, char, float, int&, float*, float const (*) [20], float const (*) [20], HMM*) ()
No symbol table info available.
#3  0x000000000045b095 in PosteriorDecoderRunner::executeComputation(HMM&, std::vector<Hit*, std::allocator<Hit*> >, Parameters&, float, float*, float const (*) [20], float const (*) [20], float const (*) [20]) [clone ._omp_fn.0] ()
No symbol table info available.
#4  0x000000000055177f in GOMP_parallel ()
No symbol table info available.
#5  0x000000000045be1e in PosteriorDecoderRunner::executeComputation(HMM&, std::vector<Hit*, std::allocator<Hit*> >, Parameters&, float, float*, float const (*) [20], float const (*) [20], float const (*) [20]) ()
No symbol table info available.
#6  0x0000000000411e16 in HHblits::premerge(Hash<Hit>*, Hash<Hit>*, int&, int&, int) ()
No symbol table info available.
#7  0x0000000000414135 in HHblits::run(_IO_FILE*, char*) ()
No symbol table info available.
#8  0x00000000004054e5 in main._omp_fn ()
No symbol table info available.
#9  0x0000000000551c8e in gomp_thread_start ()
No symbol table info available.
#10 0x0000000000562635 in start_thread (arg=0x15275fe1d700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x15275fe1d700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {23258856544000, -5009088099790023489, 0, 140737488283135, 23258856544704, 848964336, -8056206178702519105, -5009088803271666497}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#11 0x00000000005e5939 in clone ()
No symbol table info available.

konstin commented 3 years ago

The SSE2 build is also affected, but I found that v3.1.0 and v3.2.0 work. This seems to be a regression introduced in v3.3.0.

milot-mirdita commented 3 years ago

You can disable pre-merging with -premerge 0. That restores the behavior from before 3.3.0, at the cost of some (sometimes quite a lot of) sensitivity.

Could you try to figure out which query crashes and upload only that one? If we have a minimal test case, maybe Martin can find a bit of time to look at why exactly it crashes. It is probably one of these two:

- 19:28:51.050 INFO: Thread 0   cath|4_2_0|12asA00/4-330
- 19:28:51.054 INFO: Thread 1   cath|4_2_0|132lA00/2-129

konstin commented 3 years ago

It seems that both are required; using either one alone passes:

>cath|4_2_0|12asA00/4-330
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSQELLSRYPDLDAKGRERAIAKDLGAVFLVGIGGKLSDGHRHDVRAPDYDDWSTPSELGHAGLNGDILVWNPVLEDAFELSSMGIRVDADTLKHQLALTGDEDRLELEWHQALLRGEMPQTIGGGIGQSRLTMLLLQLPHIGQVQAGVWPAAVRESVPSLL
>cath|4_2_0|132lA00/2-129
XVFGRCELAAAMXRHGLDNYRGYSLGNWVCAAXFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAXKIVSDGNGMNAWVAWRNRCXGTDVQAWIRGCRL

-premerge 0 also works. Would you recommend continuing with 3.3.0 and -premerge 0 or with 3.2.0?
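
The single-query check described above can be sketched roughly as follows. This is only an illustration, not a script from the thread: it assumes the single-threaded hhblits binary sits next to hhblits_omp in hhsuite/bin, and the split_fasta helper, the queries directory and the seq_NNNN.fa file names are hypothetical. Running every sequence of the full dataset this way would take a long time, so a small subset of the FASTA file is the more practical input.

```shell
#!/usr/bin/env bash
# Sketch: run each query on its own to see which one (if any) crashes alone.
# Paths and the database name come from the reproduction steps above; the
# helper name, the "queries" directory and the seq_NNNN.fa names are made up.

# Split a multi-FASTA file into one numbered file per record.
# Headers such as "cath|4_2_0|12asA00/4-330" contain '|' and '/',
# so they are not used as file names.
split_fasta() {
    local fasta="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ { if (out) close(out); n++; out = sprintf("%s/seq_%04d.fa", dir, n) }
        out  { print > out }
    ' "$fasta"
}

HHBLITS=hhsuite/bin/hhblits   # assumed location of the single-threaded binary
FASTA=cath-dataset-nonredundant-S20-v4_2_0.fa

if [ -x "$HHBLITS" ] && [ -f "$FASTA" ]; then
    split_fasta "$FASTA" queries
    for q in queries/seq_*.fa; do
        "$HHBLITS" -i "$q" -oa3m "${q%.fa}.a3m" -n 1 -d UniRef30_2020_06 -v 0
        status=$?
        # The shell reports 128 + signal number; SIGSEGV is 11, hence 139.
        if [ "$status" -eq 139 ]; then
            echo "segfault: $q ($(head -n 1 "$q"))"
        fi
    done
fi
```

If no single query crashes on its own, as was the case here, the cause is more likely tied to the multi-threaded run than to one particular sequence.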

milot-mirdita commented 3 years ago

I'd continue using 3.3.0 without premerge. There were other quite important fixes in 3.3.0.

milot-mirdita commented 3 years ago

Okay, I actually know what's going on. This is an issue we have encountered before (https://github.com/soedinglab/hh-suite/issues/221), but it is a bit difficult to fix, as it requires quite a bit of refactoring.

You can avoid the crash by increasing the default OpenMP stack size, e.g. with export OMP_STACKSIZE=32768. Then you don't need to mess with -premerge; it is just incidental that it crashes there.
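
Applied to the reproduction steps from the first comment, the workaround might look like the sketch below. The hhblits_omp command line is copied from those steps; the existence check around it is only there so the snippet fails gracefully when the path differs. Per the OpenMP specification, an OMP_STACKSIZE value without a unit suffix is read as kilobytes, so 32768 corresponds to 32 MB per OpenMP thread.

```shell
#!/usr/bin/env bash
# Sketch: raise the per-thread OpenMP stack size, then rerun the search.
# A value without a unit suffix is interpreted as kilobytes (32768 -> 32 MB).
export OMP_STACKSIZE=32768

HHBLITS_OMP=hhsuite/bin/hhblits_omp   # path from the installation step
if [ -x "$HHBLITS_OMP" ]; then
    # Same invocation as in the reproduction steps, without -premerge 0.
    "$HHBLITS_OMP" -i cath_20_fas -oa3m query.a3m -n 1 -d UniRef30_2020_06
else
    echo "hhblits_omp not found at $HHBLITS_OMP; adjust the path" >&2
fi
```

The variable has to be exported in the same shell (or job script) that launches hhblits_omp, since the OpenMP runtime reads it at program start.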