torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

segmentation faults with the zobrist branch #121

Closed: frederic-mahe closed this issue 5 years ago

frederic-mahe commented 5 years ago

Many unit tests fail when using a binary compiled with the latest code of that branch:

cd ./swarm-tests/
bash run_all_tests.sh /tmp/swarm/src/swarm

The problem could be limited to short sequences, as I did not observe any issue when feeding swarm massive amounts of sequences with lengths ranging from 32 to 1,000 nucleotides.

Here is a simple test:

./swarm <(printf ">s1_10\nACGT\n") > /dev/null

that yields a segmentation fault. Here's GDB's log:

gdb swarm
(gdb) run <(printf ">s1_10\nACGT\n")
...
Hashing sequences: 0%
Program received signal SIGSEGV, Segmentation fault.
0x000055555555a900 in bloom_set () at bloompat.h:54
54    * bloom_adr(b, h) &= ~ bloom_pat(b, h);

(gdb) backtrace
#0  0x000055555555a900 in bloom_set () at bloompat.h:54
#1  hash_insert (amp=0) at algod1.cc:153
#2  0x00005555555592cb in algo_d1_run () at algod1.cc:661
#3  0x0000555555555bda in main (argc=<optimized out>, argv=<optimized out>) at swarm.cc:676

(gdb) frame 1
#1  hash_insert (amp=0) at algod1.cc:153
153   bloom_set(bloom_a, hash);

(gdb) frame 2
#2  0x00005555555592cb in algo_d1_run () at algod1.cc:661
661       hash_insert(i);

torognes commented 5 years ago

The segfault appeared when the input contained only 1 or 2 sequences. It was caused by the bitmap in one of the Bloom filters being too small. Fixed in commit a5c7a21896ca6d685e0893ab30988ba327bbe292.
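
For readers unfamiliar with this class of bug, here is a minimal sketch of how a bitmap sized from the number of input sequences can end up too small for tiny inputs, and one way to guard against it. This is not swarm's code and not necessarily what the commit above does: `TinyBloom`, `naive_word_count`, `safe_word_count`, and the `bits_per_item` parameter are hypothetical names chosen for illustration, and the sketch sets bits on insertion, whereas the `bloom_set` line in the backtrace clears them.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not swarm's implementation: a single-hash,
// Bloom-style bitmap whose size is derived from the expected item count.
struct TinyBloom {
    std::vector<uint64_t> words;  // bitmap storage, 64 bits per word
    uint64_t mask;                // words.size() - 1 (size is a power of two)

    // Naive sizing: with very few items the integer division yields 0 words,
    // so any later write through the bitmap would be out of bounds.
    static uint64_t naive_word_count(uint64_t items, uint64_t bits_per_item) {
        return items * bits_per_item / 64;
    }

    // Guarded sizing: never go below min_words, then round up to a power
    // of two so that (index & mask) always stays inside the bitmap.
    static uint64_t safe_word_count(uint64_t items, uint64_t bits_per_item,
                                    uint64_t min_words = 1) {
        uint64_t needed =
            std::max(naive_word_count(items, bits_per_item), min_words);
        uint64_t size = 1;
        while (size < needed) size <<= 1;
        return size;
    }

    TinyBloom(uint64_t items, uint64_t bits_per_item)
        : words(safe_word_count(items, bits_per_item), 0),
          mask(words.size() - 1) {}

    // Bits 6 and up select the word; the low 6 bits select the bit in it.
    void set(uint64_t hash) {
        words[(hash >> 6) & mask] |= 1ULL << (hash & 63);
    }

    bool maybe_contains(uint64_t hash) const {
        return (words[(hash >> 6) & mask] >> (hash & 63)) & 1ULL;
    }
};
```

With the assumed `bits_per_item` of 8, a single input sequence gives `naive_word_count(1, 8) == 0`, while `safe_word_count(1, 8)` returns 1, so constructing `TinyBloom(1, 8)` and calling `set` stays within the allocated storage.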

frederic-mahe commented 5 years ago

The issue is fixed.

Some unit tests still fail; I'll investigate and open new issues if need be.