jts / sga

de novo sequence assembler using string graphs
http://genome.cshlp.org/content/22/3/549

Segfault in sga filter #75

Closed kdm9 closed 10 years ago

kdm9 commented 10 years ago

Firstly, thanks for a very nice piece of software :smile:

I've come across the following segfault using a self-compiled sga (at master, i.e. 44940dd):

(gdb) run filter -p assem/Sample_A3 -o assem/Sample_A3.filterpass.fa -t 12 --homopolymer-check assem/Sample_A3.ec.fa
Starting program: /home/kevin/tmp/build/sga/src/SGA/sga filter -p assem/Sample_A3 -o assem/Sample_A3.filterpass.fa -t 12 --homopolymer-check assem/Sample_A3.ec.fa
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

RLBWT info:
Large Sample rate: 8192
Small Sample rate: 128
Contains 1538735342 symbols in 506523083 runs (3.0378 symbols per run)
Marker Memory -- Small Markers: 144256452 (137.6 MB) Large Markers: 9016080 (8.6 MB)
Total Memory -- Markers: 153272532 (146.2 MB) Str: 506523083 (483.1 MB) Misc: 152 Total: 659795767 (629.230277 MB)
N: 1538735342 Bytes per symbol: 0.428791

[New Thread 0x7fffa7f92700 (LWP 12882)]
[New Thread 0x7fffa7791700 (LWP 12883)]
[New Thread 0x7fffa6f90700 (LWP 12884)]
[New Thread 0x7fffa678f700 (LWP 12885)]
[New Thread 0x7fffa5f8e700 (LWP 12886)]
[New Thread 0x7fffa578d700 (LWP 12887)]
[New Thread 0x7fffa4f8c700 (LWP 12888)]
[New Thread 0x7fffa478b700 (LWP 12889)]
[New Thread 0x7fffa3f8a700 (LWP 12890)]
[New Thread 0x7fffa3789700 (LWP 12891)]
[New Thread 0x7fffa2f88700 (LWP 12892)]
[New Thread 0x7fffa2787700 (LWP 12893)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffa578d700 (LWP 12887)]
0x00000000005df85b in BitChar::test (this=0x10007fffa7f9300f, idx=7 '\a') at BitChar.cpp:28
28          return m_data & bc_mask[idx];
(gdb) bt
#0  0x00000000005df85b in BitChar::test (this=0x10007fffa7f9300f, idx=7 '\a') at BitChar.cpp:28
#1  0x00000000005e9166 in BitVector::test (this=0x96e570, i=9223372036854775807) at BitVector.cpp:90
#2  0x000000000057653a in QCProcess::performDuplicateCheck (this=0x96e840, workItem=...) at QCProcess.cpp:238
#3  0x0000000000575df5 in QCProcess::process (this=0x96e840, workItem=...) at QCProcess.cpp:55
#4  0x000000000043f111 in ThreadWorker<SequenceWorkItem, QCResult, QCProcess>::run (this=0x9dd6a0) at ../Concurrency/ThreadWorker.h:194
#5  0x000000000043e97a in ThreadWorker<SequenceWorkItem, QCResult, QCProcess>::startThread (obj=0x9dd6a0) at ../Concurrency/ThreadWorker.h:210
#6  0x00007ffff6d000ca in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x00007ffff6a3506d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I am filtering pre-processed, corrected reads from low coverage nextera libraries.

I can dig further, but my guess is that both tests on lines 233 and 234 of QCProcess.cpp are failing, so line 235 effectively computes std::min(2^63-1, 2^63-1), and line 238 then indexes pSharedBV at 2^63-1, which matches i=9223372036854775807 in frame #1 of the backtrace (see https://github.com/jts/sga/blob/master/src/Algorithm/QCProcess.cpp#L233).
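
The suspected failure path can be sketched in isolation. This is a minimal illustration, not sga's code: `kNotFound`, `chooseIndex`, and `inRange` are hypothetical names, and the assumption is that a failed lookup leaves its result at the largest signed 64-bit value. Taking the minimum of two such sentinels yields the sentinel again, which is far outside any real bit vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical sentinel meaning "read not found in the index" (2^63 - 1).
constexpr int64_t kNotFound = std::numeric_limits<int64_t>::max();

// Hypothetical stand-in for picking the smaller of two lookup results,
// e.g. one for the read and one for its reverse complement.
int64_t chooseIndex(int64_t forwardIdx, int64_t reverseIdx) {
    return std::min(forwardIdx, reverseIdx);
}

// True only if idx is a valid position in a bit vector of numBits bits.
bool inRange(int64_t idx, std::size_t numBits) {
    return idx >= 0 && static_cast<uint64_t>(idx) < numBits;
}

// Walks through the suspected path; call from main() to try it.
void demo() {
    const std::size_t numBits = 1000;

    // Both lookups failed: the minimum is still the sentinel.
    int64_t idx = chooseIndex(kNotFound, kNotFound);
    assert(idx == kNotFound);
    assert(!inRange(idx, numBits));

    // One lookup succeeded: the minimum is a usable index.
    assert(inRange(chooseIndex(42, kNotFound), numBits));
}
```

With 8 bits per byte, 2^63-1 modulo 8 is 7, which is consistent with idx=7 in frame #0 of the backtrace.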

I have no idea what the implications of this are for the algorithm or for my CLI usage, as I'm fairly new to sga.

jts commented 10 years ago

Hi Kevin,

Thanks :)

I think there is a mismatch between your input file and the index. You are reading from assem/Sample_A3.ec.fa but loading the index file assem/Sample_A3.bwt (through the -p option). When it tries to check a read that does not exist in the index, it crashes. Can you try again with the option -p assem/Sample_A3.ec and let me know whether it works?

I have added an assertion in 506acff to avoid the hard crash.
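
For illustration, a bounds assertion of this general kind could look like the sketch below. This is not the change made in 506acff, and `CheckedBitVector` is a hypothetical class rather than sga's `BitVector`. It only shows how checking the index before touching memory turns an out-of-range access into a clear assertion failure instead of a segfault.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit vector that validates every index before use.
class CheckedBitVector {
public:
    explicit CheckedBitVector(std::size_t numBits)
        : m_numBits(numBits), m_bytes((numBits + 7) / 8, 0) {}

    // True if i addresses one of the m_numBits bits.
    bool contains(int64_t i) const {
        return i >= 0 && static_cast<uint64_t>(i) < m_numBits;
    }

    void set(int64_t i) {
        assert(contains(i) && "bit index out of range");
        m_bytes[byteOf(i)] |= maskOf(i);
    }

    bool test(int64_t i) const {
        // Fails here, with a readable message, rather than dereferencing
        // an address far outside the allocated bytes.
        assert(contains(i) && "bit index out of range");
        return (m_bytes[byteOf(i)] & maskOf(i)) != 0;
    }

private:
    static std::size_t byteOf(int64_t i) { return static_cast<std::size_t>(i / 8); }
    static unsigned char maskOf(int64_t i) {
        return static_cast<unsigned char>(1u << (i % 8));
    }

    std::size_t m_numBits;
    std::vector<unsigned char> m_bytes;
};
```

Note that `assert` is compiled out when `NDEBUG` is defined, so a release build that must never crash hard would need an explicit check and error message instead.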

Jared

kdm9 commented 10 years ago

Indeed, this looks like it was the problem! Thanks for the speedy fix, and apologies for my ignorance :)

jts commented 10 years ago

No problem. Let me know if you run into any more problems.