yangao07 / abPOA

abPOA: an SIMD-based C library for fast partial order alignment using adaptive band
MIT License
111 stars 18 forks source link

potential issue when aligning more than 1024 sequences (v1.2.4) #24

Closed ekg closed 2 years ago

ekg commented 3 years ago

Talking with @glennhickey today, we discovered that we are both struggling with a segfault that appears to occur when calling abpoa_free_seq. I specifically tracked it to here: https://github.com/yangao07/abPOA/blob/master/src/abpoa_seq.c#L110. It appears that this only occurs when aligning >1024 (CHUNK_READ_N) sequences. I am interested in the fact that the initial default allocation is for 1024 sequences https://github.com/yangao07/abPOA/blob/bc41c3d8a896cc7390b0e30f67bac731c9569bed/src/abpoa_seq.c#L96.

However, neither of us have been able to reproduce this using the same input and calling the command line abpoa tool. This will make it very hard for you to track down the cause.

Adjusting things to attempt to narrow down the problem, I saw errors like this that suggested that the address of one of the values to be freed had been overwritten by a number. This number is outside of the normal 48-bits used for memory addresses:

[smoothxg::smooth_and_lace] applying global abPOA to 53842 blocks:  2.15% @ 4.18e+00/s elapsed: 00:00:04:36 remain: 00:03:29:59AddressSanitizer:DEADLYSIGNAL
=================================================================
==4901==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7f14924e8532 bp 0xd80000200003f81 sp 0x7f14744f39a0 T19)
==4901==The signal is caused by a READ memory access.
==4901==Hint: this fault was caused by a dereference of a high value address (see register values below).  Dissassemble the provided pc to learn which register was used.
    #0 0x7f14924e8532 in __asan::asan_free(void*, __sanitizer::BufferedStackTrace*, __asan::AllocType) (/gnu/store/qj38f3vi4q1d7z30hkpaxyajv49rwamb-gcc-10.2.0-lib/lib/libasan.so.6+0x2b532)
    #1 0x7f1492569b43 in free (/gnu/store/qj38f3vi4q1d7z30hkpaxyajv49rwamb-gcc-10.2.0-lib/lib/libasan.so.6+0xacb43)
    #2 0xa7a3c8 in abpoa_free_seq (/export2/erikg/smoothxg/bin/smoothxg+0xa7a3c8)
    #3 0x7b6948 in abpoa_free /export2/erikg/smoothxg/deps/abPOA/src/abpoa_graph.c:142
    #4 0x7dac90 in smoothxg::smooth_abpoa(xg::XG const&, smoothxg::block_t const&, unsigned long, int, int, int, int, int, int, bool, std::unique_ptr<ska::flat_hash_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<maf_partial_row_t, std::allocator<maf_partial_row_t> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<maf_partial_row_t, std::allocator<maf_partial_row_t> > > > >, std::default_delete<ska::flat_hash_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<maf_partial_row_t, std::allocator<maf_partial_row_t> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<maf_partial_row_t, std::allocator<maf_partial_row_t> > > > > > >&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) /export2/erikg/smoothxg/src/smooth.cpp:383
    #5 0x7e9ebd in smoothxg::smooth_and_lace(xg::XG const&, smoothxg::blockset_t*&, int, int, int, int, int, int, bool, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool, bool, double, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool, unsigned long) [clone ._omp_fn.0] /export2/erikg/smoothxg/src/smooth.cpp:1528
    #6 0x7f1491dd2c65 in gomp_thread_start (/gnu/store/qj38f3vi4q1d7z30hkpaxyajv49rwamb-gcc-10.2.0-lib/lib/libgomp.so.1+0x19c65)
    #7 0x7f1491d85f63 in start_thread (/gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libpthread.so.0+0x7f63)
    #8 0x7f1491cb79ae in clone (/gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libc.so.6+0xf69ae)

AddressSanitizer can not provide additional info.

We're both trying to find an input to abpoa that reproduces this error.

glennhickey commented 3 years ago

I think that 1024 thing is a good clue. From what I can tell, abpoa calls abpoa_realloc_seq() from the command line when it's loading the sequences from file with abpoa_read_seq(). But since we're passing in the sequences directly to memory, I don't think that realloc is ever getting called, so we're stuck with the default value of 1024? (still weird asan wouldn't get that if this were the case).

yangao07 commented 3 years ago

I run it on a test 2000 x 1000bp dataset and did not see this segfault.

As @ekg mentioned the error related to the variable comment in abpoa_seq_t, I suspect that you guys did not reallocate it when loading more than 1024 sequences? But I am not very sure about this.

ekg commented 3 years ago

How do we reallocate when there are more than 1024 sequences? We are either calling abpoa_msa or abpoa_poa to drive the alignment.

ekg commented 3 years ago

Also, the error indicates that something else has written to the comment variable. If a memory address had been written, I don't think it would be possible to get the "high value address" that lies outside of the virtual memory of the system (in 48 bits).

yangao07 commented 3 years ago

@ekg You are right, they should be reallocated inside the abpoa_msa function. I just found that realloc bug and fixed it. Please try the latest commit.

Yan