jltsiren / gcsa2

BWT-based index for graphs
MIT License

Cannot serialize when empty #37

Closed: adamnovak closed this issue 5 years ago

adamnovak commented 5 years ago

Constructing a gcsa::GCSA with the default constructor, and then immediately calling serialize() to serialize the (empty) index to an ostream, results in a segfault.

If it is forbidden to serialize a default-constructed gcsa::GCSA, that needs to be documented in a doc comment for serialize(). Otherwise, the serialization needs to be fixed to work.
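The behavior being asked for can be sketched with a toy class. This is only an illustration of the contract (a default-constructed object should serialize cleanly to an ostream); ToyIndex, its layout, and serialize_empty_toy_index are hypothetical names and are not part of gcsa2 or sdsl.

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for an index type; this is NOT gcsa::GCSA.
class ToyIndex {
public:
    // The default constructor leaves every member in a valid, empty state.
    ToyIndex() = default;

    // Writes the element count followed by the elements.
    // Returns the number of bytes written.
    std::size_t serialize(std::ostream& out) const {
        const std::uint64_t count = data_.size();
        out.write(reinterpret_cast<const char*>(&count), sizeof(count));
        std::size_t written = sizeof(count);
        if (!data_.empty()) {
            const std::size_t payload = data_.size() * sizeof(std::uint64_t);
            out.write(reinterpret_cast<const char*>(data_.data()),
                      static_cast<std::streamsize>(payload));
            written += payload;
        }
        return written;
    }

private:
    std::vector<std::uint64_t> data_;  // empty after default construction
};

// Serializes a default-constructed ToyIndex into memory and
// returns the number of bytes reported by serialize().
inline std::size_t serialize_empty_toy_index() {
    ToyIndex index;
    std::ostringstream out;
    return index.serialize(out);
}
```

In this sketch an empty object writes only its header (the element count), which is the kind of outcome the report expects from an empty index instead of a crash.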

jltsiren commented 5 years ago

The empty index was not properly initialized. 8bc53138f7a1ce111775659d85a63d91d142c62e should fix the issue.

adamnovak commented 4 years ago

I think this empty-GCSA-serialization problem is back, maybe in a slightly different form.

Try this:

echo '{}' | vg view -Jv - >graph.vg
vg index -g graph.gcsa -k 16 graph.vg

It crashes during serialization of the (empty) GCSA:

Crash report for vg v1.22.0-191-gfa79b61dc "Rotella"
Stack trace (most recent call last):
#14   Object "", at 0xffffffffffffffff, in 
#13   Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847377fc9, in _start
#12   Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f21a92c3b96, in __libc_start_main
      Source "../csu/libc-start.c", line 310, in __libc_start_main [0x7f21a92c3b96]
#11   Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b8472a8557, in main
      Source "src/main.cpp", line 78, in main [0x55b8472a8557]
         75:     auto* subcommand = vg::subcommand::Subcommand::get(argc, argv);
         76:     if (subcommand != nullptr) {
         77:         // We found a matching subcommand, so run it
      >  78:         return (*subcommand)(argc, argv);
         79:     } else {
         80:         // No subcommand found
         81:         string command = argv[1];
#10   Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b84788ca57, in vg::subcommand::Subcommand::operator()(int, char**) const
    | Source "src/subcommand/subcommand.cpp", line 72, in operator()
    |    71: const int Subcommand::operator()(int argc, char** argv) const {
    | >  72:     return main_function(argc, argv);
    |    73: }
      Source "/usr/include/c++/7/bits/std_function.h", line 706, in operator() [0x55b84788ca57]
        703:     {
        704:       if (_M_empty())
        705:    __throw_bad_function_call();
      > 706:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        707:     }
        708: 
        709: #if __cpp_rtti
#9    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b8477fa1c4, in main_index(int, char**)
      Source "src/subcommand/index_main.cpp", line 692, in main_index [0x55b8477fa1c4]
        689:         if (show_progress) {
        690:             cerr << "Saving the index to disk..." << endl;
        691:         }
      > 692:         vg::io::VPKG::save(gcsa_index, gcsa_name);
        693:         vg::io::VPKG::save(lcp_array, gcsa_name + ".lcp");
        694: 
        695:         // Verify the index
#8    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b8477ff4a9, in void vg::io::VPKG::save<gcsa::GCSA>(gcsa::GCSA const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
      Source "/z/home/anovak/workspace/vg/include/vg/io/vpkg.hpp", line 425, in save<gcsa::GCSA> [0x55b8477ff4a9]
        422:             }
        423:             
        424:             // Save to it
      > 425:             save<Have>(have, open_file);
        426:         }
        427:     }
#7    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b8477ff31c, in void vg::io::VPKG::save<gcsa::GCSA>(gcsa::GCSA const&, std::ostream&)
    | Source "/z/home/anovak/workspace/vg/include/vg/io/vpkg.hpp", line 400, in operator()
    |   399:         // Start the save
    | > 400:         tag_and_saver->second((const void*)&have, [&](const string& message) {
    |   401:             // For each message that we have to output during the save, output it via the emitter with the selected tag.
    |   402:             // TODO: We copy the data string.
      Source "/usr/include/c++/7/bits/std_function.h", line 706, in save<gcsa::GCSA> [0x55b8477ff31c]
        703:     {
        704:       if (_M_empty())
        705:    __throw_bad_function_call();
      > 706:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        707:     }
        708: 
        709: #if __cpp_rtti
#6    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847df0b0c, in std::_Function_handler<void (void const*, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&), vg::io::wrap_bare_saver[abi:cxx11](std::function<void (void const*, std::ostream&)>)::{lambda(void const*, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&)#1}>::_M_invoke(std::_Any_data const&, void const*&&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&)
    | Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
    |   314:       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
    |   315:       {
    | > 316:    (*_Base::_M_get_pointer(__functor))(
    |   317:        std::forward<_ArgTypes>(__args)...);
    |   318:       }
      Source "/z/home/anovak/workspace/vg/deps/libvgio/src/registry.cpp", line 319, in _M_invoke [0x55b847df0b0c]
        316:         assert(to_save != nullptr);
        317:         
        318:         // Get ahold of an ostream that calls our emit_message function
      > 319:         with_function_calling_stream(emit_message, [&ostream_saver, &to_save](ostream& out) {
        320:             // And save to it
        321:             ostream_saver(to_save, out);
        322:         });
#5    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847df0982, in vg::io::with_function_calling_stream(std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&, std::function<void (std::ostream&)> const&)
    | Source "/z/home/anovak/workspace/vg/deps/libvgio/src/registry.cpp", line 291, in operator()
    |   290:         // Run the saver on that stream
    | > 291:         use_stream(write_pipe);
    |   292:     }
      Source "/usr/include/c++/7/bits/std_function.h", line 706, in with_function_calling_stream [0x55b847df0982]
        703:     {
        704:       if (_M_empty())
        705:    __throw_bad_function_call();
      > 706:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        707:     }
        708: 
        709: #if __cpp_rtti
#4    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847def814, in std::_Function_handler<void (std::ostream&), vg::io::wrap_bare_saver[abi:cxx11](std::function<void (void const*, std::ostream&)>)::{lambda(void const*, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&)#1}::operator()(void const*, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&) const::{lambda(std::ostream&)#1}>::_M_invoke(std::_Any_data const&, std::ostream&)
    | Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
    |   314:       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
    |   315:       {
    | > 316:    (*_Base::_M_get_pointer(__functor))(
    |   317:        std::forward<_ArgTypes>(__args)...);
    |   318:       }
    | Source "/z/home/anovak/workspace/vg/deps/libvgio/src/registry.cpp", line 321, in operator()
    |   319:         with_function_calling_stream(emit_message, [&ostream_saver, &to_save](ostream& out) {
    |   320:             // And save to it
    | > 321:             ostream_saver(to_save, out);
    |   322:         });
      Source "/usr/include/c++/7/bits/std_function.h", line 706, in _M_invoke [0x55b847def814]
        703:     {
        704:       if (_M_empty())
        705:    __throw_bad_function_call();
      > 706:       return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        707:     }
        708: 
        709: #if __cpp_rtti
#3    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847dc5bf6, in std::_Function_handler<void (void const*, std::ostream&), vg::io::register_loader_saver_gcsa()::{lambda(void const*, std::ostream&)#2}>::_M_invoke(std::_Any_data const&, void const*&&, std::ostream&)
    | Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
    |   314:       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
    |   315:       {
    | > 316:    (*_Base::_M_get_pointer(__functor))(
    |   317:        std::forward<_ArgTypes>(__args)...);
    |   318:       }
      Source "src/io/register_loader_saver_gcsa.cpp", line 31, in _M_invoke [0x55b847dc5bf6]
         28:     }, [](const void* index_void, ostream& output) {
         29:         // Cast to GCSA and serialize to the stream.
         30:         assert(index_void != nullptr);
      >  31:         ((const gcsa::GCSA*) index_void)->serialize(output);
         32:     });
         33: }
#2    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847f26766, in gcsa::GCSA::serialize(std::ostream&, sdsl::structure_tree_node*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const
      Source "/z/home/anovak/workspace/vg/deps/gcsa2/gcsa.cpp", line 150, in serialize [0x55b847f26766]
        148:   for(size_type comp = 0; comp < this->alpha.sigma; comp++)
        149:   {
      > 150:     written_bytes += this->fast_bwt[comp].serialize(out, child, "fast_bwt");
        151:   }
        152:   for(size_type comp = 0; comp < this->alpha.sigma; comp++)
        153:   {
#1    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847f3b147, in sdsl::bit_vector_il<512u>::serialize(std::ostream&, sdsl::structure_tree_node*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const
      Source "/z/home/anovak/workspace/vg/include/sdsl/bit_vector_il.hpp", line 209, in serialize [0x55b847f3b147]
        206:             written_bytes += write_member(m_block_num, out, child, "block_num");
        207:             written_bytes += write_member(m_superblocks, out, child, "superblocks");
        208:             written_bytes += write_member(m_block_shift, out, child, "block_shift");
      > 209:             written_bytes += m_data.serialize(out, child, "data");
        210:             written_bytes += m_rank_samples.serialize(out, child, "rank_samples");
        211:             structure_tree::add_size(child, written_bytes);
        212:             return written_bytes;
#0    Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847393106, in sdsl::int_vector<(unsigned char)64>::serialize(std::ostream&, sdsl::structure_tree_node*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) const
      Source "/z/home/anovak/workspace/vg/include/sdsl/int_vector.hpp", line 1570, in serialize [0x55b847393106]
       1567: {
       1568:     structure_tree_node* child = structure_tree::add_child(v, name, util::class_name(*this));
       1569:     size_type written_bytes = 0;
      >1570:     if (t_width > 0 and write_fixed_as_variable) {
       1571:         written_bytes += int_vector<0>::write_header(m_size, t_width, out);
       1572:     } else {
       1573:         written_bytes += int_vector<t_width>::write_header(m_size, m_width, out);

jltsiren commented 4 years ago

Something is wrong on the vg side. When I extract kmers from the empty graph with vg kmers -g -B and build the GCSA using build_gcsa, everything works correctly. If I try building the GCSA from the kmers with vg index using the -i option, I get the same crash.

If the line numbers in the stack trace are correct, the crash occurs when evaluating the conditions in an if statement. t_width is a template parameter with value (unsigned char)64, so the first part is true, while write_fixed_as_variable is false.
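
To make the evaluation above concrete, here is a minimal sketch that mirrors only the shape of the condition shown at int_vector.hpp line 1570 in the trace. The helper name uses_variable_width_header is hypothetical and this is not sdsl code; it just plugs in the two values named above.

```cpp
#include <cstdint>

// Hypothetical helper: same form as the condition
// `t_width > 0 and write_fixed_as_variable` from the stack trace.
template <std::uint8_t t_width>
constexpr bool uses_variable_width_header(bool write_fixed_as_variable) {
    return t_width > 0 && write_fixed_as_variable;
}

// With t_width = 64 and write_fixed_as_variable = false, the first operand
// is true and the second is false, so the whole condition is false.
static_assert(!uses_variable_width_header<64>(false),
              "values from the trace select the else branch");
```

With these values the condition is false, so the else branch (the write_header call at line 1573 in the trace) would be the one taken. In the sketch the condition reads only a template parameter and a bool argument.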