adamnovak closed this issue 5 years ago
The empty index was not properly initialized. 8bc53138f7a1ce111775659d85a63d91d142c62e should fix the issue.
I think this empty-GCSA-serialization problem is back, maybe in a slightly different form.
Try this:
echo '{}' | vg view -Jv - >graph.vg
vg index -g graph.gcsa -k 16 graph.vg
It crashes during serialization of the (empty) GCSA:
Crash report for vg v1.22.0-191-gfa79b61dc "Rotella"
Stack trace (most recent call last):
#14 Object "", at 0xffffffffffffffff, in
#13 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847377fc9, in _start
#12 Object "/lib/x86_64-linux-gnu/libc-2.27.so", at 0x7f21a92c3b96, in __libc_start_main
Source "../csu/libc-start.c", line 310, in __libc_start_main [0x7f21a92c3b96]
#11 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b8472a8557, in main
Source "src/main.cpp", line 78, in main [0x55b8472a8557]
75: auto* subcommand = vg::subcommand::Subcommand::get(argc, argv);
76: if (subcommand != nullptr) {
77: // We found a matching subcommand, so run it
> 78: return (*subcommand)(argc, argv);
79: } else {
80: // No subcommand found
81: string command = argv[1];
#10 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b84788ca57, in vg::subcommand::Subcommand::operator()(int, char**) const
| Source "src/subcommand/subcommand.cpp", line 72, in operator()
| 71: const int Subcommand::operator()(int argc, char** argv) const {
| > 72: return main_function(argc, argv);
| 73: }
Source "/usr/include/c++/7/bits/std_function.h", line 706, in operator() [0x55b84788ca57]
703: {
704: if (_M_empty())
705: __throw_bad_function_call();
> 706: return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
707: }
708:
709: #if __cpp_rtti
#9 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b8477fa1c4, in main_index(int, char**)
Source "src/subcommand/index_main.cpp", line 692, in main_index [0x55b8477fa1c4]
689: if (show_progress) {
690: cerr << "Saving the index to disk..." << endl;
691: }
> 692: vg::io::VPKG::save(gcsa_index, gcsa_name);
693: vg::io::VPKG::save(lcp_array, gcsa_name + ".lcp");
694:
695: // Verify the index
#8 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b8477ff4a9, in void vg::io::VPKG::save<gcsa::GCSA>(gcsa::GCSA const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
Source "/z/home/anovak/workspace/vg/include/vg/io/vpkg.hpp", line 425, in save<gcsa::GCSA> [0x55b8477ff4a9]
422: }
423:
424: // Save to it
> 425: save<Have>(have, open_file);
426: }
427: }
#7 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b8477ff31c, in void vg::io::VPKG::save<gcsa::GCSA>(gcsa::GCSA const&, std::ostream&)
| Source "/z/home/anovak/workspace/vg/include/vg/io/vpkg.hpp", line 400, in operator()
| 399: // Start the save
| > 400: tag_and_saver->second((const void*)&have, [&](const string& message) {
| 401: // For each message that we have to output during the save, output it via the emitter with the selected tag.
| 402: // TODO: We copy the data string.
Source "/usr/include/c++/7/bits/std_function.h", line 706, in save<gcsa::GCSA> [0x55b8477ff31c]
703: {
704: if (_M_empty())
705: __throw_bad_function_call();
> 706: return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
707: }
708:
709: #if __cpp_rtti
#6 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847df0b0c, in std::_Function_handler<void (void const*, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&), vg::io::wrap_bare_saver[abi:cxx11](std::function<void (void const*, std::ostream&)>)::{lambda(void const*, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&)#1}>::_M_invoke(std::_Any_data const&, void const*&&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&)
| Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
| 314: _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
| 315: {
| > 316: (*_Base::_M_get_pointer(__functor))(
| 317: std::forward<_ArgTypes>(__args)...);
| 318: }
Source "/z/home/anovak/workspace/vg/deps/libvgio/src/registry.cpp", line 319, in _M_invoke [0x55b847df0b0c]
316: assert(to_save != nullptr);
317:
318: // Get ahold of an ostream that calls our emit_message function
> 319: with_function_calling_stream(emit_message, [&ostream_saver, &to_save](ostream& out) {
320: // And save to it
321: ostream_saver(to_save, out);
322: });
#5 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847df0982, in vg::io::with_function_calling_stream(std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&, std::function<void (std::ostream&)> const&)
| Source "/z/home/anovak/workspace/vg/deps/libvgio/src/registry.cpp", line 291, in operator()
| 290: // Run the saver on that stream
| > 291: use_stream(write_pipe);
| 292: }
Source "/usr/include/c++/7/bits/std_function.h", line 706, in with_function_calling_stream [0x55b847df0982]
703: {
704: if (_M_empty())
705: __throw_bad_function_call();
> 706: return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
707: }
708:
709: #if __cpp_rtti
#4 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847def814, in std::_Function_handler<void (std::ostream&), vg::io::wrap_bare_saver[abi:cxx11](std::function<void (void const*, std::ostream&)>)::{lambda(void const*, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&)#1}::operator()(void const*, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&) const::{lambda(std::ostream&)#1}>::_M_invoke(std::_Any_data const&, std::ostream&)
| Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
| 314: _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
| 315: {
| > 316: (*_Base::_M_get_pointer(__functor))(
| 317: std::forward<_ArgTypes>(__args)...);
| 318: }
| Source "/z/home/anovak/workspace/vg/deps/libvgio/src/registry.cpp", line 321, in operator()
| 319: with_function_calling_stream(emit_message, [&ostream_saver, &to_save](ostream& out) {
| 320: // And save to it
| > 321: ostream_saver(to_save, out);
| 322: });
Source "/usr/include/c++/7/bits/std_function.h", line 706, in _M_invoke [0x55b847def814]
703: {
704: if (_M_empty())
705: __throw_bad_function_call();
> 706: return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
707: }
708:
709: #if __cpp_rtti
#3 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847dc5bf6, in std::_Function_handler<void (void const*, std::ostream&), vg::io::register_loader_saver_gcsa()::{lambda(void const*, std::ostream&)#2}>::_M_invoke(std::_Any_data const&, void const*&&, std::ostream&)
| Source "/usr/include/c++/7/bits/std_function.h", line 316, in operator()
| 314: _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
| 315: {
| > 316: (*_Base::_M_get_pointer(__functor))(
| 317: std::forward<_ArgTypes>(__args)...);
| 318: }
Source "src/io/register_loader_saver_gcsa.cpp", line 31, in _M_invoke [0x55b847dc5bf6]
28: }, [](const void* index_void, ostream& output) {
29: // Cast to GCSA and serialize to the stream.
30: assert(index_void != nullptr);
> 31: ((const gcsa::GCSA*) index_void)->serialize(output);
32: });
33: }
#2 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847f26766, in gcsa::GCSA::serialize(std::ostream&, sdsl::structure_tree_node*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const
Source "/z/home/anovak/workspace/vg/deps/gcsa2/gcsa.cpp", line 150, in serialize [0x55b847f26766]
148: for(size_type comp = 0; comp < this->alpha.sigma; comp++)
149: {
> 150: written_bytes += this->fast_bwt[comp].serialize(out, child, "fast_bwt");
151: }
152: for(size_type comp = 0; comp < this->alpha.sigma; comp++)
153: {
#1 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847f3b147, in sdsl::bit_vector_il<512u>::serialize(std::ostream&, sdsl::structure_tree_node*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const
Source "/z/home/anovak/workspace/vg/include/sdsl/bit_vector_il.hpp", line 209, in serialize [0x55b847f3b147]
206: written_bytes += write_member(m_block_num, out, child, "block_num");
207: written_bytes += write_member(m_superblocks, out, child, "superblocks");
208: written_bytes += write_member(m_block_shift, out, child, "block_shift");
> 209: written_bytes += m_data.serialize(out, child, "data");
210: written_bytes += m_rank_samples.serialize(out, child, "rank_samples");
211: structure_tree::add_size(child, written_bytes);
212: return written_bytes;
#0 Object "/z/home/anovak/workspace/vg/bin/vg", at 0x55b847393106, in sdsl::int_vector<(unsigned char)64>::serialize(std::ostream&, sdsl::structure_tree_node*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) const
Source "/z/home/anovak/workspace/vg/include/sdsl/int_vector.hpp", line 1570, in serialize [0x55b847393106]
1567: {
1568: structure_tree_node* child = structure_tree::add_child(v, name, util::class_name(*this));
1569: size_type written_bytes = 0;
>1570: if (t_width > 0 and write_fixed_as_variable) {
1571: written_bytes += int_vector<0>::write_header(m_size, t_width, out);
1572: } else {
1573: written_bytes += int_vector<t_width>::write_header(m_size, m_width, out);
Something is wrong on the vg side. When I extract kmers from the empty graph with vg kmers -g -B and build GCSA using build_gcsa, everything works correctly. If I try building GCSA from the kmers with vg index using the -i option, I get the same crash.
If the line numbers in the stack trace are correct, the crash occurs when evaluating the conditions in an if statement. t_width is a template parameter with value (unsigned char)64, so the first part is true, while write_fixed_as_variable is false.
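To make the branch analysis above concrete, here is a minimal sketch that mirrors only the condition shown at int_vector.hpp line 1570. The helper name takes_variable_header_branch is hypothetical and not part of sdsl; it reproduces the condition and none of the real serialization logic.

```cpp
#include <cstdint>

// Hypothetical helper: mirrors only the condition
//   if (t_width > 0 and write_fixed_as_variable)
// from the stack trace. It is not part of sdsl.
template <std::uint8_t t_width>
constexpr bool takes_variable_header_branch(bool write_fixed_as_variable) {
    return t_width > 0 && write_fixed_as_variable;
}

// The case in the crash: t_width = 64, write_fixed_as_variable = false.
// The condition is false, so the else branch would be taken.
static_assert(!takes_variable_header_branch<64>(false),
              "fixed-width header path");

// For contrast: the first branch needs a nonzero width and the flag set.
static_assert(takes_variable_header_branch<64>(true),
              "variable-width header path");
static_assert(!takes_variable_header_branch<0>(true),
              "width 0 never takes the first branch");
```

The condition depends only on a compile-time constant and a bool passed by value, so evaluating it reads no member data of the vector being serialized.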
Constructing a gcsa::GCSA with the default constructor, and then immediately calling serialize() to serialize the (empty) index to an ostream, results in a segfault.
If it is forbidden to serialize a default-constructed gcsa::GCSA, that needs to be documented in a doc comment for serialize(). Otherwise, the serialization needs to be fixed to work.
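The requirement described above (a default-constructed index should serialize cleanly to an ostream) can be pinned down with a small regression check. The sketch below is an assumption-laden illustration: ToyIndex and serialize_empty_index are hypothetical names, and the class is a stand-in with a made-up layout, not the gcsa::GCSA API. It only shows the shape of the check: default-construct, serialize to an in-memory stream, and confirm that the stream is still good and the byte count matches.

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for an index type; NOT the gcsa::GCSA API.
class ToyIndex {
public:
    // The default constructor leaves every member in a valid, empty state.
    ToyIndex() : sigma_(0), components_() {}

    // Writes the component count, then the size of each component.
    // Returns the number of bytes written.
    std::size_t serialize(std::ostream& out) const {
        std::size_t written = write_u64(out, sigma_);
        for (std::uint64_t comp = 0; comp < sigma_; comp++) {
            written += write_u64(out, components_[comp].size());
        }
        return written;
    }

private:
    static std::size_t write_u64(std::ostream& out, std::uint64_t value) {
        out.write(reinterpret_cast<const char*>(&value), sizeof(value));
        return sizeof(value);
    }

    std::uint64_t sigma_;  // number of components (0 when empty)
    std::vector<std::vector<std::uint64_t>> components_;
};

// Serializes a default-constructed index to an in-memory stream.
// Returns the byte count on success, or 0 if the stream failed or
// the reported count disagrees with what was actually written.
std::size_t serialize_empty_index() {
    ToyIndex index;
    std::ostringstream out;
    std::size_t written = index.serialize(out);
    return (out.good() && out.str().size() == written) ? written : 0;
}
```

For the toy layout, the empty index serializes to a single 64-bit count, so serialize_empty_index() returns 8. An equivalent check against the real class would default-construct a gcsa::GCSA and call serialize() on a std::ostringstream, which is the sequence the comment above reports as crashing.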