vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg gamsort crashed #2501

Open verne91 opened 4 years ago

verne91 commented 4 years ago

I am trying to sort and index the .gam file. The command line I used is

vg gamsort -p -t 16 x.gam -i x.gam.sorted.gai > x.sorted.gam

I got an error after it had been running for 17 hours.

break into sorted chunks       [=======================]100.0%
ERROR: Signal 11 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_K1Yyjw/stacktrace.txt

The contents of stacktrace.txt are:

Crash report for vg v1.19.0 "Tramutola"
Stack trace (most recent call last):
#9    Object "/home/cnsun/software/bin/vg", at 0x4d6a79, in _start
#8    Object "/home/cnsun/software/bin/vg", at 0x1ac8318, in __libc_start_main
#7    Object "/home/cnsun/software/bin/vg", at 0x40a7d2, in main
#6    Object "/home/cnsun/software/bin/vg", at 0x9a94e7, in vg::subcommand::Subcommand::operator()(int, char**) const
#5    Object "/home/cnsun/software/bin/vg", at 0x9ccbb3, in main_gamsort(int, char**)
#4    Object "/home/cnsun/software/bin/vg", at 0xa455a9, in vg::get_input_file(int&, int, char**, std::function<void (std::istream&)>)
#3    Object "/home/cnsun/software/bin/vg", at 0xa4538c, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)
#2    Object "/home/cnsun/software/bin/vg", at 0x9cdd9d, in main_gamsort(int, char**)::{lambda(std::istream&)#1}::operator()(std::istream&) const
#1    Object "/home/cnsun/software/bin/vg", at 0x9d2ea3, in vg::StreamSorter<vg::Alignment>::stream_sort(std::istream&, std::ostream&, vg::StreamIndex<vg::Alignment>*)
#0    Object "/home/cnsun/software/bin/vg", at 0x9d2097, in vg::StreamSorter<vg::Alignment>::streaming_merge(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long> > >*)

verne91 commented 4 years ago

Any update on this issue?

cmarkello commented 4 years ago

@adamnovak I'm running into the same issue with vg version 1.23.0 when running vg gamsort -p HG002_0.gam -i HG002_0.sorted.gam.gai > HG002_0.sorted.gam

I get the following crash report:

Crash report for vg v1.23.0-57-g75e7c49fd "Lavello"
Stack trace (most recent call last):
#14   Object "/vg/bin/vg", at 0x4dd319, in _start
#13   Object "/vg/bin/vg", at 0x1bc5af8, in __libc_start_main
#12   Object "/vg/bin/vg", at 0x40b007, in main
#11   Object "/vg/bin/vg", at 0x9d6e57, in vg::subcommand::Subcommand::operator()(int, char**) const
#10   Object "/vg/bin/vg", at 0xa4db06, in main_validate(int, char**)
#9    Object "/vg/bin/vg", at 0xa78a82, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)
#8    Object "/vg/bin/vg", at 0xa4e667, in std::_Function_handler<void (std::istream&), main_validate(int, char**)::{lambda(std::istream&)#1}>::_M_invoke(std::_Any_data const&, std::istream&)
#7    Object "/vg/bin/vg", at 0x935744, in void vg::io::for_each<vg::Alignment>(std::istream&, std::function<void (long, vg::Alignment&)> const&)
#6    Object "/vg/bin/vg", at 0x5c48e3, in vg::io::ProtobufIterator<vg::Alignment>::fill_value()
#5    Object "/vg/bin/vg", at 0x1b0dd53, in __cxa_throw
#4    Object "/vg/bin/vg", at 0x1b0fdb0, in std::terminate()
#3    Object "/vg/bin/vg", at 0x1b0fd65, in __cxxabiv1::__terminate(void (*)())
#2    Object "/vg/bin/vg", at 0x1ba8144, in __gnu_cxx::__verbose_terminate_handler()
#1    Object "/vg/bin/vg", at 0x1bd63c0, in abort
#0    Object "/vg/bin/vg", at 0x1189767, in raise

Input files for reproducing this error can be found in the following google drive: https://drive.google.com/open?id=1lqeo8NlOt1ei8NtilrTtvmZMMy46vV8G

adamnovak commented 4 years ago

@cmarkello I think you might have a different issue. Yours is an exception being thrown. Did it say anything about signal 11 (segfault) for you?

What's the exception complaining about? There should be some lines before the stack trace with the exception message.

cmarkello commented 4 years ago

@adamnovak I've sent you the complete toil worker output log on Slack. The stack trace is similar, but a little different:

terminate called without an active exception
Crash report for vg v1.23.0-57-g75e7c49fd "Lavello"
Stack trace (most recent call last):
#15   Object "/vg/bin/vg", at 0x4dd319, in _start
#14   Object "/vg/bin/vg", at 0x1bc5af8, in __libc_start_main
#13   Object "/vg/bin/vg", at 0x40b007, in main
#12   Object "/vg/bin/vg", at 0x9d6e57, in vg::subcommand::Subcommand::operator()(int, char**) const
#11   Object "/vg/bin/vg", at 0xa104f3, in main_gamsort(int, char**)
#10   Object "/vg/bin/vg", at 0xa78b99, in vg::get_input_file(int&, int, char**, std::function<void (std::istream&)>)
#9    Object "/vg/bin/vg", at 0xa78a82, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)
#8    Object "/vg/bin/vg", at 0xa114bd, in main_gamsort(int, char**)::{lambda(std::istream&)#1}::operator()(std::istream&) const
#7    Object "/vg/bin/vg", at 0xa16763, in vg::StreamSorter<vg::Alignment>::stream_sort(std::istream&, std::ostream&, vg::StreamIndex<vg::Alignment>*)
#6    Object "/vg/bin/vg", at 0x1baba0e, in GOMP_parallel
#5    Object "/vg/bin/vg", at 0xa1871c, in vg::StreamSorter<vg::Alignment>::stream_sort(std::istream&, std::ostream&, vg::StreamIndex<vg::Alignment>*) [clone ._omp_fn.2]
#4    Object "/vg/bin/vg", at 0x1b0fdb0, in std::terminate()
#3    Object "/vg/bin/vg", at 0x1b0fd65, in __cxxabiv1::__terminate(void (*)())
#2    Object "/vg/bin/vg", at 0x1ba8144, in __gnu_cxx::__verbose_terminate_handler()
#1    Object "/vg/bin/vg", at 0x1bd63c0, in abort
#0    Object "/vg/bin/vg", at 0x1189767, in raise

cmarkello commented 4 years ago

@adamnovak When shelling into the vg container and running it locally, it gives a Signal 6.

Singularity> vg gamsort -p HG002_0.gam -i HG002_0.sorted.gam.gai > HG002_0.sorted.gam
terminate called without an active exception                                                                                                            ]  0.0%
ERROR: Signal 6 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_pA94rz/stacktrace.txt
Please include the stack trace file in your bug report!
Singularity> vg bugs --new
sh: 1: xdg-open: not found
Singularity> cat /tmp/vg_crash_pA94rz/stacktrace.txt
Crash report for vg v1.23.0-57-g75e7c49fd "Lavello"
Stack trace (most recent call last):
#15   Object "/vg/bin/vg", at 0x4dd319, in _start
#14   Object "/vg/bin/vg", at 0x1bc5af8, in __libc_start_main
#13   Object "/vg/bin/vg", at 0x40b007, in main
#12   Object "/vg/bin/vg", at 0x9d6e57, in vg::subcommand::Subcommand::operator()(int, char**) const
#11   Object "/vg/bin/vg", at 0xa104f3, in main_gamsort(int, char**)
#10   Object "/vg/bin/vg", at 0xa78b99, in vg::get_input_file(int&, int, char**, std::function<void (std::istream&)>)
#9    Object "/vg/bin/vg", at 0xa78a82, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)
#8    Object "/vg/bin/vg", at 0xa114bd, in main_gamsort(int, char**)::{lambda(std::istream&)#1}::operator()(std::istream&) const
#7    Object "/vg/bin/vg", at 0xa16763, in vg::StreamSorter<vg::Alignment>::stream_sort(std::istream&, std::ostream&, vg::StreamIndex<vg::Alignment>*)
#6    Object "/vg/bin/vg", at 0x1baba0e, in GOMP_parallel
#5    Object "/vg/bin/vg", at 0xa1871c, in vg::StreamSorter<vg::Alignment>::stream_sort(std::istream&, std::ostream&, vg::StreamIndex<vg::Alignment>*) [clone ._omp_fn.2]
#4    Object "/vg/bin/vg", at 0x1b0fdb0, in std::terminate()
#3    Object "/vg/bin/vg", at 0x1b0fd65, in __cxxabiv1::__terminate(void (*)())
#2    Object "/vg/bin/vg", at 0x1ba8144, in __gnu_cxx::__verbose_terminate_handler()
#1    Object "/vg/bin/vg", at 0x1bd63c0, in abort
#0    Object "/vg/bin/vg", at 0x1189767, in raise
Singularity>

adamnovak commented 4 years ago

"terminate called without an active exception" is definitely mystifying and bad. I can try to take a look at this tomorrow and see if I can replicate it.

adamnovak commented 4 years ago

The next thing to be done is to run your data files under gdb, with a build of vg that hasn't had all the line number debug info pulled out, to try and catch it segfaulting or throwing or otherwise complaining.

I've put in a request for access on your Google drive link, but from the filenames it looks like this is going to be hundred-plus-gigabyte GAM files and whole-genome graphs, which probably don't want to come to my house via my web browser. Can you drop the files on Courtyard somewhere? Or recommend a way to get them from Google Drive to Courtyard without visiting my house? How did you get them up there, if they're huge? Or are they not actually huge?

cmarkello commented 4 years ago

They're not large at all. They're just my tiny ABO locus test dataset.

adamnovak commented 4 years ago

OK, I'm not sure why vg is crashing so hard that it forgets about its own exceptions, but your real problem is that your GAM file is actually a BAM file pretending to be a GAM file:

samtools flagstat vg_debug_input/HG002_0.gam
5845 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
5843 + 0 mapped (99.97% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

You can see this if you inspect the uncompressed data:

zcat vg_debug_input/HG002_0.gam | xxd - | head -n5
00000000: 4241 4d01 3e00 0000 4048 4409 564e 3a31  BAM.>...@HD.VN:1
00000010: 2e35 0953 4f3a 756e 6b6e 6f77 6e0a 4053  .5.SO:unknown.@S
00000020: 5109 534e 3a41 424f 6c6f 6375 7309 4c4e  Q.SN:ABOlocus.LN
00000030: 3a35 3030 3030 0a40 5047 0949 443a 3009  :50000.@PG.ID:0.
00000040: 504e 3a76 670a 0100 0000 0900 0000 4142  PN:vg.........AB

GAM data doesn't start with BAM.
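
The same check can be scripted, for example as a guard before calling vg gamsort. This is only a minimal sketch: it assumes the file is gzip-readable (as the zcat above implies), the function name looks_like_bam is made up for illustration, and it only tests for the four-byte BAM magic seen in the hexdump (42 41 4d 01). It does not prove that a file is valid GAM.

```python
import gzip
import zlib

# First four bytes of decompressed BAM data, as seen in the hexdump above.
BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path):
    """Return True if the gzip-readable file at `path` decompresses to
    data that starts with the BAM magic bytes.

    Hypothetical helper for illustration; not part of vg or toil-vg.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(len(BAM_MAGIC)) == BAM_MAGIC
    except (OSError, EOFError, zlib.error):
        # Not gzip-compressed, truncated, or corrupt: we cannot say it is BAM.
        return False
```

Calling looks_like_bam("vg_debug_input/HG002_0.gam") on the file from this thread would be expected to return True, since its decompressed data starts with BAM.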

On my machine, when I try and gamsort this BAM file, it crashes like this:

vg gamsort -p vg_debug_input/HG002_0.gam -i vg_debug_input/HG002_0.sorted.gam.gai > vg_debug_input/HG002_0.sorted.gam
terminate called after throwing an instance of 'std::runtime_error'
  what():  [io::ProtobufIterator] could not parse message
Crash report for vg v1.23.0-247-gd1d4d2d3f "Lavello"
Stack trace (most recent call last) in thread 1097:
...

So it is clearly (to me) complaining about the file not being what it expected.

A real fix for this might be defining a BadGamError exception, catching it, and printing a message more along the lines of "Your GAM is not good" without a stack trace. But the workaround is to actually feed in GAM files.
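
To make the idea concrete, here is a rough sketch of that pattern in Python. It is not vg code (vg is C++, and no BadGamError exists in it as far as this thread shows); parse_gam and run are hypothetical stand-ins, and the only "parsing" done is rejecting data that starts with the BAM magic bytes. The point is just the shape: raise a specific exception for bad input, catch it at the top level, and print a short message with no stack trace.

```python
import sys


class BadGamError(Exception):
    """Raised when input that should be GAM cannot be read as GAM."""


def parse_gam(data):
    # Hypothetical stand-in for a real parser: it only rejects data that
    # starts with the BAM magic bytes and otherwise returns the data as-is.
    if data.startswith(b"BAM\x01"):
        raise BadGamError("input looks like BAM, not GAM")
    return data


def run(data):
    """Return a process exit code, reporting bad input without a stack trace."""
    try:
        parse_gam(data)
    except BadGamError as err:
        # One short line for the user instead of a crash report.
        print(f"error: your GAM is not good: {err}", file=sys.stderr)
        return 1
    return 0
```

A command-line entry point would pass the result of run(...) to sys.exit, so that bad input yields a non-zero exit status and a one-line message.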

cmarkello commented 4 years ago

That makes sense. This was during a run of toil-vg that was set to output BAM files directly from vg map. A post-processing step then ran vg gamsort before GAM merging, and that step expects GAM files.