vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.09k stars 193 forks source link

Support arbitrarily deeply nested unary snarls by eliminating real recursion #1253

Open cjprybol opened 6 years ago

cjprybol commented 6 years ago

I ran through the example of creating a graph of two E. coli genomes on the wiki and everything has worked great. The resulting k12_and_o157.vg has ~350K edges and ~350K nodes but I've been unable to find a way to visualize it. GraphViz doesn't seem to support enough character encodings to render the graph using dot format, although Bandage will load and render the graph in gfa format. Unfortunately, the rendered graph is too large to fit within the Bandage viewing window and crashes Bandage when trying to save it as an svg. I tried using vg simplify to reduce the complexity of the graph to try rendering it again, but encountered the bus error. I read that bus errors are associated with being unable to access a location in memory. I tried upping the memory of my session as high as 64Gb and I'm still getting the error. Any idea if it's a bug or if I'm running out of memory at 64Gb?

To try and reproduce, you can either follow the example in the wiki to generate the graph or download the graph I generated here, and then run vg simplify k12_and_o157.vg > k12_and_o157.simple.vg or anything along those lines. I got the same error irrespective of whether I used the --progress flag or changed the number of threads. I saw the error while using a ~3 week old master as well as on a version of master I pulled this morning. If it's helpful at all, here is info on the system

lsb_release -a

LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.9 (Final)
Release:    6.9
Codename:   Final

cat /proc/version

Linux version 2.6.32-696.3.2.el6.x86_64 (mockbuild@c1bl.rdu2.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Tue Jun 20 01:26:55 UTC 2017

Thanks!

adamnovak commented 6 years ago

This is probably my fault (since I wrote vg simplify), but I'm not sure if/when I will be able to take on debugging it.

If you have gdb, you could do:

gdb vg
run simplify k12_and_o157.vg > k12_and_o157.simple.vg
# Wait for it to crash
bt

That would produce a stack trace that should show where in the simplify code the mistake is. If not, I will do it when I get around to this issue.

cjprybol commented 6 years ago

Thanks for the instructions on how to use gdb! I've never used that before.

After running gdb vg I ran set logging on before running run simplify k12_and_o157.vg > k12_and_o157.simple.vg to generate this log file of the stack trace

but I'm not sure if/when I will be able to take on debugging it

No worries, I'll see if I can make progress on generating something for the panvirus wiki examples in the meantime

adamnovak commented 6 years ago

It looks like what you've encountered is a stack overflow in my snarl translation code; it just keeps calling itself. This is almost certainly my fault somehow (unless we're seeing really, really big unary snarls).

On Mon, Nov 20, 2017 at 4:23 PM, Cameron Prybol notifications@github.com wrote:

Thanks for the instructions on how to use gdb! I've never used that before.

After running gdb vg I ran set logging on before running run simplify k12_and_o157.vg > k12_and_o157.simple.vg to generate this log file of the stack trace https://stanfordmedicine.box.com/s/718y1e08i6zqn0mi7asgeyh8s3ucrhcq

but I'm not sure if/when I will be able to take on debugging it

No worries, I'll see if I can make progress on generating something for the panvirus wiki examples https://github.com/vgteam/vg/wiki/A-pangenomics-example-with-vg-(viruses) in the meantime

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/vgteam/vg/issues/1253#issuecomment-345875612, or mute the thread https://github.com/notifications/unsubscribe-auth/AE0_X69VpmeLfqTc6Xx8K85APCHm4J5Fks5s4hf7gaJpZM4Qfy8S .

adamnovak commented 6 years ago

OK, it looks like actually we are seeing really big, deep nested unary snarls:

(gdb) frame 8
#8  0x0000000000918b4c in vg::CactusSnarlFinder::recursively_emit_snarls (this=this@entry=0x237b8c80, start=..., end=..., 
    parent_start=..., parent_end=..., chains_list=0x35ec6cc0, unary_snarls_list=0x35ec6ce0, destination=...) at src/genotypekit.cpp:619
619             child_snarl->chains, child_snarl->unarySnarls, destination));
(gdb) p start
$2 = (const vg::Visit &) @0x7fffff400400: {<google::protobuf::Message> = {<google::protobuf::MessageLite> = {
      _vptr.MessageLite = 0xff1450 <vtable for vg::Visit+16>}, <No data fields>}, static kIndexInFileMessages = 16, 
  static kSnarlFieldNumber = 2, static kNodeIdFieldNumber = 1, static kBackwardFieldNumber = 3, 
  _internal_metadata_ = {<google::protobuf::internal::InternalMetadataWithArenaBase<google::protobuf::UnknownFieldSet, google::protobuf::internal::InternalMetadataWithArena>> = {ptr_ = 0x0, static kPtrTagMask = <optimized out>, 
      static kPtrValueMask = <optimized out>}, <No data fields>}, snarl_ = 0x0, node_id_ = 37562, backward_ = true, _cached_size_ = 0}
(gdb) p end
$3 = (const vg::Visit &) @0x7fffff400470: {<google::protobuf::Message> = {<google::protobuf::MessageLite> = {
      _vptr.MessageLite = 0xff1450 <vtable for vg::Visit+16>}, <No data fields>}, static kIndexInFileMessages = 16, 
  static kSnarlFieldNumber = 2, static kNodeIdFieldNumber = 1, static kBackwardFieldNumber = 3, 
  _internal_metadata_ = {<google::protobuf::internal::InternalMetadataWithArenaBase<google::protobuf::UnknownFieldSet, google::protobuf::internal::InternalMetadataWithArena>> = {ptr_ = 0x0, static kPtrTagMask = <optimized out>, 
      static kPtrValueMask = <optimized out>}, <No data fields>}, snarl_ = 0x0, node_id_ = 37562, backward_ = false, _cached_size_ = 0}
(gdb) frame 9
#9  0x0000000000918b4c in vg::CactusSnarlFinder::recursively_emit_snarls (this=this@entry=0x237b8c80, start=..., end=..., 
    parent_start=..., parent_end=..., chains_list=0x35ec6c60, unary_snarls_list=0x35ec6c80, destination=...) at src/genotypekit.cpp:619
619             child_snarl->chains, child_snarl->unarySnarls, destination));
(gdb) p start
$4 = (const vg::Visit &) @0x7fffff4006b0: {<google::protobuf::Message> = {<google::protobuf::MessageLite> = {
      _vptr.MessageLite = 0xff1450 <vtable for vg::Visit+16>}, <No data fields>}, static kIndexInFileMessages = 16, 
  static kSnarlFieldNumber = 2, static kNodeIdFieldNumber = 1, static kBackwardFieldNumber = 3, 
  _internal_metadata_ = {<google::protobuf::internal::InternalMetadataWithArenaBase<google::protobuf::UnknownFieldSet, google::protobuf::internal::InternalMetadataWithArena>> = {ptr_ = 0x0, static kPtrTagMask = <optimized out>, 
      static kPtrValueMask = <optimized out>}, <No data fields>}, snarl_ = 0x0, node_id_ = 37563, backward_ = true, _cached_size_ = 0}
(gdb) p end
$5 = (const vg::Visit &) @0x7fffff400720: {<google::protobuf::Message> = {<google::protobuf::MessageLite> = {
      _vptr.MessageLite = 0xff1450 <vtable for vg::Visit+16>}, <No data fields>}, static kIndexInFileMessages = 16, 
  static kSnarlFieldNumber = 2, static kNodeIdFieldNumber = 1, static kBackwardFieldNumber = 3, 
  _internal_metadata_ = {<google::protobuf::internal::InternalMetadataWithArenaBase<google::protobuf::UnknownFieldSet, google::protobuf::internal::InternalMetadataWithArena>> = {ptr_ = 0x0, static kPtrTagMask = <optimized out>, 
      static kPtrValueMask = <optimized out>}, <No data fields>}, snarl_ = 0x0, node_id_ = 37563, backward_ = false, _cached_size_ = 0}
(gdb) 

(See how adjacent node IDs 37563 and 37562 bound successive unary snarls?)

Basically, the problem is that the graph you are working on looks like two or more chromosomes connected together. We try to decompose it into sites, but (unless we're picking bad telomeres to anchor the decomposition) even the best decomposition has long dangling tips that aren't on the path between the chosen end nodes.

We end up with a decomposition in which each variable site in one of those dangling graph regions is nested inside the previous one, making a really deep stick-shaped tree. Then we try to recurse over it ans have a stack overflow because the stick is longer than our machine's stack.

I think the immediate fix is to rewrite the recursive algorithm to not really be recursive, but the code is really much nicer as recursive code and I wish we had a good way to limit the nesting depth instead. We might also be picking really bad telomeres for the decomposition, which we should maybe look into, but in general some graphs will do this no matter what telomeres you pick.

ekg commented 6 years ago

@cjprybol for what it's worth I've also been unable to visualize very large graphs. I find that Bandage will do the trick but tend to crash when it attempts to save the .svg. I've reported this https://github.com/rrwick/Bandage/issues/56.

cjprybol commented 6 years ago

Thanks for letting me know! I'll subscribe to that issue to track progress

adamnovak commented 6 years ago

This problem is also affecting Shilpa, and I think is part of the problem in #1653.