vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.11k stars 195 forks source link

vg augment assertion fails on chunked GAM #1990

Open eldariont opened 5 years ago

eldariont commented 5 years ago

Hi,

vg augment crashes due to a failed assertion (https://github.com/vgteam/vg/blob/master/src/pileup_augmenter.cpp#L231) when I want to augment my cactus graph with one chunk of a big GAM file.

Command:

>/public/home/daheller/bin/vg/bin/vg augment /public/groups/cgl/users/daheller/yeast_graph/graphs/cactus_graph_old/exploded/component0.vg /public/groups/cgl/users/daheller/yeast_graph/graphs/cactus_graph_old/call_experiments/chunk_0_S288C.chrIII_20519_60881.gam -a pileup  -S /public/groups/cgl/users/daheller/yeast_graph/graphs/cactus_graph_old/call_experiments/locus0/aug_graph.support -Z /public/groups/cgl/users/daheller/yeast_graph/graphs/cactus_graph_old/call_experiments/locus0/aug_graph.trans -t 10 > /public/groups/cgl/users/daheller/yeast_graph/graphs/cactus_graph_old/call_experiments/locus0/aug_graph.vg

Result:

vg: src/pileup_augmenter.cpp:231: void vg::PileupAugmenter::map_path(const vg::Path&, std::__cxx11::list<vg::mapping_t>&, bool): Assertion `expect_edits' failed.
ERROR: Signal 6 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_3z0WUx/stacktrace.txt

I produced the chunked GAM with:

vg chunk -x /public/groups/cgl/users/daheller/yeast_graph/graphs/cactus_graph_old/exploded/component0.xg -e /public/groups/cgl/users/daheller/yeast_graph/graphs/cactus_graph_old/call_experiments/regions.slop.bed -t 10 -a /public/groups/cgl/users/daheller/yeast_graph/graphs/cactus_graph_old/mappings/SRR4074383.mapped.sorted.gam -c 10 -b /public/groups/cgl/users/daheller/yeast_graph/graphs/cactus_graph_old/call_experiments/chunk

Thanks David

Edit: Updated file paths

alancleary commented 5 years ago

I've run into the same issue with a non-chunked gam file for some mapped paired-end reads. Since expect_arguments is hard coded to false, the assertion will always fail when checked. So I took a look at mapping_from_length, which is causing the assertion to be checked in the first place. It's a simple function that sums all the from_length values of a Mapping's set of Edits (https://github.com/vgteam/vg/blob/v1.13.0/src/path.cpp#L969). I replicated the function in a simple Python script that consumes a json dump of my gam file and none of the sums were 0, so this is certainly a bug.

glennhickey commented 5 years ago

@eldariont Is this still a problem (the paths in your example no longer exist so I can't check)?

@alancleary This assertion fail is coming from Path in the graph, as opposed to the GAM. For graph Paths, it is not expecting edits of any type and in this case is complaining about an insertion (from_length = 0). If you can share your input, I will check if it's a bug.

alancleary commented 5 years ago

Hey @glennhickey. Thanks for the quick response!

This assertion fail is coming from Path in the graph, as opposed to the GAM.

That makes a lot more sense. We're using our own program to convert pan-genomic de Bruijn graphs generated by tools like SplitMEM into vg graphs. It turns out when we embed sequence paths in the graph, we're not adding Edits to their Mappgings, but rather positions. Here's the relevant bit of code:

Path* path = de_bruijn.add_path();
path->set_name(label);
int rank = 1;
for (int j = 0; j < positions.size(); j++) {
  tie(pos, u) = positions[j];
  Mapping* mapping = path->add_mapping();
  mapping->set_rank(rank++);
  Position* position = mapping->mutable_position();
  position->set_node_id(u);
  position->set_offset(0);
  position->set_is_reverse(false);
}

Since the graph was built from the sequences we're embedding as paths, no edits are necessary to embed the paths. But apparently Edits are required to prevent mapping_from_length from tripping the assertion. I suspect this is a misunderstanding on my part. Can you explain to me how we might incorporate Edits into our paths? Thanks!

glennhickey commented 5 years ago

I see. Normally each Mapping will have a trivial "match" edit running the length of the node, something like

size_t node_len = <length of node u>;
Edit* edit = mapping->add_edit();
edit->set_to_length(node_len);
edit->set_from_length(node_len);

As an example of what the result would look like, see path x here:

vg construct -r tiny/tiny.fa | vg view -j -

In most cases, you can get away with not specifying any edits and having the match edits be implicit. I'll see if I can make that happen for vg call.. that assertion is related to logic that's been completely removed at this point.

alancleary commented 5 years ago

@glennhickey That did the trick. Thanks!

Not sure if OP ever got things worked out, but as far as I'm concerned the issue is resolved.

eldariont commented 5 years ago

@eldariont Is this still a problem (the paths in your example no longer exist so I can't check)?

Sorry for the late reply. Yes, the same command still throws the same error for me. I rearranged my directory structure slightly which is why the paths no longer existed. I just updated my original post with the current paths so that you can try to reproduce the problem. I'm using vg version v1.12.1-83-g9a00b102 "Parolise".