vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Strange snarls in cyclic tips #2260

Open maickrau opened 5 years ago

maickrau commented 5 years ago

Please describe:

  1. What you were trying to do

I'm trying to find bubbles in an assembly graph by running vg snarls (version 1.15.0). There are some snarls that I don't understand. Here's a picture of the region with the strange snarls:

(image: snarl)

  2. What you wanted to happen

vg snarls reports an ultrabubble starting from node 392 and ending at 2496, and another starting from node 2496 and ending at 2786.

  3. What actually happened

vg snarls reports a unary snarl starting and ending at 2496, and an unclassified snarl starting at 2497 and ending at 2498. Similar snarls are also reported for the other nodes in this region.

  4. What data and command line to use to make the problem recur, if applicable

topology-graph.vg.gz

$ vg snarls topology-graph.vg > snarls.pb
$ vg view -R snarls.pb > snarls.json
$ grep 2496 < snarls.json
{"type":"UNARY","start":{"node_id":"2496"},"end":{"node_id":"2496","backward":true},"parent":{"start":{"node_id":"393"},"end":{"node_id":"394","backward":true}},"start_self_reachable":true,"end_self_reachable":true,"start_end_reachable":true,"directed_acyclic_net_graph":true}
{"start":{"node_id":"2497"},"end":{"node_id":"2498","backward":true},"parent":{"start":{"node_id":"2496"},"end":{"node_id":"2496","backward":true}},"start_self_reachable":true,"end_self_reachable":true,"start_end_reachable":true,"directed_acyclic_net_graph":true}

glennhickey commented 5 years ago

Not sure I'm following, but it seems you want a 392-2496 snarl, but vg's giving you a 393-394 snarl instead? The snarl decomposition isn't unique, so it's hard to guarantee the one you expect.

The "root snarl" dictates the decomposition you do get. There are heuristics in vg to try to place root snarls at pairs of chromosome telomeres. This is generally easy on graphs made with vg construct, as there will be a reference path that begins and ends on "stub nodes" with degree 0.

If you have an orientation in mind for this graph, I think you'd need to somehow represent that as a path between two telomere nodes, and hopefully vg will give you the decomposition you want.
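
As a rough way to check whether a graph has any candidate stub nodes at all, the sketch below lists segments that have no links on at least one side. It assumes the graph has been exported to GFA 1 text (`S` and `L` records); the function name `find_tips` and the toy graph are made up for illustration, and this is not vg's own rooting heuristic.

```python
from collections import defaultdict


def find_tips(gfa_lines):
    """Map each segment with an unattached side to its set of free sides.

    "start" and "end" refer to the two sides of the forward-oriented segment.
    """
    segments = set()
    attached = defaultdict(set)  # segment name -> sides that have a link
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments.add(fields[1])
        elif fields[0] == "L":
            src, src_orient, dst, dst_orient = fields[1:5]
            # A link leaves the end of a forward segment,
            # or the start of a reversed one.
            attached[src].add("end" if src_orient == "+" else "start")
            # It enters the start of a forward segment,
            # or the end of a reversed one.
            attached[dst].add("start" if dst_orient == "+" else "end")
            segments.update((src, dst))
    tips = {}
    for seg in segments:
        free = {"start", "end"} - attached[seg]
        if free:
            tips[seg] = free
    return tips


# Hypothetical toy graph: 1 -> 2 -> 3, with a self-loop on 2.
example = [
    "S\t1\tACGT",
    "S\t2\tGG",
    "S\t3\tTTA",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "L\t2\t+\t2\t+\t0M",
]

tips = find_tips(example)
for seg in sorted(tips):
    print(seg, sorted(tips[seg]))
```

In the toy graph only segments 1 and 3 have a free side. If a real graph yields no such segments, or only tips that sit inside cycles, there may be no natural pair of telomere nodes to anchor a path on.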

maickrau commented 5 years ago

> Not sure I'm following, but it seems you want a 392-2496 snarl, but vg's giving you a 393-394 snarl instead?

That's correct.

> The "root snarl" dictates the decomposition you do get. There are heuristics in vg to try to place root snarls at pairs of chromosome telomeres. This is generally easy on graphs made with vg construct, as there will be a reference path that begins and ends on "stub nodes" with degree 0.

This graph doesn't have any obvious telomeres that could be used for rooting: it's an assembly graph (not from vg) with multiple chromosomes in the same connected component and a tangle in the middle. My use case is that I want to find ultrabubbles for repeat separation and haplotype phasing, and I don't need other kinds of snarls. Is there an easy way to use vg snarls for this?
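
For the narrower goal of keeping only ultrabubbles, one option is to filter the JSON produced by `vg view -R`. The sketch below assumes one JSON object per line, as in the grep output above, and assumes ultrabubbles are labelled `"type":"ULTRABUBBLE"` in the same way unary snarls are labelled `"type":"UNARY"`. The function name and the third sample record are hypothetical.

```python
import json


def ultrabubbles(json_lines):
    """Yield (start_node_id, end_node_id) for each snarl typed as an ultrabubble."""
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        snarl = json.loads(line)
        # Assumption: ultrabubbles carry "type": "ULTRABUBBLE"; records
        # without a "type" key are treated as unclassified and skipped.
        if snarl.get("type") == "ULTRABUBBLE":
            yield snarl["start"]["node_id"], snarl["end"]["node_id"]


# The first two records are trimmed from the output above;
# the third is a made-up ultrabubble record for illustration.
sample = [
    '{"type":"UNARY","start":{"node_id":"2496"},"end":{"node_id":"2496","backward":true}}',
    '{"start":{"node_id":"2497"},"end":{"node_id":"2498","backward":true}}',
    '{"type":"ULTRABUBBLE","start":{"node_id":"10"},"end":{"node_id":"12"}}',
]

print(list(ultrabubbles(sample)))
```

On a real file this would be called as `ultrabubbles(open("snarls.json"))`. It only filters what vg already reports; it does not change how the decomposition is rooted, so bubbles that were absorbed into a unary or unclassified snarl would still be missing.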