vgteam / vg

tools for working with genome variation graphs

vg circularize #281

Closed edawson closed 8 years ago

edawson commented 8 years ago

I think it'd be nice to have a single command to circularize a graph by creating an edge between its head and tail nodes. Right now I do this manually, taking the graph through GFA, but it turns out I'm doing it more often than I initially expected.

I'll work on this tonight; it's probably all of five lines of code.

ekg commented 8 years ago

We need to take the head and tail nodes to join from the command line. The problem is that we can't always do this automatically; we don't necessarily want to link all the head and tail nodes.

Another related problem is indicating that a specific alignment is circular. This would be a flag on the path.
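To make the head/tail point above concrete, here is a minimal sketch in Python. It is not vg code: `ToyGraph`, `head_nodes`, `tail_nodes`, and `circularize` are hypothetical names on a toy directed graph that ignores node orientation. It shows why the caller has to name the nodes to join: a graph with two components has two heads and two tails, and only one pair should be linked.

```python
class ToyGraph:
    """Toy directed sequence graph; a stand-in, not vg's data model."""

    def __init__(self):
        self.nodes = {}     # node id -> sequence
        self.edges = set()  # (from_id, to_id) pairs

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence

    def add_edge(self, from_id, to_id):
        for node_id in (from_id, to_id):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node {node_id}")
        self.edges.add((from_id, to_id))

    def head_nodes(self):
        """Nodes with no incoming edge."""
        targets = {to_id for _, to_id in self.edges}
        return sorted(n for n in self.nodes if n not in targets)

    def tail_nodes(self):
        """Nodes with no outgoing edge."""
        sources = {from_id for from_id, _ in self.edges}
        return sorted(n for n in self.nodes if n not in sources)

    def circularize(self, head_id, tail_id):
        """Join an explicitly chosen tail back to a chosen head.

        Returns True if a new edge was added, False if it already existed.
        """
        if (tail_id, head_id) in self.edges:
            return False
        self.add_edge(tail_id, head_id)
        return True


# Two components: 1 -> 2 -> 3 (to be made circular) and 4 -> 5 (stays linear).
g = ToyGraph()
for node_id, seq in [(1, "ACG"), (2, "T"), (3, "GGA"), (4, "CC"), (5, "TTA")]:
    g.add_node(node_id, seq)
g.add_edge(1, 2)
g.add_edge(2, 3)
g.add_edge(4, 5)

heads_before = g.head_nodes()
tails_before = g.tail_nodes()
added = g.circularize(head_id=1, tail_id=3)
```

Linking every tail to every head here would also join node 5 to node 1 and node 3 to node 4, which is exactly the automatic behaviour we don't want.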

edawson commented 8 years ago

So I have a thought or two on this. As @ekg handily pointed out, we may have some paths in a graph which are circular and some that are not. One example I can imagine of this is if we place an HPV decoy in a human graph. It is therefore important to circularize only specific paths.

I could solve this by taking in a file that looks like this:

    ChromA circular
    ChromB circular
    ChromC linear

but it's a bit ugly and I want some opinions on it. In the meantime, I'll try generating a circular pan-HPV graph using MSGA to see if it's feasible/useful.
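For reference, reading the path list proposed above could look roughly like the Python sketch below. The function name `parse_path_topology` is hypothetical, and several details are assumptions rather than anything decided in this thread: one whitespace-separated `name topology` pair per line, blank lines skipped, `#` lines treated as comments.

```python
def parse_path_topology(text):
    """Map each path name to True (circular) or False (linear).

    Assumed format: one '<path name> <circular|linear>' pair per line.
    """
    is_circular = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        fields = line.split()
        if len(fields) != 2 or fields[1] not in ("circular", "linear"):
            raise ValueError(
                f"line {lineno}: expected '<path> circular|linear', got {line!r}"
            )
        is_circular[fields[0]] = fields[1] == "circular"
    return is_circular


example = """\
ChromA circular
ChromB circular
ChromC linear
"""
topology = parse_path_topology(example)
circular_paths = [name for name, circ in topology.items() if circ]
```

Since linear is presumably the default, a file listing only the circular path names would carry the same information with less typing.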

ekg commented 8 years ago

Maybe it would be simpler to have a vg mod command that takes a list of paths in a file and ensures that they are circular in the graph. So this would mean setting a flag in the path object to say if it is circular or not and also adding some edges to allow the circular path to be fully embedded in the graph.

For msga we could take an argument which lists the circular paths by name. This would be ideal, as it makes the assembly aware of circular genomes. Otherwise we will get divergent alignments at the starts and ends of the sequences.

So I guess vg::VG should have a function which circularizes an embedded path and ensures that we can traverse all parts of it in the graph by linking the head and tail positions with an edge if one does not already exist.
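A rough sketch of that function, in Python rather than against the real vg::VG class. Everything here is a toy stand-in: `ToyPath`, `PathGraph`, and `circularize_path` are hypothetical names, paths are plain lists of node ids, and node orientation is ignored. The idea is just: mark the path as circular, then add the closing edge from its last node to its first unless that edge is already there.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPath:
    name: str
    node_ids: list
    is_circular: bool = False  # the per-path flag discussed above


@dataclass
class PathGraph:
    edges: set = field(default_factory=set)    # (from_id, to_id) pairs
    paths: dict = field(default_factory=dict)  # path name -> ToyPath

    def add_path(self, name, node_ids):
        """Embed a path, adding an edge between each consecutive node pair."""
        for from_id, to_id in zip(node_ids, node_ids[1:]):
            self.edges.add((from_id, to_id))
        self.paths[name] = ToyPath(name, list(node_ids))

    def circularize_path(self, name):
        """Flag a path as circular and make sure its closing edge exists.

        Returns True if the closing edge had to be added.
        """
        path = self.paths[name]
        if not path.node_ids:
            raise ValueError(f"path {name!r} is empty")
        closing_edge = (path.node_ids[-1], path.node_ids[0])
        added = closing_edge not in self.edges
        self.edges.add(closing_edge)
        path.is_circular = True
        return added


# Illustrative names only: a circular decoy next to a linear chromosome.
graph = PathGraph()
graph.add_path("HPV_decoy", [1, 2, 3])
graph.add_path("ChromA", [4, 5])

first_call = graph.circularize_path("HPV_decoy")   # adds edge 3 -> 1
second_call = graph.circularize_path("HPV_decoy")  # edge already present
```

Calling it twice is harmless, so a vg mod option could run it over every name in the list without checking the current state first.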

ekg commented 8 years ago

And also vg.proto should be patched to have an is_circular boolean flag on the Path object.

ekg commented 8 years ago

I've got some more extensions to this.

  1. Visualization: I want to look at circular paths.
  2. Path::is_circular (vg.proto): we need to be able to serialize graphs and remember that the paths are circular.

I'm on it!

ekg commented 8 years ago

[Image: screenshot from 2016-04-18 00-22-04]

ekg commented 8 years ago

Next extension: MSGA should handle circular paths.