Open sjackman opened 8 years ago
That wouldn't be too hard, but I should reiterate here that Bandage still doesn't support complex overlaps (with insertions/deletions). Even mismatches aren't really supported: if you extract a path sequence through an overlap with a mismatch, the resulting sequence will always have the base from the earlier node in the path.
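To make the mismatch behaviour concrete, here is a minimal Python sketch of path-sequence extraction with a fixed overlap. It is not Bandage's code; the function name `path_sequence` and its arguments are hypothetical, and it assumes the node sequences are already in path orientation and that every overlap has the same length.

```python
def path_sequence(node_seqs, overlap):
    """Join node sequences along a path, trimming a fixed overlap.

    Assumption: node_seqs are already oriented along the path and every
    consecutive pair overlaps by exactly `overlap` bases.
    """
    if not node_seqs:
        return ""
    result = node_seqs[0]
    for seq in node_seqs[1:]:
        # The overlapping bases were already emitted from the earlier node,
        # so the first `overlap` bases of the later node are skipped
        # without being compared.
        result += seq[overlap:]
    return result


# The overlap of the first node ends in "GT", the second node starts with "GA".
# The mismatching base is taken from the earlier node.
merged = path_sequence(["ACGT", "GAAA"], overlap=2)
print(merged)
```

Because the later node's overlap bases are never inspected, a mismatch (or an indel) inside the overlap is silently resolved in favour of the earlier node.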
In the almost-finished release of Bandage (v0.8.0, now on the master branch), I did add a graph information dialog window which, among other things, shows the overlap range in the graph. So if the graph you're using has only one overlap size (as is the case for a SPAdes graph), then this will give you your answer. But if your graph has multiple different overlap sizes, then it won't help.
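Until that dialog is available, the overlap range can be checked outside Bandage. The following is a rough sketch, assuming a GFA 1 file whose `L` lines carry the overlap in the sixth tab-separated column; the function name `overlap_sizes` is made up for this example. It only understands simple `<n>M` overlaps and counts everything else (including `*`) separately.

```python
import re


def overlap_sizes(gfa_lines):
    """Collect the distinct simple overlap sizes found on GFA 1 link lines.

    Returns (sizes, other_count): `sizes` is the set of lengths taken from
    overlaps of the form "<n>M"; `other_count` counts links whose overlap
    is anything else (complex CIGAR or unspecified "*").
    """
    sizes = set()
    other_count = 0
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "L" or len(fields) < 6:
            continue  # not a link line
        match = re.fullmatch(r"(\d+)M", fields[5])
        if match:
            sizes.add(int(match.group(1)))
        else:
            other_count += 1
    return sizes, other_count


example = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "L\t1\t+\t2\t-\t55M",
    "L\t2\t+\t3\t+\t21M",
    "L\t3\t+\t4\t+\t3M1I2M",
]
sizes, other_count = overlap_sizes(example)
print(min(sizes), max(sizes), other_count)
```

If `sizes` contains a single value, the graph has one overlap size and the range is the whole answer; otherwise per-edge labels are needed.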
Can I ask: do your graphs have varying overlap sizes? And do they have (eek!) complex overlaps with mismatches and indels?
> Can I ask: do your graphs have varying overlap sizes?
Yes, ABySS has varying overlap sizes.
> And do they have (eek!) complex overlaps with mismatches and indels?
No, but this is typical of a GFA file output by @lh3's nanoasm.
I'm looking forward to the release of 0.8.0! Bandage has been very useful for me.
+1 for this feature request.
The case where I find myself needing this feature is when looking at larger variants, e.g. when a heterozygous SNP/small indel site (true or not) creates a bubble or tail that is really just noise in the graph.
It would be nice to be able to crush a bubble, or cut a tail if the mismatch is less than a certain value.
Steve,
I am planning on revamping the Bandage labels, including adding stuff like edge labels, so this feature is coming - I promise! Hopefully not too far in the future... 😄
Regarding your workflow for removing bubbles/tails, there is a way to manually do this. Bandage has some graph-editing functionality, though it's not particularly fleshed out.
If you have a simple bubble like this:
You can select the node you want to delete:
And then use Edit + Remove selection from graph (or shift-delete):
Then to simplify the graph you can use Edit + Merge all possible nodes:
And finally you might want to redraw the graph to make it look nice:
The Output menu then has options for saving your modified graph file to GFA. Be careful when editing, as there's no undo! (That would not be trivial to implement...)
This process will also work for getting rid of loose ends in the graph:
This process is manual and somewhat labour-intensive, and I'm not sure how many bubbles/tails you need to clean up. An automated way of doing this would be possible, but it would of course need a lot of parameters: how to choose which bubble path to keep, max size to simplify, etc. If you're interested in an automated approach, I'd be curious for your thoughts on this. What automated logic would be most useful for graph simplification?
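As a starting point for that discussion, here is one possible piece of automated logic, sketched in Python. It is not how Bandage or ABySS does it. It assumes a simplified directed graph (strands ignored) given as a dict of successor sets, and it only handles the simplest bubble shape: two single-node branches between the same source and sink. The names `find_simple_bubbles`, `succ`, `length`, `depth`, and `max_len` are all hypothetical; the rule "drop the lower-depth branch if it is short enough" is just one choice among many.

```python
def find_simple_bubbles(succ, length, depth, max_len=100):
    """Return the nodes to delete in order to pop simple bubbles.

    succ:    dict mapping node -> set of successor nodes
    length:  dict mapping node -> sequence length in bp
    depth:   dict mapping node -> read depth
    max_len: only remove a branch whose length is at most this value
    """
    # Build the predecessor map from the successor map.
    pred = {}
    for u, vs in succ.items():
        for v in vs:
            pred.setdefault(v, set()).add(u)

    to_remove = []
    for source, branches in succ.items():
        if len(branches) != 2:
            continue
        a, b = sorted(branches)
        # Each branch must be reachable only from the source ...
        if pred.get(a) != {source} or pred.get(b) != {source}:
            continue
        # ... and both must lead to the same single sink.
        out_a, out_b = set(succ.get(a, ())), set(succ.get(b, ()))
        if len(out_a) != 1 or out_a != out_b:
            continue
        # Keep the higher-depth branch; on a tie, b is removed.
        loser = a if depth[a] < depth[b] else b
        if length[loser] <= max_len:
            to_remove.append(loser)
    return to_remove


succ = {"s": {"a", "b"}, "a": {"t"}, "b": {"t"}, "t": set()}
length = {"s": 500, "a": 1, "b": 1, "t": 800}
depth = {"s": 33.0, "a": 30.0, "b": 3.0, "t": 32.0}
print(find_simple_bubbles(succ, length, depth))
```

After removing the returned nodes, the equivalent of Edit + Merge all possible nodes would collapse source, surviving branch, and sink into one node. Tails could be handled with a similar rule on nodes that have no successors (or no predecessors) and are shorter than a threshold.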
For bubble popping, ABySS has the PopBubbles program at https://github.com/bcgsc/abyss/blob/master/PopBubbles/PopBubbles.cpp
For tip trimming, ABySS has the abyss-filtergraph program at https://github.com/bcgsc/abyss/blob/master/FilterGraph/FilterGraph.cc
Both come with ABySS, and you can install them easily on Mac or Linux using Homebrew or Linuxbrew.
Both support GFA and Graphviz (.gv or .dot) format graphs.
Also other properties, but I'm most interested in the CIGAR string to determine the amount by which two sequences overlap. It would also be neat to visualize the amount of overlap, but I'd be happy with a text display for now.
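For the text display, the overlap amount can be derived from the CIGAR string by summing the operations that consume each sequence. The sketch below uses the standard SAM CIGAR operation semantics; the helper names `cigar_lengths` and `edge_label` are made up for illustration, and which of the two GFA segments plays the "reference" role is an assumption that should be checked against the tool that wrote the file.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = set("MDN=X")    # operations that consume the reference sequence
QUERY_OPS = set("MIS=X")  # operations that consume the query sequence


def cigar_lengths(cigar):
    """Return (ref_len, query_len) covered by a CIGAR string, or None for '*'."""
    if cigar == "*":
        return None
    ref_len = query_len = 0
    pos = 0
    for match in CIGAR_OP.finditer(cigar):
        if match.start() != pos:
            raise ValueError(f"malformed CIGAR: {cigar!r}")
        pos = match.end()
        count, op = int(match.group(1)), match.group(2)
        if op in REF_OPS:
            ref_len += count
        if op in QUERY_OPS:
            query_len += count
    if pos == 0 or pos != len(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ref_len, query_len


def edge_label(cigar):
    """Build a short text label describing the overlap of one edge."""
    lengths = cigar_lengths(cigar)
    if lengths is None:
        return "overlap unspecified"
    ref_len, query_len = lengths
    if ref_len == query_len:
        return f"{ref_len} bp"
    return f"{ref_len}/{query_len} bp"


for cigar in ["55M", "10M2I5M1D3M", "*"]:
    print(cigar, "->", edge_label(cigar))
```

For a simple overlap such as `55M` both lengths agree and the label is a single number; with indels the two sides differ, which is exactly the case a fixed overlap range cannot describe.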