rrwick / Bandage

a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
http://rrwick.github.io/Bandage/
GNU General Public License v3.0

Display the CIGAR string of an edge [feature request] #24

Open sjackman opened 8 years ago

sjackman commented 8 years ago

Other edge properties would be nice too, but I'm most interested in the CIGAR string, to determine the amount by which two sequences overlap. It would also be neat to visualize the amount of overlap, but I'd be happy with a text display for now.
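For reference, here is a minimal Python sketch of reading overlap lengths off a GFA link's CIGAR string. It assumes the SAM convention for which operations consume which sequence (M, =, X, D and N consume the first, "from" sequence; M, =, X, I and S consume the second, "to" sequence). `overlap_lengths` is a hypothetical helper, not part of Bandage.

```python
import re

# Hypothetical helper: a CIGAR string is a run of <length><operation> pairs.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def overlap_lengths(cigar):
    """Return (overlap on the 'from' sequence, overlap on the 'to' sequence).

    Assumes the SAM convention: M, =, X, D and N consume the 'from' sequence;
    M, =, X, I and S consume the 'to' sequence.
    """
    if cigar == "*":
        raise ValueError("overlap not specified")
    ops = CIGAR_OP.findall(cigar)
    # Reject strings containing anything other than valid length/op pairs.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    from_len = to_len = 0
    for n, op in ops:
        if op in "M=XDN":
            from_len += int(n)
        if op in "M=XIS":
            to_len += int(n)
    return from_len, to_len


print(overlap_lengths("55M"))       # (55, 55): a plain fixed-length overlap
print(overlap_lengths("20M2I30M"))  # (50, 52): the insertion only lengthens 'to'
```

For a simple overlap such as `55M` both numbers are equal; they only differ once the CIGAR contains indels.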

rrwick commented 8 years ago

That wouldn't be too hard, but I should reiterate here that Bandage still doesn't support complex overlaps (with insertions/deletions). Even mismatches aren't really supported: if you extract a path sequence through an overlap with a mismatch, the resulting sequence will always have the base from the earlier node in the path.
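The "earlier node wins" behaviour falls out naturally when overlaps are treated as fixed lengths: each later node in the path simply has its first k bases dropped. A minimal sketch, assuming single-stranded nodes and one integer overlap per edge (`path_sequence`, `sequences` and `overlaps` are hypothetical names, not Bandage's code):

```python
def path_sequence(path, sequences, overlaps):
    """Spell out the sequence of a path of nodes.

    sequences maps node name -> sequence; overlaps maps (a, b) -> overlap
    length for the edge a -> b. The overlapping prefix of each later node is
    dropped, so inside an overlap the earlier node's bases are the ones kept.
    """
    result = sequences[path[0]]
    for prev, node in zip(path, path[1:]):
        k = overlaps[(prev, node)]
        result += sequences[node][k:]
    return result


# "TAC" (end of node 1) vs "TTC" (start of node 2): one mismatch in the overlap.
seqs = {"1": "ACGTAC", "2": "TTCGGA"}
print(path_sequence(["1", "2"], seqs, {("1", "2"): 3}))  # ACGTACGGA
```

Node 2's `T` at the mismatched position never appears in the output; the `A` from node 1 is used instead.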

In the almost-finished release of Bandage (v0.8.0, now on the master branch), I did add a graph information dialog window which, among other things, shows the overlap range in the graph. So if the graph you're using has only one overlap size (as is the case for a SPAdes graph), this will tell you the answer. But if your graph has multiple different overlap sizes, it won't help.
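Computing an overlap range is straightforward when every link carries a simple match-only CIGAR. The following is a hypothetical Python helper, not Bandage's implementation. It assumes tab-separated GFA 1 `L` lines with the overlap in the sixth column, counts only overlaps of the form `nM`, and tallies everything else (`*`, or CIGARs with indels) as unhandled:

```python
import re

SIMPLE_OVERLAP = re.compile(r"(\d+)M")


def overlap_range(gfa_lines):
    """Return ((smallest, largest) overlap, number of unhandled links).

    Only plain "nM" overlaps on L (link) lines are counted; the range is
    None if no such link exists.
    """
    sizes, unhandled = [], 0
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "L" or len(fields) < 6:
            continue  # not a link line
        m = SIMPLE_OVERLAP.fullmatch(fields[5])
        if m:
            sizes.append(int(m.group(1)))
        else:
            unhandled += 1
    if not sizes:
        return None, unhandled
    return (min(sizes), max(sizes)), unhandled


gfa = [
    "S\t1\tACGTACGT",
    "S\t2\tTACGTTTT",
    "S\t3\tGTAAAA",
    "L\t1\t+\t2\t+\t5M",
    "L\t1\t+\t3\t+\t2M",
]
print(overlap_range(gfa))  # ((2, 5), 0)
```

If the minimum equals the maximum, the graph has a single overlap size, as in the SPAdes case; a range like `(2, 5)` means the overlap has to be looked up per edge.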

Can I ask: do your graphs have varying overlap sizes? And do they have (eek!) complex overlaps with mismatches and indels?

sjackman commented 8 years ago

Can I ask: do your graphs have varying overlap sizes?

Yes, ABySS has varying overlap sizes.

And do they have (eek!) complex overlaps with mismatches and indels?

No, but complex overlaps are typical of the GFA files output by @lh3's nanoasm.

sjackman commented 8 years ago

I'm looking forward to the release of 0.8.0! Bandage has been very useful for me.

SHuang-Broad commented 8 years ago

+1 for this feature request.

The case where I find myself needing this feature is when looking at larger variants, e.g. when a heterozygous SNP/small indel site (true or not) creates a bubble or tail, but the bubble/tail actually just adds noise to the graph.

It would be nice to be able to collapse a bubble, or trim a tail, if the mismatch is below a certain value.
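One way to express "the mismatch is below a certain value" is an edit-distance threshold between the two branch sequences of a bubble. This is a hypothetical rule sketched in Python, not a Bandage feature. It uses plain Levenshtein distance so that a SNP and a 1 bp indel each count as one difference; `should_pop` and `max_diffs` are illustrative names:

```python
def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def should_pop(branch_a, branch_b, max_diffs):
    """Hypothetical rule: pop the bubble if its two branch sequences differ
    by at most max_diffs edits."""
    return edit_distance(branch_a, branch_b) <= max_diffs


print(should_pop("ACGTACGT", "ACGAACGT", max_diffs=1))  # True: one SNP
print(should_pop("ACGTACGT", "ACGACGT", max_diffs=1))   # True: 1 bp deletion
print(should_pop("ACGTACGT", "TTTTACGA", max_diffs=1))  # False
```

The quadratic dynamic program is fine for short bubble branches; very long branches would call for a banded alignment instead.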

rrwick commented 8 years ago

Steve,

I am planning on revamping the Bandage labels, including adding stuff like edge labels, so this feature is coming. I promise! Hopefully not too far in the future... 😄

Regarding your workflow for removing bubbles/tails, there is a way to do this manually. Bandage has some graph-editing functionality, though it's not particularly fleshed out.

If you have a simple bubble like this: [image: bubble_1]

You can select the node you want to delete: [image: bubble_2]

And then use Edit > Remove selection from graph (or Shift+Delete): [image: bubble_3]

Then to simplify the graph you can use Edit > Merge all possible nodes: [image: bubble_4]

And finally you might want to redraw the graph to make it look nice: [image: bubble_5]
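The "Merge all possible nodes" step joins nodes that form non-branching chains. Below is a rough Python sketch of that idea, not Bandage's implementation. It assumes a single-stranded graph where `seqs` maps node names to sequences and `edges` maps (from, to) pairs to integer overlap lengths; Bandage's real graph also has reverse-complement nodes, which this ignores. The function name and the underscore-joined merged names are hypothetical.

```python
from collections import defaultdict


def merge_linear_chains(seqs, edges):
    """Collapse every non-branching chain of nodes into a single node.

    Returns (new_seqs, new_edges) in the same format as the inputs; a merged
    node is named by joining its members' names with underscores.
    """
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for a, b in edges:
        out_edges[a].append(b)
        in_edges[b].append(a)

    def joins_next(a):
        # a merges with its successor if that link is the only way out of a
        # and the only way into the successor (self-loops never merge).
        if len(out_edges[a]) != 1:
            return False
        b = out_edges[a][0]
        return b != a and len(in_edges[b]) == 1

    chains = []
    for start in seqs:
        preds = in_edges[start]
        if len(preds) == 1 and joins_next(preds[0]):
            continue  # start sits inside a chain that begins elsewhere
        chain = [start]
        while joins_next(chain[-1]):
            chain.append(out_edges[chain[-1]][0])
        chains.append(chain)

    # Nodes never reached lie on an isolated loop; leave them unmerged.
    placed = {n for chain in chains for n in chain}
    chains += [[n] for n in seqs if n not in placed]

    name_of, new_seqs = {}, {}
    for chain in chains:
        seq = seqs[chain[0]]
        for a, b in zip(chain, chain[1:]):
            seq += seqs[b][edges[(a, b)]:]  # drop b's overlapping prefix
        name = "_".join(chain)
        new_seqs[name] = seq
        for n in chain:
            name_of[n] = name

    internal = {(a, b) for chain in chains for a, b in zip(chain, chain[1:])}
    new_edges = {(name_of[a], name_of[b]): ov
                 for (a, b), ov in edges.items() if (a, b) not in internal}
    return new_seqs, new_edges


# A bubble whose second branch has already been removed leaves 1 -> 2 -> 4.
print(merge_linear_chains({"1": "AAACG", "2": "CGTT", "4": "TTGGG"},
                          {("1", "2"): 2, ("2", "4"): 2}))
# ({'1_2_4': 'AAACGTTGGG'}, {})
```

Nodes on an isolated loop are deliberately left as they are to keep the sketch short.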

The Output menu then has options for saving your modified graph file to GFA. Be careful when editing, as there's no undo! (That would not be trivial to implement...)

This process will also work for getting rid of loose ends in the graph: [images: loose_end_1, loose_end_2, loose_end_3, loose_end_4]
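An automated version of the loose-end clean-up could look something like this hypothetical Python sketch. It assumes a single-stranded representation (`seqs` maps node names to sequences, `edges` maps (from, to) pairs to overlap lengths) and treats a node as a tip when it has links on only one side and is shorter than a chosen `max_len`:

```python
def trim_tips(seqs, edges, max_len):
    """Remove short dead-end nodes ("tips" or loose ends) in a single pass.

    A tip is taken to be a node with links on one side only (no incoming or
    no outgoing edges) whose sequence is shorter than max_len. Nodes with no
    links at all are separate components and are left alone.
    """
    has_in = {b for _, b in edges}
    has_out = {a for a, _ in edges}
    tips = {n for n, s in seqs.items()
            if len(s) < max_len and (n in has_in) != (n in has_out)}
    new_seqs = {n: s for n, s in seqs.items() if n not in tips}
    new_edges = {(a, b): ov for (a, b), ov in edges.items()
                 if a not in tips and b not in tips}
    return new_seqs, new_edges


seqs = {"1": "ACGTACGTAC", "2": "ACGGT", "3": "GTTTTTTTTT", "4": "GTA"}
edges = {("1", "2"): 2, ("2", "3"): 2, ("2", "4"): 2}
# Node 4 is a 3 bp dead end hanging off node 2, so it is trimmed; the long
# ends 1 and 3 are kept because they are not shorter than max_len.
print(trim_tips(seqs, edges, max_len=5))
```

Removing a tip can leave its neighbour with a single remaining path, so in practice this would be followed by a merge step, as in the manual workflow, and possibly repeated until nothing changes.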

This process is manual and somewhat labour-intensive, and I'm not sure how many bubbles/tails you need to clean up. An automated way of doing this would be possible, but it would of course need a lot of parameters: how to choose which bubble path to keep, the maximum size to simplify, etc. If you're interested in an automated approach, I'd be curious for your thoughts on this. What automated logic would be most useful for graph simplification?
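As one concrete take on "which bubble path to keep" and "max size to simplify", here is a hypothetical Python sketch that pops only the simplest two-branch bubbles, keeps the branch with the higher read depth, and skips bubbles whose branches exceed `max_len`. The single-stranded representation and the `depths` dictionary are assumptions for illustration; this is not based on Bandage's or ABySS's code.

```python
from collections import defaultdict


def pop_simple_bubbles(seqs, depths, edges, max_len):
    """Pop two-branch bubbles s -> {x, y} -> t, keeping the deeper branch.

    Hypothetical rules: x and y must each have s as their only predecessor
    and share one single successor t; both branches must be at most max_len
    bp long; the branch with the higher depth is kept (ties keep x).
    Returns (new_seqs, new_edges) in the same format as the inputs.
    """
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for a, b in edges:
        out_edges[a].append(b)
        in_edges[b].append(a)

    removed = set()
    for s in seqs:
        succ = out_edges[s]
        if len(succ) != 2 or any(n in removed for n in succ):
            continue
        x, y = succ
        # Each branch node must hang off s alone and lead to exactly one node.
        if not all(in_edges[n] == [s] and len(out_edges[n]) == 1 for n in (x, y)):
            continue
        t = out_edges[x][0]
        # Both branches must rejoin at the same node t (and not loop back to s).
        if out_edges[y][0] != t or t == s:
            continue
        if max(len(seqs[x]), len(seqs[y])) > max_len:
            continue
        removed.add(y if depths[x] >= depths[y] else x)

    new_seqs = {n: q for n, q in seqs.items() if n not in removed}
    new_edges = {(a, b): ov for (a, b), ov in edges.items()
                 if a not in removed and b not in removed}
    return new_seqs, new_edges


seqs = {"1": "AAACG", "2": "CGTTA", "3": "CGATA", "4": "TAGGG"}
depths = {"1": 20.0, "2": 18.0, "3": 2.0, "4": 21.0}
edges = {("1", "2"): 2, ("1", "3"): 2, ("2", "4"): 2, ("3", "4"): 2}
# Branch 3 (depth 2.0) differs from branch 2 by one SNP and is removed.
print(pop_simple_bubbles(seqs, depths, edges, max_len=100))
```

Keeping the deeper branch is only one possible policy; an edit-distance limit between the branches or a depth-ratio cutoff would be natural extra parameters.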

sjackman commented 8 years ago

For bubble popping, ABySS has the PopBubbles program at https://github.com/bcgsc/abyss/blob/master/PopBubbles/PopBubbles.cpp. For tip trimming, ABySS has the abyss-filtergraph program at https://github.com/bcgsc/abyss/blob/master/FilterGraph/FilterGraph.cc.

Both come with ABySS, and you can install them easily on Mac or Linux using Homebrew or Linuxbrew.

Both support GFA and Graphviz (.gv or .dot) format graphs.