vgteam / sequenceTubeMap

displays multiple genomic sequences in the form of a tube map
MIT License

Detect when nodes are visited predominantly in reverse, and display them backward, instead of making loads of inversion paths #132

Open SimonaSecomandi opened 3 years ago

SimonaSecomandi commented 3 years ago

Dear all,

In the image I’ve attached, there are some SNPs which are confusing to me (the “convoluted” ones).

The paths seem to indicate that these are inversions: some paths enter the SNPs in one direction, and others in the opposite direction. However, only a single base is involved!

What is the purpose of that kind of representation?

Many thanks,

Simona

[Attached image: example]

adamnovak commented 3 years ago

Hello, @SimonaSecomandi! Sorry to hear that the visualization looks confusing to you here.

It's pretty confusing to me as well; my best guess is that you have some nodes here that are predominantly visited in reverse. For example, if you look at that lower T in the first bubble, it looks like everything going through it comes from the longer node to the left, goes past it, loops around, goes through it backward, loops around again, and continues on to the node on the right.

It's possible for a single base to be inverted because the graphs we take in define each node with two orientations, one locally forward and the other locally reverse. We always produce our layouts with each node's local forward orientation running left to right in the image, and then we figure out where to put the nodes and paths relative to each other.
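To make the "single inverted base" idea concrete, here is a minimal sketch, assuming the usual convention that visiting a node in reverse reads the reverse complement of its stored sequence. The helper name `read_node` is hypothetical and is not part of the tube map or vg.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_node(sequence: str, is_reverse: bool) -> str:
    """Return the bases a path reads when it visits a node in the given orientation."""
    if is_reverse:
        # Reverse visit: complement every base, then reverse the order.
        return sequence.translate(COMPLEMENT)[::-1]
    return sequence


# A one-base node still has two orientations.
forward_read = read_node("T", is_reverse=False)
reverse_read = read_node("T", is_reverse=True)
print(forward_read, reverse_read)
```

So a path that visits a stored T backward spells the same base as a path that visits a stored A forward, which is why a SNP allele can end up drawn as an inversion.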

We didn't really anticipate graphs like this; we designed the tube map for graphs whose nodes have been oriented to keep the number of direction changes relatively small. That is no longer true of the output of some of the more popular tools, though; I've lately seen a lot of graphs where one allele of almost every SNP is backward relative to everything else.

One option is to run the graph through a tool that will flip nodes' strands to reduce the number of times paths change strands like this; vg mod -O might be one option.

Another solution would be to adjust the tube map so that it can more coherently represent a run of paths that all move together from the forward strand to the reverse strand. For example, the last 3 paths on the long node at the lower left all then visit that lower T in reverse, but they are drawn as breaking apart and taking different routes to get there, when really one would expect them to move together. The differences in the routes they take just serve to confuse people and carry no information.

And a third possible option, also changing the visualization, would be to let the tube map flip nodes around so that their local forward strand runs in whichever direction is most convenient for the layout.
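A minimal sketch of that third option, assuming paths are available as lists of (node ID, is_reverse) visits. This representation and the function name `nodes_to_draw_backward` are assumptions for illustration only; they are not the tube map's actual data model or code.

```python
from collections import Counter


def nodes_to_draw_backward(paths):
    """Return the IDs of nodes that are visited in reverse more often than forward."""
    forward = Counter()
    reverse = Counter()
    for visits in paths.values():
        for node_id, is_reverse in visits:
            if is_reverse:
                reverse[node_id] += 1
            else:
                forward[node_id] += 1
    # Counter returns 0 for nodes that were never visited forward.
    return {node_id for node_id in reverse if reverse[node_id] > forward[node_id]}


# Hypothetical example: node 2 is only ever visited in reverse.
paths = {
    "a": [(1, False), (2, True), (3, False)],
    "b": [(1, False), (2, True), (3, False)],
    "c": [(1, False), (4, False), (3, False)],
}
backward = nodes_to_draw_backward(paths)
print(backward)
```

A layout could then draw the nodes in `backward` with their local reverse strand running left to right, so the majority of paths pass straight through them.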

SimonaSecomandi commented 3 years ago

Hi @adamnovak, many thanks for the support!

The strange SNPs were fixed by vg mod -O (see the before and after images)! Do you think this is the best way to do it? What exactly does the command do? Will it affect other variants, like "real" inversions?

Many thanks!

BEFORE: [attached image]

AFTER: [attached image]

adamnovak commented 3 years ago

What vg mod -O does is walk through the graph and more or less greedily reverse the local forward orientation for nodes so they agree with their neighbors, and rewrite the edges and paths appropriately so as not to change the sequence space or the path sequences.
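The idea can be sketched roughly as follows. This is not vg's implementation, just an illustrative greedy pass under assumed data structures: edges are tuples (a, a_is_reverse, b, b_is_reverse), paths are lists of (node ID, is_reverse) visits, and all function names are hypothetical.

```python
from collections import deque

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def choose_flips(node_ids, edges):
    """Greedily decide which nodes to flip so each newly reached neighbor agrees."""
    neighbors = {n: [] for n in node_ids}
    for a, a_rev, b, b_rev in edges:
        mismatch = a_rev != b_rev  # the edge joins opposite local orientations
        neighbors[a].append((b, mismatch))
        neighbors[b].append((a, mismatch))
    flip = {}
    for start in node_ids:
        if start in flip:
            continue
        flip[start] = False  # keep the first node of each component as-is
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, mismatch in neighbors[u]:
                if v not in flip:
                    flip[v] = flip[u] != mismatch
                    queue.append(v)
    return flip


def apply_flips(sequences, edges, paths, flip):
    """Rewrite node sequences, edges, and paths to match the chosen flips."""
    new_seqs = {n: reverse_complement(s) if flip[n] else s for n, s in sequences.items()}
    new_edges = [(a, ar != flip[a], b, br != flip[b]) for a, ar, b, br in edges]
    new_paths = {name: [(n, rev != flip[n]) for n, rev in visits]
                 for name, visits in paths.items()}
    return new_seqs, new_edges, new_paths


def spell(sequences, visits):
    """Spell out the sequence of a path."""
    return "".join(reverse_complement(sequences[n]) if rev else sequences[n]
                   for n, rev in visits)


# Hypothetical SNP bubble where the alternate allele (node 2) is stored backward.
sequences = {1: "GATTA", 2: "A", 3: "C", 4: "CAT"}
edges = [(1, False, 2, True), (2, True, 4, False),
         (1, False, 3, False), (3, False, 4, False)]
paths = {"ref": [(1, False), (3, False), (4, False)],
         "alt": [(1, False), (2, True), (4, False)]}

flip = choose_flips(list(sequences), edges)
new_seqs, new_edges, new_paths = apply_flips(sequences, edges, paths, flip)
```

In this toy example only node 2 is flipped: its stored sequence becomes T, the "alt" path now visits it forward, and both paths spell exactly the same sequences as before.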

Real inversions can't be eliminated this way; they will have edges (and probably paths) connecting them to other nodes in both orientations, and so no matter which orientation is used for the node's local forward, the reverse traversal will be possible too.
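As a small illustration of why flipping cannot remove a real inversion, here is a sketch that counts reverse visits to a node under both orientation choices. It assumes paths are lists of (node ID, is_reverse) visits; the function name `reverse_visit_count` is hypothetical.

```python
def reverse_visit_count(paths, node_id, flipped):
    """Count path visits that traverse node_id in reverse, given whether it is flipped."""
    return sum(
        1
        for visits in paths.values()
        for visited_id, is_reverse in visits
        if visited_id == node_id and is_reverse != flipped
    )


# Hypothetical real inversion: one path reads node 2 forward, the other backward.
paths = {
    "ref": [(1, False), (2, False), (3, False)],
    "inv": [(1, False), (2, True), (3, False)],
}
as_is = reverse_visit_count(paths, 2, flipped=False)
after_flip = reverse_visit_count(paths, 2, flipped=True)
print(as_is, after_flip)
```

Whichever orientation is chosen for node 2, one of the two paths still traverses it in reverse, so the inversion remains visible.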

The graph still ends up describing the same set of sequences, and it's easier to visualize. But it's no longer quite the same graph as before, so read alignments, mapping indexes, or other annotations intended for the first graph will not be applicable to the second, and vice versa.