pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

odgi layout - sorting and node id order #567

Closed ScottMastro closed 7 months ago

ScottMastro commented 8 months ago

https://odgi.readthedocs.io/en/latest/rst/tutorials/sort_layout.html

I am trying to use the TSV output of odgi layout to visualize a graph in 2D. The --help documentation of odgi layout says:

Establish 2D layouts of the graph using path-guided stochastic gradient descent. The graph must be sorted and id-compacted.

However, the example provided uses an unsorted graph DRB1-3123_unsorted.og. So does a graph need to be sorted or not?


I have a GFA created with minigraph, and I want to 1) calculate the 2D layout of the graph and 2) use the TSV to assign x,y positions to each segment in the GFA. When I keep my graph unsorted, the layout seems to have problems.

For example, chromosome 18 (which is also component 18 in the TSV):

[image: layout of chromosome 18]

The reference nodes come first in the GFA and the alt paths are at the end. There seems to be a big jump in x-position, even though all those alt nodes should be close to the reference nodes in 2D space. I assume this happens because I ran odgi layout without sorting.

However, when I do sort, I seem to lose the ability to connect each line of the TSV with the corresponding segment in the GFA because the order is different.
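To make the mapping problem concrete, here is a minimal Python sketch of a positional join between a layout TSV and the S-lines of a GFA. It rests on assumptions that are not confirmed in this thread: that the TSV has a `#`-prefixed header with the columns idx, X, Y, component, and that each node contributes two consecutive rows (start and end point) in the node order of the graph that was laid out. The function names are hypothetical. A join like this is only valid against a GFA whose segment order matches the laid-out graph, which is exactly what breaks when the graph is sorted after the GFA was written.

```python
import csv
import io


def read_layout_rows(tsv_text):
    """Parse layout rows as (idx, x, y, component) tuples.

    ASSUMPTION: tab-separated columns idx, X, Y, component, with
    header/comment lines starting with '#'.
    """
    rows = []
    for rec in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not rec or rec[0].startswith("#"):
            continue
        rows.append((int(rec[0]), float(rec[1]), float(rec[2]), int(rec[3])))
    rows.sort(key=lambda r: r[0])  # order rows by their idx column
    return rows


def segment_names(gfa_text):
    """Return the segment names of all S-lines, in file order."""
    return [
        line.split("\t")[1]
        for line in gfa_text.splitlines()
        if line.startswith("S\t")
    ]


def assign_positions(gfa_text, tsv_text):
    """Join layout rows to segments by position.

    ASSUMPTION: two rows per node (start, end), in the same node order
    as the S-lines of the GFA that was actually laid out.
    """
    names = segment_names(gfa_text)
    rows = read_layout_rows(tsv_text)
    if len(rows) != 2 * len(names):
        raise ValueError(
            f"expected {2 * len(names)} layout rows, found {len(rows)}"
        )
    positions = {}
    for i, name in enumerate(names):
        _, x1, y1, component = rows[2 * i]
        _, x2, y2, _ = rows[2 * i + 1]
        positions[name] = {
            "start": (x1, y1),
            "end": (x2, y2),
            "component": component,
        }
    return positions


# Tiny made-up example: two segments, four layout rows.
example_gfa = "H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tGG\nL\t1\t+\t2\t+\t0M\n"
example_tsv = (
    "#idx\tX\tY\tcomponent\n"
    "0\t0.0\t0.0\t0\n"
    "1\t4.0\t0.5\t0\n"
    "2\t5.0\t1.0\t0\n"
    "3\t7.0\t1.5\t0\n"
)
example_positions = assign_positions(example_gfa, example_tsv)
```

The row-count check is there to fail loudly when the TSV and the GFA do not describe the same set of nodes, rather than silently shifting coordinates onto the wrong segments.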


TL;DR: How do I run odgi layout so that the layout is calculated from a sorted graph but I can still assign the x,y coordinates back to the original GFA segments?

Thanks!

subwaystation commented 8 months ago

How was your graph generated? Directly with MC? Can you post your results and command lines for both approaches you tried?

ScottMastro commented 7 months ago

I think I figured this one out. The issue is that the minigraph tool outputs an rGFA. The reference path is fine and can be inferred from the rGFA information, but the alternative paths are only annotated on the alternative nodes, so each alt node becomes its own "path".

I believe odgi layout relies on paths to calculate the layout, so the reference nodes are laid out properly, but all the alt nodes are disjoint from a path perspective and are put at the end. The solution is to put the complete paths into the graph before laying it out (possibly by aligning the original sequence data back to the graph).
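A quick way to check whether a graph is in this situation is to look at the rGFA tags on the S-lines. The sketch below is only an illustration with hypothetical function names and a made-up three-segment example. It relies on the rGFA tags SN (stable sequence name), SO (offset on that sequence) and SR (rank, 0 for the reference), and splits segments into reference and alternate sets. It does not reconstruct full sample paths, which this thread suggests would need something like realignment.

```python
def parse_rgfa_segments(gfa_text):
    """Return one dict per S-line with its name, length and rGFA tags."""
    segments = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "S":
            continue
        tags = {}
        for tag in fields[3:]:
            key, typ, value = tag.split(":", 2)
            tags[key] = int(value) if typ == "i" else value
        segments.append({"name": fields[1], "length": len(fields[2]), **tags})
    return segments


def split_by_rank(segments):
    """Split segments into reference (SR == 0) and alternate (SR > 0).

    Reference segments are ordered by (SN, SO), which is how a reference
    walk can be inferred from the rGFA tags alone.
    """
    reference = sorted(
        (s for s in segments if s.get("SR") == 0),
        key=lambda s: (s["SN"], s["SO"]),
    )
    alternates = [s for s in segments if s.get("SR", 0) > 0]
    return reference, alternates


# Made-up example: two reference segments and one alternate segment.
example_rgfa = (
    "S\ts1\tACGT\tLN:i:4\tSN:Z:chr18\tSO:i:0\tSR:i:0\n"
    "S\ts2\tGG\tLN:i:2\tSN:Z:chr18\tSO:i:4\tSR:i:0\n"
    "S\ts3\tTTT\tLN:i:3\tSN:Z:sampleA#ctg1\tSO:i:100\tSR:i:1\n"
    "L\ts1\t+\ts2\t+\t0M\n"
)
ref_segments, alt_segments = split_by_rank(parse_rgfa_segments(example_rgfa))
ref_walk = [s["name"] for s in ref_segments]
alt_names = [s["name"] for s in alt_segments]
```

If most segments with SR greater than 0 are not covered by any multi-node path in the graph handed to the layout step, that matches the symptom described above: the reference lays out cleanly and the alternate nodes end up detached.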