pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

odgi layout minigraph GFA #379

Closed cgroza closed 1 year ago

cgroza commented 2 years ago

Hi,

I am trying to compute a 2D layout of a graph generated with minigraph. I converted the rGFA to GFAv1, which adds P lines for the reference and non-reference paths. However, it seems that only the reference path is properly ordered; the rest of the graph is quite scattered. Does odgi layout require large, genome-wide paths? (minigraph creates paths only for the reference and for short non-reference segments.)

Here is an example rendering of chr1 (with gfaestus). Large scale: [image]. Zoomed: [image].

As you can see, we have a nicely laid out reference path, but links to non-reference segments go to some far away place.

subwaystation commented 2 years ago

This looks more like art than a pangenome graph^^

Could you please share the command lines and probably the graph you are working with? You are the first one I know who tries to go from minigraph to ODGI.

Sorry, if it doesn't work out of the box.

AndreaGuarracino commented 2 years ago

Hi @cgroza, odgi layout applies an algorithm that uses the distances between nodes along the paths to optimize the distances between those nodes in the layout. I should take a look at your GFA-converted rGFA, but from your description you might be right that the non-reference paths are fragmented, and this does not allow odgi layout to work properly.

ekg commented 2 years ago

From your description of the input, it sounds like the paths for non-reference nodes don't extend into the surrounding reference regions. The path-guided SGD layout needs the graph to be covered by paths that mutually overlap. My guess is that there are many edges in the graph that no path traverses. As a result, positional information can't flow across these edges, and the nodes behind them are assigned arbitrary positions, yielding the disordered layout.
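To make that guess checkable, here is a minimal Python sketch that lists the edges of a GFAv1 graph that no P line traverses. It is not part of odgi; the function names (`uncovered_edges`, `canonical`, `flip`) and the toy graph are made up for illustration, and it assumes plain tab-separated GFAv1 with L and P lines only.

```python
def flip(orient):
    """Return the opposite orientation sign."""
    return "-" if orient == "+" else "+"


def canonical(a, ao, b, bo):
    """An edge and its reverse complement are the same edge; pick one form."""
    forward = (a, ao, b, bo)
    reverse = (b, flip(bo), a, flip(ao))
    return min(forward, reverse)


def uncovered_edges(gfa_text):
    """Return the set of L-line edges not traversed by any P-line path."""
    edges = set()
    covered = set()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "L":
            edges.add(canonical(fields[1], fields[2], fields[3], fields[4]))
        elif fields[0] == "P":
            # Path steps look like "12+" or "7-": name followed by orientation.
            steps = [(s[:-1], s[-1]) for s in fields[2].split(",") if s]
            for (a, ao), (b, bo) in zip(steps, steps[1:]):
                covered.add(canonical(a, ao, b, bo))
    return edges - covered


# Toy graph (hypothetical): a reference path 1 -> 2 -> 4 and a non-reference
# segment 3 whose path contains only itself, as described in this thread.
TOY = "\n".join([
    "S\t1\tA", "S\t2\tC", "S\t3\tG", "S\t4\tT",
    "L\t1\t+\t2\t+\t0M", "L\t2\t+\t4\t+\t0M",
    "L\t1\t+\t3\t+\t0M", "L\t3\t+\t4\t+\t0M",
    "P\tref\t1+,2+,4+\t*",
    "P\tnonref\t3+\t*",
])

for edge in sorted(uncovered_edges(TOY)):
    print(edge)
```

In the toy graph, the two edges attaching segment 3 to the reference are reported, because the non-reference path never steps onto a reference node. A large share of such edges in a real graph would be consistent with the scattered layout.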

You can resolve this by adding random paths to the graph that do traverse larger regions; in other words, it will be necessary to use a path cover. I would try out odgi cover. You might only need 5-10x coverage to get a good layout. Please let us know how it goes. If you find that it works, we can update the tutorials to describe how to support minigraph layouts.
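For intuition only, the idea of adding random paths that span larger regions can be sketched in Python as random walks over the graph's edges, written out as extra P lines. This is an illustrative assumption, not the algorithm that odgi cover implements; the helper names (`build_adjacency`, `random_walk_paths`), the `cover{i}` path names, and the toy graph are hypothetical.

```python
import random


def flip(orient):
    """Return the opposite orientation sign."""
    return "-" if orient == "+" else "+"


def build_adjacency(gfa_text):
    """Map each oriented node to the oriented nodes reachable over one edge."""
    adj = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "L":
            a, ao, b, bo = fields[1:5]
            adj.setdefault((a, ao), []).append((b, bo))
            # The same edge can be walked in the reverse-complement direction.
            adj.setdefault((b, flip(bo)), []).append((a, flip(ao)))
    return adj


def random_walk_paths(gfa_text, n_paths, max_steps, seed=0):
    """Return n_paths GFAv1 P lines, each a random walk of up to max_steps."""
    rng = random.Random(seed)
    adj = build_adjacency(gfa_text)
    starts = sorted(adj)  # oriented nodes with at least one outgoing edge
    if not starts:
        return []
    lines = []
    for i in range(n_paths):
        step = rng.choice(starts)
        walk = [step]
        while len(walk) < max_steps and step in adj:
            step = rng.choice(adj[step])
            walk.append(step)
        segments = ",".join(name + orient for name, orient in walk)
        lines.append("\t".join(["P", f"cover{i}", segments, "*"]))
    return lines


# Toy graph (hypothetical): a bubble 1 -> {2, 3} -> 4.
TOY = "\n".join([
    "S\t1\tA", "S\t2\tC", "S\t3\tG", "S\t4\tT",
    "L\t1\t+\t2\t+\t0M", "L\t2\t+\t4\t+\t0M",
    "L\t1\t+\t3\t+\t0M", "L\t3\t+\t4\t+\t0M",
])

for p_line in random_walk_paths(TOY, n_paths=3, max_steps=4):
    print(p_line)
```

Each walk crosses edges regardless of whether an existing path does, so a sufficient number of walks gives the layout overlapping paths across the junctions between reference and non-reference segments.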

cgroza commented 2 years ago

Thank you everyone for your suggestions. The support is always amazing.

> This looks more like art than a pangenome graph^^
>
> Could you please share the command lines and probably the graph you are working with? You are the first one I know who tries to go from minigraph to ODGI.
>
> Sorry, if it doesn't work out of the box.

Yes, there are quite a few ways to generate graphs, and odgi is one of the best toolkits out there. I cannot share the graph (I don't own the data, and it has not been released yet). I ran minigraph as Heng Li shows in the README of its repo.

AndreaGuarracino commented 2 years ago

We don't necessarily need the graph. What are the command lines for switching from a rGFA to a GFA?

cgroza commented 2 years ago

> We don't necessarily need the graph. What are the command lines for switching from a rGFA to a GFA?

I have used the vg convert functions, with rGFA as input and GFA as output.

vg convert -B -g -f -r 10000 graph.rgfa > graph.gfa

subwaystation commented 2 years ago

@cgroza How did it go?

cgroza commented 2 years ago

@subwaystation I tried it, and it seems to work in the regions where the paths do cover the graph. However, I cannot seem to tune the parameters to cover 100% of the graph. Even a few hundred or a few thousand randomly placed nodes lead to chaotic lines obscuring the image. I have been working with Bandage instead, since the minigraph nodes are quite large and approachable, especially after I split the graph by chromosome.

subwaystation commented 2 years ago

@cgroza Did you also try the 1D sorting algorithm in odgi sort? It now allows you to select reference paths specifically. See https://odgi.readthedocs.io/en/latest/rst/tutorials/sort_layout.html#visualize-the-mhc-by-path-position for an example. If that works well for your graph, I will try to enhance the 2D layout algorithm in a similar way.

Or maybe you can even share your graph? Thanks!

subwaystation commented 1 year ago

Doesn't seem to be an issue anymore. I will investigate this on my own in the next month anyhow.