pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

Defining reference path on graph #532

Open ManuelTgn opened 11 months ago

ManuelTgn commented 11 months ago

Hi team,

thank you for developing this amazing tool!

I have a couple of questions about manipulating odgi graphs through the Python API.

I created an odgi graph from a GFA containing more than 1000 haplotypes plus 1 reference sequence. Is there a way to tell which path belongs to the reference sequence? All paths seem to point to a haplotype.

While traversing the paths embedded in the graph, is it possible to determine which position in the reference coordinate space a nucleotide in a node corresponds to?

Thanks for the help!

subwaystation commented 11 months ago

Hi @ManuelTgn,

Since I don't know in detail how you created the graph, it is hard to tell. In ODGI's variation graph model all paths are equal; there is no notion of a reference path.

If you want to use ODGI to translate path positions to your path of choice, please take a look at https://odgi.readthedocs.io/en/latest/rst/tutorials/navigating_and_annotating_graphs.html#path-to-path-position-mapping.

ManuelTgn commented 11 months ago

Hi @subwaystation,

thank you for the reply. That was very helpful!

I created my input graph using the vg toolkit, enriching a chromosome with some variants stored in a phased VCF. Are there Python functions in ODGI's API that replicate the command-line functionality shown in the documentation page you shared?

subwaystation commented 11 months ago

Hi @ManuelTgn,

I see, so no crazy variants with crazy repeats. That's good :)

As far as I know, the Python API only exposes basic ODGI functionality. You would have to build such code on your own. Take a look at https://github.com/pangenome/odgi/blob/de70fcdacb3fc06fd1d8c8d43c057a47fac0310b/src/subcommand/position_main.cpp#L794-L840 as inspiration.

Best, Simon
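
For anyone building this on their own, here is a minimal sketch of the coordinate-translation idea in plain Python. It is not a port of the linked position_main.cpp code and it does not call the odgi Python API. It assumes you have already extracted, for each path, its ordered steps as (node_id, is_reverse) pairs plus a dictionary of node lengths. The names `index_path`, `to_ref_position`, `lift_path` and the toy data are hypothetical, chosen only for illustration. Nodes that the chosen path never visits simply map to `None`; the sketch makes no attempt to search for a nearby node on that path.

```python
from typing import Dict, List, Optional, Tuple

Step = Tuple[int, bool]            # (node_id, is_reverse); hypothetical toy representation
PathIndex = Dict[int, List[Tuple[int, bool]]]


def index_path(steps: List[Step], node_len: Dict[int, int]) -> PathIndex:
    """Map node_id -> [(0-based start offset on the path, is_reverse), ...]."""
    index: PathIndex = {}
    offset = 0
    for node_id, is_rev in steps:
        index.setdefault(node_id, []).append((offset, is_rev))
        offset += node_len[node_id]
    return index


def to_ref_position(node_id: int, node_offset: int,
                    ref_index: PathIndex, node_len: Dict[int, int]) -> Optional[int]:
    """Translate a 0-based forward-strand offset inside a node into a
    0-based position on the chosen path. Returns None if the path never
    visits the node."""
    hits = ref_index.get(node_id)
    if not hits:
        return None
    # Assumption: use the first visit only; a looping path may visit a node repeatedly.
    start, is_rev = hits[0]
    if is_rev:
        # The path reads the node backwards, so forward offset 0 is its last base.
        return start + node_len[node_id] - 1 - node_offset
    return start + node_offset


def lift_path(steps: List[Step], ref_index: PathIndex,
              node_len: Dict[int, int]) -> List[Tuple[int, int, Optional[int]]]:
    """For every step of a path, return (position on that path, node_id,
    position of the step's first base on the chosen path, or None)."""
    out = []
    pos = 0
    for node_id, is_rev in steps:
        # First base as traversed: forward offset 0, or the last forward base if reversed.
        node_offset = node_len[node_id] - 1 if is_rev else 0
        out.append((pos, node_id, to_ref_position(node_id, node_offset, ref_index, node_len)))
        pos += node_len[node_id]
    return out


# Toy graph: a SNP bubble where the chosen "reference" goes through node 2
# and the haplotype goes through node 3.
NODE_LEN = {1: 4, 2: 1, 3: 1, 4: 3}
REF = [(1, False), (2, False), (4, False)]
HAP = [(1, False), (3, False), (4, False)]

REF_INDEX = index_path(REF, NODE_LEN)
LIFTED = lift_path(HAP, REF_INDEX, NODE_LEN)
```

Since ODGI treats all paths equally, "reference" here only means whichever path you decide to index; any other path in the graph can be indexed the same way.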