pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

A new odgi function wish, odgi synteny, or a syntenywish...? #451

Open diaspj opened 2 years ago

diaspj commented 2 years ago

Hi odgi team,

First, I would like to congratulate the odgi team and the other related teams (vg, pggb, etc.) on the wonderful work you are doing to develop the field of pangenome graphs.

I am opening this issue not to report a bug but to make a wish: an additional odgi function that can identify and report synteny between organisms.

In short, my wish is for a function equivalent to halSynteny (HAL toolkit) to be made available in the odgi toolkit!

https://github.com/ComparativeGenomicsToolkit/hal/tree/master/synteny

halSynteny: a fast, easy-to-use conserved synteny block construction method for multiple whole-genome alignments. Ksenia Krasheninnikova, Mark Diekhans, Joel Armstrong, Aleksei Dievskii, Benedict Paten, Stephen O’Brien. GigaScience, Volume 9, Issue 6, June 2020, giaa047, https://doi.org/10.1093/gigascience/giaa047

The ability to export a file representing genome collinearity would be a great addition to the set of operations that the odgi toolkit makes available to researchers. And since odgi was developed as a toolkit for post-processing pangenome graphs, it seems to be the most suitable platform for this functionality.

For instance, suppose we do not build the pangenome with the full pggb pipeline, but run only wfmash and seqwish (skipping the smoothxg step), and then use odgi untangle to export a PAF file representing the alignments between two genome sequences. An odgi synteny function could then recover a list of alignment blocks similar to the one in the original PAF file used to build the GFA file, instead of the long list of fragmented alignments reported in the PAF file produced by odgi untangle (see the two PAF files in the attachment).
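To make the wish more concrete, here is a minimal sketch of the kind of post-processing meant above: reading fragmented PAF records and greedily merging the ones that are collinear (same query, target and strand) and separated by at most a chosen gap on both sequences. This is not odgi's API and not halSynteny's block-construction algorithm; it is a simple gap-threshold merge, and the names `parse_paf`, `merge_collinear` and `max_gap` are hypothetical. It assumes standard PAF columns (0-based, end-exclusive coordinates) and ignores the match counts and optional tags.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Tuple


@dataclass
class PafRecord:
    """Subset of the mandatory PAF columns (0-based, end-exclusive coordinates)."""
    qname: str
    qstart: int
    qend: int
    strand: str
    tname: str
    tstart: int
    tend: int


def parse_paf(lines: Iterable[str]) -> List[PafRecord]:
    """Parse PAF lines, keeping only the columns needed for block merging."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\n").split("\t")
        records.append(PafRecord(qname=f[0], qstart=int(f[2]), qend=int(f[3]),
                                 strand=f[4], tname=f[5],
                                 tstart=int(f[7]), tend=int(f[8])))
    return records


def merge_collinear(records: List[PafRecord], max_gap: int) -> List[PafRecord]:
    """Hypothetical helper: greedily merge fragments that are collinear and
    separated by at most max_gap bases on both the query and the target."""
    groups: Dict[Tuple[str, str, str], List[PafRecord]] = {}
    for r in records:
        groups.setdefault((r.qname, r.tname, r.strand), []).append(r)

    blocks: List[PafRecord] = []
    for key in sorted(groups):
        frags = sorted(groups[key], key=lambda r: (r.qstart, r.tstart))
        cur = replace(frags[0])  # copy so the input records are not mutated
        for r in frags[1:]:
            qgap = r.qstart - cur.qend
            # On '+' the target advances with the query; on '-' it moves backwards.
            tgap = r.tstart - cur.tend if cur.strand == "+" else cur.tstart - r.tend
            if abs(qgap) <= max_gap and abs(tgap) <= max_gap:
                # Extend the current block to cover the new fragment.
                cur.qend = max(cur.qend, r.qend)
                cur.tstart = min(cur.tstart, r.tstart)
                cur.tend = max(cur.tend, r.tend)
            else:
                blocks.append(cur)
                cur = replace(r)
        blocks.append(cur)
    return blocks
```

Calling `merge_collinear(parse_paf(fh), max_gap=10_000)` on an open PAF file handle would return one record per merged block. A real synteny tool would also need to handle rearrangements, overlapping fragments and minimum block sizes, which this greedy sketch does not attempt.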

Using the HAL toolkit is not an option, because no script has been developed to convert gfa -> hal (nor vg -> hal, og -> hal, etc.); only a script for the opposite direction, hal2vg, is available...

One workaround would be to develop a script that converts gfa -> hal. However, because the gfa format (or og, or vg) seems to have been adopted by most of the research community as the preferred exchange format between toolkits/frameworks dedicated to the study of pangenome graphs, I think that building this functionality downstream, in odgi, makes more sense...

Best regards,

Paulo Dias

two_paf_files_the_original_and_the_exported_by_odgi_untangle.zip