ratschlab / metagraph

Scalable annotated de Bruijn graphs for DNA indexing, alignment, and assembly
http://metagraph.ethz.ch
GNU General Public License v3.0

Metagraph for pangenome analysis #410

Closed: Glfrey closed this issue 2 years ago

Glfrey commented 2 years ago

This isn't an issue per se, but rather a theoretical question. Could metagraph be used for a non-read-based pangenome analysis? I understand it would function beautifully for a read-based analysis, but if an abundance of beautifully assembled genomes should present themselves for pangenome analysis (wouldn't that be nice), could metagraph lend itself to such an application without needing to go back to the reads? Originally I thought so, until I remembered that it operates on k-mers instead of the usual non-overlapping chunks I see used for pangenomes. Would it be possible to overcome this?

I see metagraph being mentioned in quite a few recent pangenome papers, so I think I'm probably not the only one wondering this.

ratsch commented 2 years ago

Here are my two cents: you can use assembled whole genomes as an alternative to reads. In fact, we have done this by indexing RefSeq. Of course, the index is still k-mer based, and there is an intrinsic loss of information when going from a long sequence to a k-mer-based representation, unless you store additional information. We have recently developed efficient data structures that keep positional information on top of the de Bruijn graph (Counting de Bruijn graphs). The positional information allows you to losslessly encode the input sequences, including reference sequences.
Hope this helps. G — Gunnar Rätsch http://bioweb.me/gr-contact
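
To make the point about information loss concrete, here is a minimal toy sketch in plain Python. It is not MetaGraph's actual data structure or API; the function names (`kmers`, `index_with_coords`, `reconstruct`) and the tiny example sequences are hypothetical and chosen only for illustration. It shows that two different sequences can share the same k-mer set, and that storing the start position of every k-mer occurrence is enough to rebuild the input exactly.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def index_with_coords(seq, k):
    """Toy index: map each k-mer to the list of its start positions in seq."""
    index = defaultdict(list)
    for pos, kmer in enumerate(kmers(seq, k)):
        index[kmer].append(pos)
    return dict(index)


def reconstruct(index):
    """Rebuild the original sequence from a k-mer -> positions index."""
    # Invert the index: start position -> k-mer found at that position.
    by_pos = {pos: kmer for kmer, positions in index.items() for pos in positions}
    if not by_pos:
        return ""
    # Start with the first k-mer, then append the last base of each next k-mer.
    seq = by_pos[0]
    for pos in range(1, len(by_pos)):
        seq += by_pos[pos][-1]
    return seq


# Hypothetical "genomes" that differ only in the copy number of a repeat.
genome_a = "ACGACG"
genome_b = "ACGACGACG"
k = 3

# The plain k-mer sets are identical, so the set alone cannot tell them apart.
same_kmer_set = set(kmers(genome_a, k)) == set(kmers(genome_b, k))

# With positions attached to every k-mer, each genome is recovered exactly.
restored_a = reconstruct(index_with_coords(genome_a, k))
restored_b = reconstruct(index_with_coords(genome_b, k))

print(same_kmer_set)
print(restored_a == genome_a, restored_b == genome_b)
```

The sketch stores positions in a plain dictionary for readability. A real index would need a compressed representation to scale to collections like RefSeq, which is the part this toy example does not attempt to show.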

On 4 Jul 2022, at 13:03, Gillian Reynolds wrote: (original question, quoted in full above)

Glfrey commented 2 years ago

Brilliant, thank you. Metagraph is truly a game-changing tool and I'm very grateful for the responsiveness to my (never-ending) queries. I'll close this as it's not really an issue, but is it worth posting this information in the GitHub README so others can see it?

karasikov commented 2 years ago

Thanks for your suggestion. I added a line about this to the README.