vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg index should optionally take GFA input #2520

Open ekg opened 4 years ago

ekg commented 4 years ago

I'd like to make an xg directly from a GFA, without going through .vg format.

This can be done with the xg executable. However, the resulting file is not packaged with the vg container system. This makes vg unhappy, and it also makes me sad because importing the GFA to vg format uses huge amounts of memory.

Taking GFA input in vg index should now be pretty easy to do. This is just a reminder that we should do it.

edawson commented 4 years ago

We should also fix the GFA import eating tons of RAM. I looked at this a few weeks back but got swallowed up by another project. It should probably be a higher priority.
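For illustration only, and not the vg importer: a minimal Python sketch of reading a GFA one line at a time, so that memory is bounded by the longest line rather than by the whole file. It assumes GFA 1 tab-separated records, where `S` lines carry the sequence in the third field. The function name `scan_gfa` and the tiny example graph are hypothetical.

```python
import io
from collections import Counter


def scan_gfa(lines):
    """Count GFA record types and total segment bases in a single pass.

    `lines` is any iterable of text lines (for example an open file),
    so only one line is held in memory at a time.
    """
    counts = Counter()
    total_bases = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        counts[record_type] += 1
        # GFA 1 segment line: S <name> <sequence> [tags]; "*" means no sequence.
        if record_type == "S" and len(fields) >= 3 and fields[2] != "*":
            total_bases += len(fields[2])
    return counts, total_bases


# Hypothetical three-record graph, used only to exercise the function.
example = io.StringIO("H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tGG\nL\t1\t+\t2\t+\t0M\n")
counts, total_bases = scan_gfa(example)
print(dict(counts), total_bases)
```

A real importer also has to build nodes, edges, and paths, so this only shows the reading side of the problem.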

subwaystation commented 4 years ago

@trgibbons and I observed the same issue, with GFA import allocating a surprisingly large amount of memory.

edawson commented 4 years ago

It is a relatively easy fix but I had a hard time getting it right. I'll try to cut a PR by the next Monday meeting.

trgibbons commented 4 years ago

Thanks @edawson!

To follow up on @subwaystation's comment, I can provide a small but dramatic example of the massive amount of memory used by vg view:

We converted a 178 MB GFA of 12 YPRP yeast genomes into a 46 MB vg file using vg view. The conversion peaked at 4.45 GB of resident RAM, roughly 25x the size of the input file and 97x the size of the output file. The genomes we are actually interested in are several Gbp long and highly repetitive, which is why we're exploring all of our options.
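As a quick check of those ratios, here is a minimal Python sketch. It assumes decimal units (1 GB = 1000 MB); the helper name `size_ratios` is hypothetical.

```python
MB_PER_GB = 1000  # assumption: decimal units, as the reported sizes suggest


def size_ratios(peak_rss_gb, input_mb, output_mb):
    """Return peak memory as a multiple of the input and output file sizes."""
    peak_mb = peak_rss_gb * MB_PER_GB
    return peak_mb / input_mb, peak_mb / output_mb


# Figures reported above: 4.45 GB peak RSS, 178 MB GFA in, 46 MB vg out.
vs_input, vs_output = size_ratios(4.45, 178, 46)
print(f"{vs_input:.0f}x input, {vs_output:.0f}x output")
```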