ekg opened 4 years ago
We should also fix the GFA import eating tons of RAM. I looked at this a few weeks back but got swallowed up by another project. It should probably be a higher priority.
@trgibbons and I observed the same issue, with GFA import allocating a surprisingly large amount of memory.
It is a relatively easy fix, but I had a hard time getting it right. I'll try to cut a PR by the next Monday meeting.
Thanks @edawson!
To follow up on @subwaystation's comment, I can provide a small but dramatic example of the massive amount of memory used by vg view:
We converted a 178 MB GFA for 12 YPRP yeast genomes into a 46 MB vg file using vg view, which peaked at 4.45 GB of resident RAM: 25x larger than the input file and 97x larger than the output file. The genomes we are actually interested in are several Gbp in length and highly repetitive, which is why we're exploring all of our options.
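As a quick sanity check of those ratios, here is a small sketch. It assumes decimal units (1 GB = 1000 MB) and treats 4.45 GB as the peak resident set size. With binary units the input ratio would be closer to 26x, but the conclusion is the same.

```python
# Sizes reported in the comment above, in MB.
# Assumption: decimal units, so 4.45 GB = 4450 MB.
gfa_input_mb = 178.0
vg_output_mb = 46.0
peak_rss_mb = 4450.0


def ratio(peak_mb: float, file_mb: float) -> float:
    """Return how many times larger the peak RSS is than a given file."""
    return peak_mb / file_mb


input_ratio = ratio(peak_rss_mb, gfa_input_mb)
output_ratio = ratio(peak_rss_mb, vg_output_mb)
print(f"{input_ratio:.1f}x input, {output_ratio:.1f}x output")
```

Rounded to whole numbers, these give the 25x and 97x quoted above.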
I'd like to make an xg directly from a GFA, without going through .vg format.
This can be done with the xg executable. However, the resulting file is not packaged with the vg container system. This makes vg unhappy, and it also makes me sad because importing the GFA to vg format uses huge amounts of memory.
Taking GFA input in vg index should now be pretty easy to do. This is just a reminder that we should do it.