vgteam / vg

tools for working with genome variation graphs

xg executable creates files that "can't" be loaded in vg, while vg convert can't build xg index in low memory #2716

Open ekg opened 4 years ago

ekg commented 4 years ago

I'm trying to chunk a small graph.

I wanted to use vg convert to make an xg index, but it uses a huge amount of memory.

The GFA-to-xg construction in the xg repo uses about as much memory as the size of the GFA, so I was able to use that instead.

But when I load the resulting xg into vg (here with vg chunk), I get this:

-> % vg chunk -x SRR11267570.9kb.k16.xg -C
warning [libhandlegraph]: Serialized handle graph does not appear to match deserialzation type.
warning [libhandlegraph]: It is either an old version or in the wrong format.
warning [libhandlegraph]: Attempting to load it anyway. Future releases will reject it!
warning:[XG] Loading an out-of-date XG format.For better performance over repeated loads, consider recreating this XG index.

It did seem to work, but it was using a huge amount of RAM again, maybe to do the conversion?

How can we fix some part of this?

ekg commented 4 years ago

Size estimates. GFA is 87MB. vg convert used 2GB and locked up my system. xg -g ... -o ... used about 150MB. vg chunk used about 2GB of memory before I killed it.
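
For anyone who wants to reproduce these size estimates, here is a minimal sketch of one way to measure a command's peak resident memory. It is not how the numbers above were obtained. The helper name `peak_rss_mb` is hypothetical, and the script assumes a Unix-like system where Python's `resource` module is available. The command to measure is passed through unchanged from the command line, so no vg or xg options are assumed here.

```python
import resource
import subprocess
import sys


def peak_rss_mb(cmd):
    """Run cmd (a list of arguments) and return (exit_code, peak_rss_in_mb).

    Hypothetical helper: the value is the high-water mark over all child
    processes this interpreter has waited for, so measure one command per
    invocation to get a per-command figure.
    """
    # The command's stdout is discarded; only its memory use is of interest.
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return proc.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__" and len(sys.argv) > 1:
    code, mb = peak_rss_mb(sys.argv[1:])
    print(f"exit={code} peak_rss={mb:.1f} MB", file=sys.stderr)
```

Saved under a name of your choice, it would be invoked with the command to measure as its arguments, for example the same `vg chunk` call shown earlier in this thread.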

jeizenga commented 4 years ago

I already mentioned this in the chat, but for the sake of transparency: the new vg convert options use XG's from_gfa method (https://github.com/vgteam/vg/blob/master/src/subcommand/convert_main.cpp#L124). It looks like our xg submodule is basically up to date, so I'm not sure what could be causing the discrepancy.