Open KopalliV opened 1 month ago
Hmm, that doesn't look like it ought to happen, nor like it is happening at a stage where PGGB graphs are known to be difficult. Do you have a .vg graph produced? Or do you get one in a manually specified temp directory if you run vg autoindex --tmp-dir ./wherever ...? What does vg stats --format whatever.vg say? Or xxd whatever.vg | head -n10?
I was just going to post an update that it ran further when a --tmp-dir is specified but still does not generate all indexes. It gives a GBZ file but gets killed while generating the distance index.
vg autoindex -w giraffe --prefix pggb_real --gfa pggb.gfa -T ../../temp/
[IndexRegistry]: Checking for haplotype lines in GFA.
[IndexRegistry]: Constructing VG graph from GFA input.
[IndexRegistry]: Constructing XG graph from VG graph.
[IndexRegistry]: Constructing a greedy path cover GBWT
[IndexRegistry]: Constructing GBZ using NamedNodeBackTranslation.
[IndexRegistry]: Constructing distance index for Giraffe.
Killed
That sounds like maybe it had run out of disk space the first time, and now it's running out of memory. You can try giving it more memory; 1 or 2 terabytes might be able to do it, if it can be done at all.
This is a known issue with PGGB graphs: they have very large individual "snarls" without a lot of internal structure, so the distance index needs to hold some quadratically large all-to-all distance matrices. We have a parameter (maybe only in the manual indexing pipeline?) that lets you control how big the biggest matrix we store is. But sometimes you can set that low enough to build the index and then get terrible runtime performance, because when the distances aren't in the index it has to do runtime traversals of the graph to try and find them.
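To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch. It assumes the all-to-all matrix holds one entry per ordered pair of children in a snarl and that each entry takes 8 bytes. Both are illustrative assumptions, not the actual layout of vg's distance index, and the function name is made up for this example.

```python
def dense_matrix_bytes(num_children: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed for a dense all-to-all matrix over num_children items.

    Assumed layout: one entry per ordered pair, bytes_per_entry bytes each.
    This is NOT how vg actually encodes its distance index.
    """
    return num_children * num_children * bytes_per_entry


if __name__ == "__main__":
    # Hypothetical snarl sizes, chosen only to show the scaling.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        gib = dense_matrix_bytes(n) / 2**30
        print(f"{n:>9,} children -> {gib:,.2f} GiB")
```

Under these assumptions, a snarl that is ten times larger needs a matrix a hundred times larger, so a hypothetical snarl with 100,000 children would already need 8 × 10^10 bytes (80 GB) for a single matrix.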
The other solution is to aggressively prune the PGGB graphs with vg prune until enough complex regions have been flattened out that it can be indexed. But then you're not really working with the graph you want to work with.
Unfortunately, getting good performance on PGGB graphs needs some kind of new distance indexing technology that we have not yet invented.
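Since the first failure looked like a full disk and the second like exhausted memory, it can help to check both before retrying. The following is a minimal sketch and not part of vg: the function name resource_report is hypothetical, the directory argument stands for whatever you pass to --tmp-dir, and the RAM query through os.sysconf assumes a Linux-like system.

```python
import os
import shutil


def resource_report(tmp_dir: str = ".") -> dict:
    """Report free disk space under tmp_dir and total physical RAM, in GiB.

    Assumes a Linux-like system where os.sysconf exposes
    SC_PAGE_SIZE and SC_PHYS_PAGES.
    """
    disk = shutil.disk_usage(tmp_dir)
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    return {
        "disk_free_gib": disk.free / 2**30,
        "ram_total_gib": page_size * phys_pages / 2**30,
    }


if __name__ == "__main__":
    # Point this at the directory you intend to use as the temp directory.
    report = resource_report(".")
    for name, value in report.items():
        print(f"{name}: {value:,.1f}")
```

This only reports what the machine has. It does not predict how much the distance index will need.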
I am trying to generate Giraffe indexes for a GFA file generated using PGGB, but I keep getting the following error.
Command used: vg autoindex -w giraffe --prefix pggb_real --gfa pggb.gfa
Error:
[vg autoindex] Executing command: vg autoindex -w giraffe --prefix pggb_real --gfa pggb.gfa
[IndexRegistry]: Checking for haplotype lines in GFA.
[IndexRegistry]: Constructing VG graph from GFA input.
[IndexRegistry]: Constructing XG graph from VG graph.
error[VPKG::load_one]: Correct input type not found while loading handlegraph::PathHandleGraph
Can somebody tell me if I am doing something wrong here?
Vg version: v1.57.0-21-gdb574a520 "Franchini"