vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg giraffe mapping is slow when vg autoindex returns the warning `distance index uses oversized snarls, which may make mapping slow` #4321

Closed Wenhai-Zhang closed 1 day ago

Wenhai-Zhang commented 2 days ago

1. What were you trying to do? I built a pangenome graph from 200 complete E. coli genomes using pggb. I then ran vg autoindex to build the indexes, but it returned a warning:

warning: distance index uses oversized snarls, which may make mapping slow
    try increasing --snarl-limit when building the distance index

I am now using vg giraffe to align reads (33G FASTQ) to the pangenome graph.

2. What did you want to happen? Get the GAF file.

3. What actually happened? vg giraffe mapping is very slow. I would like to know what this warning means and whether there is a way to solve the problem.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?

6. What does running vg version say?

vg version v1.52.0 "Bozen"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by jeizenga@emerald

adamnovak commented 1 day ago

I think this is another manifestation of this problem: PGGB graphs with very large single variable sites ("snarls") in them can't be efficiently handled by the indexes Giraffe uses. Some new kind of distance measurement approach needs to be developed that can address graphs that are as tangled up on themselves as PGGB graphs can be.

I think the workaround, other than aggressive pruning with vg prune, is to change the settings on PGGB to stop the graph from collapsing as much, but I'm not sure how to do that exactly. Any setting you can find to require longer or more exact matches before merging two input bases would be useful.
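For reference, the warning text itself points at one more thing to try: rebuilding the distance index with a larger `--snarl-limit`. The following is only a minimal sketch, assuming the `vg index -j ... --snarl-limit N` form found in recent vg releases. The file names and the limit value are placeholders, not taken from this report, and a larger limit will likely cost more memory and indexing time without necessarily curing the slowdown on a graph this tangled.

```shell
#!/bin/sh
# Hypothetical file names; substitute the files vg autoindex actually wrote.
GRAPH="graph.gbz"     # assumed: the GBZ graph produced by vg autoindex
DIST="graph.dist"     # assumed: output name for the rebuilt distance index
SNARL_LIMIT=50000     # illustrative value only, not a recommendation

# Show the command that would be run. --snarl-limit is the option
# named in the warning message.
CMD="vg index -j $DIST --snarl-limit $SNARL_LIMIT $GRAPH"
echo "$CMD"

# Rebuild the distance index only if vg is installed and the graph exists.
if command -v vg >/dev/null 2>&1 && [ -f "$GRAPH" ]; then
    vg index -j "$DIST" --snarl-limit "$SNARL_LIMIT" "$GRAPH"
fi
```

If the rebuild succeeds, the new distance index can be given to vg giraffe with `-d` in place of the one that vg autoindex produced. Whether this speeds up mapping on a given graph has to be checked empirically.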

Wenhai-Zhang commented 1 day ago

I got it. Thanks