vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.09k stars 193 forks source link

vg surject and pack fail on GraphAligner produced .gam #4355

Open peterdfields opened 1 month ago

peterdfields commented 1 month ago

1. What were you trying to do?

I would like to use VG to work with long read data aligned to a pangenome graph generated using cactus-minigraph. I've been following the steps described here, including using vg convert and GraphAligner to create the .gfa index and align the ONT data, respectively. Now I am trying to apply VG to surject a gam -> bam as well as do SV variant calling using VG on the gam.

2. What did you want to happen?

To create a .bam from the .gam and run the vg pack step (including the use of -Q0) in preparation for SV calling.

3. What actually happened?

In both cases I see an error or segfault.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

In the case of vg surject I see:

error:[vg surject] Alignment 10ca7a43-076d-4e06-8811-0f83ee6a49fc cannot be interpreted against this graph: Length of node 92455955 (1) exceeded by Mapping with offset 897 and from-length 32
Make sure that you are using the same graph that the reads were mapped to!
error:[vg surject] Alignment d996bf34-0453-4ba4-a6c5-ccfb3556bb0e cannot be interpreted against this graph: Length of node 103423848 (1) exceeded by Mapping with offset 113 and from-length 65
Make sure that you are using the same graph that the reads were mapped to!

In the case of vg pack I see

vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
vg: src/packer.cpp:883: size_t vg::Packer::average_node_quality(size_t) const: Assertion `total_node_quality(i) == 0' failed.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━Crash report for vg ━━━v1.58.0 "Cartari"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━Crash report for vg ━━━v1.58.0 "Cartari"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Crash report for vg ━Crash report for vg ━━v1.58.0 "Cartari"
━━━━━━━━━━━━━━━━━━━━━
Crash report for vg ━━v1.58.0 "Cartari"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━v1.58.0 "Cartari"
━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Crash report for vg 
Crash report for vg ━v1.58.0 "Cartari"━━━━━━━━━━━━━━━━━━━━━━━━━━
Crash report for vg ━━━━━━Crash report for vg ━━━━━━━v1.58.0 "Cartari"
━━
Crash report for vg ━━━━
Crash report for vg ━━━━━━━━━━━━━━━━━
━
━v1.58.0 "Cartari"
━━━━━━━━━
Crash report for vg v1.58.0 "Cartari"
━━━━━━━━
━━━━━v1.58.0 "Cartari"
v1.58.0 "Cartari"━━━━━━━Crash report for vg ━━━━━━━━━━━━━━━
━━━━━━Crash report for vg v1.58.0 "Cartari"
━v1.58.0 "Cartari"
━━━━━━━━━━━━v1.58.0 "Cartari"━━━━━━━━━━━━━━━━━━━━━━━━━━━━

━━━━━━
━━
━━━━━━━━━━━━━━━━
Crash report for vg ━━v1.58.0 "Cartari"
Crash report for vg ━━━
Crash report for vg ━v1.58.0 "Cartari"━━━Crash report for vg ━━━━━━━━━━
Crash report for vg ━━
Crash report for vg v1.58.0 "Cartari"
━━━━━━v1.58.0 "Cartari"
v1.58.0 "Cartari"━━━━━━━
Crash report for vg 
Crash report for vg v1.58.0 "Cartari"v1.58.0 "Cartari"
━
━━

━
v1.58.0 "Cartari"
━━━━
━━
━━━━━━━

Crash report for vg Crash report for vg v1.58.0 "Cartari"Crash report for vg 
Crash report for vg ━━━━Crash report for vg 
v1.58.0 "Cartari"
━━
Crash report for vg v1.58.0 "Cartari"v1.58.0 "Cartari"v1.58.0 "Cartari"
Crash report for vg Crash report for vg v1.58.0 "Cartari"
━
v1.58.0 "Cartari"

v1.58.0 "Cartari"
Crash report for vg 
Crash report for vg v1.58.0 "Cartari"
v1.58.0 "Cartari"
Stack trace (most recent call last)Stack trace (most recent call last) in thread 3128571:
 in thread 3128571:
Stack trace (most recent call last) in thread 3128571:
Stack trace (most recent call last) in thread 3128571:
Stack trace (most recent call last) in thread 3128571:
Stack trace (most recent call last) in thread 3128571:
Stack trace (most recent call last) in thread 3128571:
Stack trace (most recent call last) in thread 3128571:
Stack trace (most recent call last)Stack trace (most recent call last) in thread 3128571:
Stack trace (most recent call last) in thread 3128571:
 in thread 3128571:
Stack trace (most recent call last)Stack trace (most recent call last) in thread 3128571Stack trace (most recent call last)Stack trace (most recent call last) in thread 3128571Stack trace (most recent call last) in thread 3128571Stack trace (most recent call last)Stack trace (most recent call last) in thread 3128571Stack trace (most recent call last)Stack trace (most recent call last) in thread 3128571:
:
 in thread Stack trace (most recent call last) in thread Stack trace (most recent call last) in thread 3128571:
:
Stack trace (most recent call last) in thread 3128571:
 in thread 3128571:
Stack trace (most recent call last)Stack trace (most recent call last) in thread Stack trace (most recent call last) in thread 3128571Stack trace (most recent call last) in thread 3128571:
Stack trace (most recent call last) in thread Stack trace (most recent call last) in thread 3128571:
 in thread 3128571:
Stack trace (most recent call last) in thread 3128571:
:
:
3128571:
3128571:
Stack trace (most recent call last) in thread 3128571:
 in thread 3128571:
Stack trace (most recent call last) in thread 3128571:
━━━━:
3128571:
 in thread 31285713128571:
:
━━━━━━━━━━━━━━━━
━━━━Crash report for vg ━━v1.58.0 "Cartari"━━━━
━━━━━━━━━━━━━━━━━━━━━━Stack trace (most recent call last)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ in thread 3128565━━━####━━#━━:
━━━##10#━#10#━━##1010   Object "━━#━━━━━━━━━━━━━━━10   Object "", at 0xffffffffffffffff, in 
10   Object "━━━10#10━━━10━━━━#16━━━━━━━━━   Object "━10   Object "", at 0xffffffffffffffff, in 
━━━━━━━", at 0xffffffffffffffff━   Object "10━
1010   Object "━━   Object "━   Object "━━━━━━━, in ━
━10   Object "", at 0xffffffffffffffff", at ━0xffffffffffffffff━━━━━", at 0xffffffffffffffff━━━━━", at    Object "   Object "", at ━   Object "#10━━   Object "", at ━━━Crash report for vg , in v1.58.0 "Cartari"━━━0xffffffffffffffff", at ━━, in ━━", at 0xffffffffffffffff━━━   Object "━━━━━━━━━0xffffffffffffffff, in    Object "━━0xffffffffffffffff━━━━━", at ━━, in ━━━━━━━━━, in ━━━━
", at 0xffffffffffffffff, in 
", at ━━━0xffffffffffffffff, in 
", at ━", at 0xffffffffffffffff, in 
━━━━, in 
━━━━━━

5. What data and command can the vg dev team use to make the problem happen?

Depending upon what the possible error might be I can figure what data can be shared related to the graph and reads.

6. What does running vg version say?

v1.58.0 "Cartari"
glennhickey commented 1 month ago

This seems like an incompatibility between your gam and graph. You can verify using

vg validate <graph> -a <gam> 

Possible fixes are to use the .gfa file that you used for GraphAligner for surject/pack. Or to add --vg-algorithm to the vg convert command used to make that gfa in the first place.

peterdfields commented 1 month ago

Running vg validate on the graph and gam file does identify many Invalid Alignment: instances (I can send specific instances if that's helpful). On the tutorial linked to above the suggestion for preparing the graph for GraphAligner is:

bash -c "vg convert ./hprc10/hprc10.gbz -f > ./hprc10/hprc10.gbz.gfa"

Should the correct command then to create a vg surject/pack .gam be:

bash -c "vg convert --vg-algorithm ./hprc10/hprc10.gbz -f > ./hprc10/hprc10.gbz.gfa"

When I tried using the .gfa produced with the first command above and used for the GraphAligner alignment step it did seem to be running but after waiting a few hours it hadn't produced any output. This is a low-coverage test dataset so I wouldn't have expected such a long run time but perhaps I killed the job too soon?

glennhickey commented 1 month ago

Yeah, I think that's a bug in the tutorial. I've fixed it here (which will be merged at some point into the master branch) to add the --vg-algorithm. The auto node name conversion, though well intentioned, is extremely confusing.

Anyway, running GraphAligner on the graph converted with that flag should produce a GAM that passes the vg validate -a check with the input gbz (please let me know if it doesn't!). On the other hand, using / not using --vg-algorithm should have very little bearing on the GraphAligner performance, mapping rate etc.

peterdfields commented 1 month ago

Following you're suggestions I am able to successfully run the vg validation, vg surject, and vg pack! I do still see some warnings from GraphAligner that I need to investigate further on individual reads, such as the following:

src/GraphAlignerBitvectorCommon.h:560: Assertion 'traceEnd >= traceStart' failed. Read: fed8cb24-e303-4460-a55e-f6b1c1f653f9. Seed: 0+,0,0,0
src/GraphAlignerBitvectorCommon.h:560: Assertion 'traceEnd >= traceStart' failed. Read: 7944e6cc-ccdd-4062-8604-20675227310a. Seed: 0+,0,0,0
src/GraphAlignerBitvectorCommon.h:560: Assertion 'traceEnd >= traceStart' failed. Read: 400c9730-d168-41c8-8664-f9a060591c85. Seed: 0+,0,0,0
src/GraphAligner.h:685: Assertion 'trace.trace[i].nodeSwitch || trace.trace[i].DPposition.node != trace.trace[i+1].DPposition.node || trace.trace[i].DPposition.nodeOffset != trace.trace[i+1].DPposition.nodeOffset' failed. Read: 2a074333-1c4f-49a7-8471-145923a166b2. Seed: 0+,0,0,0