vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Generating an XG index from a GFA using `vg convert` leads to strange printouts #3515

Closed: subwaystation closed this issue 2 years ago

subwaystation commented 2 years ago

1. What were you trying to do?

Generate an XG index from a GFA file.

2. What did you want to happen?

Finish it without errors or strange printouts.

3. What actually happened?

There were some strange printouts, looks like a binary blob.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

No stacktrace was printed.

5. What data and command can the vg dev team use to make the problem happen?

/usr/bin/time --verbose vg convert -g -x -t 16 chr6.pan.fa.a2fb268.4030258.6a1ecc2.smooth.gfa > chr6.pan.fa.a2fb268.4030258.6a1ecc2.smooth.gfa.xg

Output printed to the terminal (binary-looking garbage):

\u0R _\u0R!_0 \u0R"_0 \u0R#_0 \u0R$_0 \u0R&_0\u0R'_0\u0R(_0\u0R)_0\u0R*_0\u0R+_0\u0R,_0\u0R-_0\u0R._0\u0R/_0\u0R0_0\u0R1_0\u0R2_0\u0R3_0

Graph: https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/scratch/2021_11_16_pggb_wgg.88/chroms/chr6.pan.fa.a2fb268.4030258.6a1ecc2.smooth.gfa.gz
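One way to keep the binary XG stream away from the terminal entirely is to let a script open the output file and hand it to `vg convert` as stdout, instead of relying on shell redirection. This is a minimal Python sketch, assuming `vg` is on `PATH`; it reuses only the flags from the command above (`-g` for GFA input, `-x` for XG output, `-t` for threads), and the function names are hypothetical, not part of vg.

```python
import shutil
import subprocess
from typing import List


def build_convert_cmd(gfa_path: str, threads: int, vg_bin: str = "vg") -> List[str]:
    """Build the same vg convert invocation as in the report (GFA in, XG out)."""
    return [vg_bin, "convert", "-g", "-x", "-t", str(threads), gfa_path]


def convert_gfa_to_xg(gfa_path: str, xg_path: str, threads: int = 1,
                      vg_bin: str = "vg") -> None:
    """Run vg convert, streaming its binary stdout straight into xg_path.

    Hypothetical helper: stderr is captured separately, so diagnostics
    never interleave with the binary XG data.
    """
    if shutil.which(vg_bin) is None:
        raise FileNotFoundError(f"{vg_bin!r} not found on PATH")
    cmd = build_convert_cmd(gfa_path, threads, vg_bin)
    with open(xg_path, "wb") as out:
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE)
    if result.returncode != 0:
        # On POSIX a negative return code means vg was killed by a signal,
        # e.g. -6 for SIGABRT, as in the assertion failure reported below.
        raise RuntimeError(
            f"vg convert exited with {result.returncode}: "
            f"{result.stderr.decode(errors='replace')}"
        )
```

For example, `convert_gfa_to_xg("chr6.pan.fa.a2fb268.4030258.6a1ecc2.smooth.gfa", "chr6.pan.fa.a2fb268.4030258.6a1ecc2.smooth.gfa.xg", threads=16)`. If garbage still appears on the terminal with this setup, it cannot be the XG data itself.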

6. What does running vg version say?

vg version v1.37.0 "Monchio"
Compiled with g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 on Linux
Linked against libstd++ 20200808
Built by anovak@octagon
subwaystation commented 2 years ago

Output of vg validate:

vg validate chr6.pan.fa.a2fb268.4030258.6a1ecc2.smooth.gfa.xg
graph: valid
subwaystation commented 2 years ago

The above was on a VM.

Running it locally on my laptop with 1 thread I get:

vg convert -g -x -t 1 chr6.pan.fa.a2fb268.4030258.d9f1245.smooth.gfa > chr6.pan.fa.a2fb268.4030258.d9f1245.smooth.gfa.xg
vg: /home/anovak/workspace/vg/include/sdsl/int_vector.hpp:1436: sdsl::int_vector<<anonymous> >::const_reference sdsl::int_vector<<anonymous> >::operator[](const size_type&) const [with unsigned char t_width = 0; sdsl::int_vector<<anonymous> >::const_reference = long unsigned int; sdsl::int_vector<<anonymous> >::size_type = long unsigned int]: Assertion `idx < this->size()' failed.
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_IkAK6i/stacktrace.txt
Please include the stack trace file in your bug report!

The stack trace file is empty, so I didn't upload it.

Thanks for any help!

Best, Simon

jltsiren commented 2 years ago

This could be a hardware issue, or it could be caused by file corruption. I was able to convert the GFA to XG with both release 1.37.0 and the latest master.

subwaystation commented 2 years ago

Hmm. On another VM it works. I doubt it is file corruption; I downloaded it several times with the same result.

Do you think it's more likely a RAM issue or a hard disk issue?
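Downloading several times only rules out corruption if the copies are actually compared. A minimal Python sketch, assuming the graph is kept as the downloaded `.gfa.gz`: decompressing the whole stream makes Python's `gzip` module verify the CRC-32 and length stored in the gzip trailer, and a SHA-256 digest of the compressed file can be compared between the VM where conversion fails and the one where it works. The function names are hypothetical.

```python
import gzip
import hashlib
import zlib


def gzip_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Decompress the whole file; gzip checks the trailer CRC-32 and size at EOF."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header or CRC mismatch) and a
        # missing file; EOFError a truncated download; zlib.error bad deflate data.
        return False


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large graphs need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

If `gzip_is_intact` returns `True` on every machine and the digests match, the input file is very unlikely to be the culprit.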

jltsiren commented 2 years ago

It could be a RAM issue. Disk issues are unlikely, as they should cause a checksum failure. The VG binary itself could be corrupted. The garbage output could be caused by something related to how the VM handles pipes. The crash during the local run could be caused by running out of space in the temporary directory.
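Several of these causes can be checked from a script. The following Python sketch is a hypothetical diagnostic, not part of vg. It reports free space on the filesystem holding the temporary directory (Python's `tempfile.gettempdir()` honors `TMPDIR`; that vg writes its temporary files to the same place is an assumption), a SHA-256 digest of the `vg` binary to compare against a copy on a machine where conversion works, and whether stdout is a terminal, since binary data written to a terminal shows up as garbage. RAM faults need a dedicated memory tester and are not covered here.

```python
import hashlib
import shutil
import sys
import tempfile
from typing import Optional


def temp_dir_free_bytes() -> int:
    """Free bytes on the filesystem holding the temp dir (TMPDIR, else /tmp, ...)."""
    return shutil.disk_usage(tempfile.gettempdir()).free


def binary_digest(name: str = "vg") -> Optional[str]:
    """SHA-256 of the executable that `name` resolves to on PATH, or None."""
    path = shutil.which(name)
    if path is None:
        return None
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def stdout_is_terminal() -> bool:
    """True if stdout is a TTY, where binary output would print as garbage."""
    return sys.stdout.isatty()


def report() -> dict:
    """Collect the checks above into one dict to compare across machines."""
    return {
        "tmpdir": tempfile.gettempdir(),
        "tmpdir_free_gib": temp_dir_free_bytes() / 2**30,
        "vg_sha256": binary_digest("vg"),
        "stdout_is_tty": stdout_is_terminal(),
    }
```

Running `report()` on the failing VM, the working VM and the laptop, then comparing the results, may narrow down which of these explanations applies.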

subwaystation commented 2 years ago

I didn't run into this again so far. Thanks for all the tips @jltsiren.