vgteam / vg

tools for working with genome variation graphs

Issue with vg map and vg augment for certain inputs. #4288

Closed whelixw closed 1 month ago

whelixw commented 1 month ago

1. What were you trying to do? I am trying to map one FASTA sequence to a graph made by vg construct and then augment the graph with the resulting alignment. The graph is circularized by vg circularize.

2. What did you want to happen? I expected the mapping and augmentation to succeed and produce a vg file containing two paths. These exact commands have worked with other inputs.

3. What actually happened? vg augment throws an error.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Crash report for vg v1.55.0 "Bernolda"
Stack trace (most recent call last):
#13   Object "/storage/ctools/vg_1.55.0/vg", at 0x617444, in _start
#12   Object "/storage/ctools/vg_1.55.0/vg", at 0x2077976, in __libc_start_main
#11   Object "/storage/ctools/vg_1.55.0/vg", at 0x20760d9, in __libc_start_call_main
#10   Object "/storage/ctools/vg_1.55.0/vg", at 0xdfcc1b, in vg::subcommand::Subcommand::operator()(int, char**) const
#9    Object "/storage/ctools/vg_1.55.0/vg", at 0xc670b0, in main_augment(int, char**)
#8    Object "/storage/ctools/vg_1.55.0/vg", at 0xed2822, in vg::augment(handlegraph::MutablePathMutableHandleGraph*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<vg::Translation, std::allocator<vg::Translation> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool, bool, double, double, vg::Packer*, unsigned long, double, bool)
#7    Object "/storage/ctools/vg_1.55.0/vg", at 0xed1dd8, in vg::augment_impl(handlegraph::MutablePathMutableHandleGraph*, std::function<void (std::function<void (vg::Alignment&)>, bool, bool)>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<vg::Translation, std::allocator<vg::Translation> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool, bool, double, double, vg::Packer*, unsigned long, double, bool)
#6    Object "/storage/ctools/vg_1.55.0/vg", at 0xec9e56, in std::_Function_handler<void (std::function<void (vg::Alignment&)>, bool, bool), vg::augment(handlegraph::MutablePathMutableHandleGraph*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<vg::Translation, std::allocator<vg::Translation> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool, bool, double, double, vg::Packer*, unsigned long, double, bool)::{lambda(std::function<void (vg::Alignment&)>, bool, bool)#2}>::_M_invoke(std::_Any_data const&, std::function<void (vg::Alignment&)>&&, bool&&, bool&&)
#5    Object "/storage/ctools/vg_1.55.0/vg", at 0xec9940, in vg::augment(handlegraph::MutablePathMutableHandleGraph*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<vg::Translation, std::allocator<vg::Translation> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool, bool, double, double, vg::Packer*, unsigned long, double, bool)::{lambda(std::function<void (vg::Alignment&)>, bool, bool)#2}::operator()(std::function<void (vg::Alignment&)>, bool, bool) const
#4    Object "/storage/ctools/vg_1.55.0/vg", at 0x143e307, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)
#3    Object "/storage/ctools/vg_1.55.0/vg", at 0xecdb6d, in std::_Function_handler<void (std::istream&), vg::augment(handlegraph::MutablePathMutableHandleGraph*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<vg::Translation, std::allocator<vg::Translation> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool, bool, double, double, vg::Packer*, unsigned long, double, bool)::{lambda(std::function<void (vg::Alignment&)>, bool, bool)#2}::operator()(std::function<void (vg::Alignment&)>, bool, bool) const::{lambda(std::istream&)#1}>::_M_invoke(std::_Any_data const&, std::istream&)
#2    Object "/storage/ctools/vg_1.55.0/vg", at 0xc63c73, in void vg::io::for_each<vg::Alignment>(std::istream&, std::function<void (long, vg::Alignment&)> const&)
#1    Object "/storage/ctools/vg_1.55.0/vg", at 0xed64fd, in vg::augment_impl(handlegraph::MutablePathMutableHandleGraph*, std::function<void (std::function<void (vg::Alignment&)>, bool, bool)>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<vg::Translation, std::allocator<vg::Translation> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool, bool, double, double, vg::Packer*, unsigned long, double, bool)::{lambda(vg::Alignment&)#3}::operator()(vg::Alignment&) const
#0    Object "/storage/ctools/vg_1.55.0/vg", at 0x5398b6, in vg::simplify_filtered_edits(handlegraph::HandleGraph*, vg::Alignment&, vg::Path&, std::map<std::tuple<long long, bool, unsigned long>, long long, std::less<std::tuple<long long, bool, unsigned long> >, std::allocator<std::pair<std::tuple<long long, bool, unsigned long> const, long long> > > const&, std::unordered_map<long long, unsigned long, std::hash<long long>, std::equal_to<long long>, std::allocator<std::pair<long long const, unsigned long> > > const&, double, double) [clone .cold]
ERROR: Signal 11 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!

5. What data and command can the vg dev team use to make the problem happen? wd.zip

The following commands should be run inside the extracted wd directory:

vg construct -r MW534270.1.fasta > MW534270.1_initial_graph.vg
vg circularize -p MW534270.1 MW534270.1_initial_graph.vg > MW534270.1_graph_circ.vg
vg stats -z MW534270.1_graph_circ.vg
vg index -x graph_circ.xg MW534270.1_graph_circ.vg
vg prune -k 48 MW534270.1_graph_circ.vg > MW534270.1_graph_circ_pruned.vg
vg index -g graph_circ.gcsa -Z 400 MW534270.1_graph_circ_pruned.vg
tail -n +2 MF925712.1.fasta | tr -d '\n' > mitogenome_str 
vg map -s $(< mitogenome_str) -V MF925712.1 -g graph_circ.gcsa -x graph_circ.xg > MF925712.1.gam
vg augment  MW534270.1_graph_circ.vg MF925712.1.gam -i -S > MF925712.1_graph_circ.vg

6. What does running vg version say?

vg version v1.55.0 "Bernolda"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by jeizenga@mustard

glennhickey commented 1 month ago

Your string is too long to map with vg map, at least with default parameters. It gives a warning:

vg map -s $(< mitogenome_str) -V MF925712.1 -g graph_circ.gcsa -x graph_circ.xg > MF925712.1.gam
warning: Thread 0 encountered sequence of length 16504, which is longer than the non-chunked limit of 256. Alignments may be discontiguous. To adjust this behavior, change the band width parameter. Suppressing further warnings. 

which should be an error here, because the GAM that comes out is invalid:

vg validate  MW534270.1_graph_circ.vg -a MF925712.1.gam
Invalid Alignment:
{"name": "MF925712.1", "path": {"mapping": [{"edit": [{"sequence": "GTTAATGTAGCTTAATAAT....
Node 0 not found in graph
alignment: invalid
graph: valid

If you listen to the warning and add -w 16505 to vg map, everything will run through (though I don't make any claims about the quality of the alignment -- vg map is a short read aligner).
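
A minimal sketch of that suggestion, assuming a single-record FASTA and the index file names from the commands above. The helper names fasta_seq_length and map_whole_sequence are hypothetical and not part of vg; the only vg option added is the -w band width mentioned above, set to one more than the sequence length (16505 for the 16504 bp input).

```shell
#!/usr/bin/env bash

# Length of the sequence in a single-record FASTA
# (header line and line breaks excluded).
fasta_seq_length() {
    tail -n +2 "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}

# Hypothetical wrapper: map the whole sequence with a band width one base
# longer than the sequence, so that vg map does not chunk it.
# Assumes graph_circ.gcsa and graph_circ.xg exist in the working directory.
map_whole_sequence() {
    local fasta=$1 name=$2 len
    len=$(fasta_seq_length "$fasta")
    vg map -s "$(tail -n +2 "$fasta" | tr -d '\r\n')" -V "$name" \
        -g graph_circ.gcsa -x graph_circ.xg -w "$((len + 1))"
}
```

Usage would be along the lines of: map_whole_sequence MF925712.1.fasta MF925712.1 > MF925712.1.gam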

whelixw commented 1 month ago

Changing the band width does indeed fix the issue. (Note: the band width does not need to be increased to avoid this behavior, just changed. It runs fine with "-w 128".) I wonder why longer sequences don't show the same behavior. I assume it is dependent on the alignment? As an example, this works with the attached NC_008143.1.fasta.txt:

vg construct -r MW534270.1.fasta > MW534270.1_initial_graph.vg
vg circularize -p MW534270.1 MW534270.1_initial_graph.vg > MW534270.1_graph_circ.vg
vg stats -z MW534270.1_graph_circ.vg
vg index -x graph_circ.xg MW534270.1_graph_circ.vg
vg prune -k 48 MW534270.1_graph_circ.vg > MW534270.1_graph_circ_pruned.vg
vg index -g graph_circ.gcsa -Z 400 MW534270.1_graph_circ_pruned.vg
tail -n +2 NC_008143.1.fasta.txt | tr -d '\n' > mitogenome_str 
vg map -s $(< mitogenome_str) -V NC_008143.1 -g graph_circ.gcsa -x graph_circ.xg > NC_008143.1.gam
vg augment  MW534270.1_graph_circ.vg NC_008143.1.gam -i -S > NC_008143.1_graph_circ.vg
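
Since the crash came from an invalid GAM reaching vg augment, a guard along these lines could catch it earlier. This is only a sketch: augment_if_valid is a hypothetical wrapper, and it assumes that vg validate exits with a non-zero status when the alignment is invalid.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: run vg augment only if the GAM validates against
# the graph. Assumes `vg validate` returns non-zero for an invalid alignment.
augment_if_valid() {
    local graph=$1 gam=$2 out=$3
    # Send the validation report to stderr so stdout stays clean.
    if ! vg validate "$graph" -a "$gam" >&2; then
        echo "refusing to augment: $gam is not valid against $graph" >&2
        return 1
    fi
    # Same augment options as in the commands above.
    vg augment "$graph" "$gam" -i -S > "$out"
}
```

Usage would be along the lines of: augment_if_valid MW534270.1_graph_circ.vg NC_008143.1.gam NC_008143.1_graph_circ.vg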

For the record, I am using vg map as I want my augmented graphs to be circular. I've tested giraffe for this, but it does not produce circular graphs after augmentation.