vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg giraffe crashed in paired-end short-read mapping #3448

Closed joyeuxnoel8 closed 2 years ago

joyeuxnoel8 commented 2 years ago

1. What were you trying to do? Map simulated error-free paired-end short reads to minigraph-cactus graph using Giraffe.

2. What did you want to happen? Finish the job without error.

3. What actually happened? I have six genomes, or twelve haplotypes. Read mapping for three of the haplotypes failed with the following message.

vg: src/dozeu_interface.cpp:126: vg::DozeuInterface::graph_pos_s vg::DozeuInterface::calculate_max_position(const vg::DozeuInterface::OrderedGraph&, const vg::DozeuInterface::graph_pos_s&, size_t, bool, const std::vector<const dz_forefront_s*>&): Assertion `forefronts.at(max_node_index)->mcap != nullptr' failed.
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_tEZgDW/stacktrace.txt

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Crash report for vg v1.35.0-84-g873b8556d "Ghizzano"
Stack trace (most recent call last) in thread 26840:
#19   Object "", at 0xffffffffffffffff, in 
#18   Object "/vg/bin/vg", at 0x1cf73b2, in __clone
#17   Object "/vg/bin/vg", at 0x1336bc8, in start_thread
#16   Object "/vg/bin/vg", at 0x1c3049d, in gomp_thread_start
#15   Object "/vg/bin/vg", at 0x10981ff, in unsigned long vg::io::paired_for_each_parallel_after_wait<vg::Alignment>(std::function<bool (vg::Alignment&, vg::Alignment&)>, std::function<void (vg::Alignment&, vg::Alignment&)>, std::function<bool ()>, unsigned long) [clone ._omp_fn.0]
#14   Object "/vg/bin/vg", at 0x1c32d77, in gomp_team_barrier_wait_end
#13   Object "/vg/bin/vg", at 0x1c2a43b, in gomp_barrier_handle_tasks
#12   Object "/vg/bin/vg", at 0x10983aa, in unsigned long vg::io::paired_for_each_parallel_after_wait<vg::Alignment>(std::function<bool (vg::Alignment&, vg::Alignment&)>, std::function<void (vg::Alignment&, vg::Alignment&)>, std::function<bool ()>, unsigned long) [clone ._omp_fn.1]
#11   Object "/vg/bin/vg", at 0xb9a73d, in std::_Function_handler<void (vg::Alignment&, vg::Alignment&), main_giraffe(int, char**)::{lambda()#1}::operator()() const::{lambda(vg::Alignment&, vg::Alignment&)#6}>::_M_invoke(std::_Any_data const&, vg::Alignment&, vg::Alignment&)
#10   Object "/vg/bin/vg", at 0xdf0a3a, in vg::MinimizerMapper::map_paired(vg::Alignment&, vg::Alignment&, std::vector<std::pair<vg::Alignment, vg::Alignment>, std::allocator<std::pair<vg::Alignment, vg::Alignment> > >&)
#9    Object "/vg/bin/vg", at 0xdee118, in vg::MinimizerMapper::map_paired(vg::Alignment&, vg::Alignment&)
#8    Object "/vg/bin/vg", at 0xdf30d6, in vg::MinimizerMapper::map_paired(vg::Alignment&, vg::Alignment&)::{lambda(unsigned long)#14}::operator()(unsigned long) const
#7    Object "/vg/bin/vg", at 0xdf1b4c, in vg::MinimizerMapper::attempt_rescue(vg::Alignment const&, vg::Alignment&, std::vector<vg::MinimizerMapper::Minimizer, std::allocator<vg::MinimizerMapper::Minimizer> > const&, bool)
#6    Object "/vg/bin/vg", at 0xe986a2, in vg::Aligner::align_xdrop(vg::Alignment&, handlegraph::HandleGraph const&, std::vector<handlegraph::handle_t, std::allocator<handlegraph::handle_t> > const&, std::vector<vg::MaximalExactMatch, std::allocator<vg::MaximalExactMatch> > const&, bool, unsigned short) const
#5    Object "/vg/bin/vg", at 0xf38bfb, in vg::DozeuInterface::align(vg::Alignment&, handlegraph::HandleGraph const&, std::vector<handlegraph::handle_t, std::allocator<handlegraph::handle_t> > const&, std::vector<vg::MaximalExactMatch, std::allocator<vg::MaximalExactMatch> > const&, bool, signed char, unsigned short)
#4    Object "/vg/bin/vg", at 0xf36b51, in vg::DozeuInterface::calculate_max_position(vg::DozeuInterface::OrderedGraph const&, vg::DozeuInterface::graph_pos_s const&, unsigned long, bool, std::vector<dz_forefront_s const*, std::allocator<dz_forefront_s const*> > const&)
#3    Object "/vg/bin/vg", at 0x1c538f5, in __assert_fail
#2    Object "/vg/bin/vg", at 0x580c23, in __assert_fail_base.cold
#1    Object "/vg/bin/vg", at 0x580d53, in abort
#0    Object "/vg/bin/vg", at 0x1339fab, in raise

5. What data and command can the vg dev team use to make the problem happen? Short reads can be found here: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=submissions/11832E7B-53C9-4628-A923-4432F2392BCB--VG-GIRAFFE-TROUBLESHOOTING/

Graphs/indices can be found here: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=pangenomes/scratch/2021_08_11_minigraph_cactus/filtered-graphs/

Command used:

p=GRCh38-f1g-90-mc-aug11-clip.d9.m.1000
vg giraffe -x $p.xg -g $p.gg -H $p.gbwt -m $p.min -d $p.dist -p -f HG00864.0.15x.rlen150.gap200.reads.fa.gz -i -t 16 >/dev/null
vg giraffe -x $p.xg -g $p.gg -H $p.gbwt -m $p.min -d $p.dist -p -f HG01114.0.15x.rlen150.gap200.reads.fa.gz -i -t 16 >/dev/null
vg giraffe -x $p.xg -g $p.gg -H $p.gbwt -m $p.min -d $p.dist -p -f HG02587.1.15x.rlen150.gap200.reads.fa.gz -i -t 16 >/dev/null

6. What does running vg version say?

vg version v1.35.0-84-g873b8556d "Ghizzano"
Compiled with g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 on Linux
Linked against libstd++ 20200808
Built by root@buildkitsandbox

adamnovak commented 2 years ago

@jeizenga You're the person on the vg team who knows the most about the insides of Dozeu. Do you want to work on this Dozeu assertion failure?

jeizenga commented 2 years ago

I've reproduced this locally. Hopefully I'll have a solution within a few days.

jeizenga commented 2 years ago

@joyeuxnoel8 The problem results from the fact that the sequences you are mapping have some lowercase nucleotides. We will figure out a way to be more robust to this input, but in the meantime, you could hot-fix this experiment by converting the simulated reads to uppercase.
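
For anyone hitting the same assertion, the uppercase workaround could be applied with something like the Python sketch below. It assumes the reads are gzipped FASTA, as in the commands above; the function names and the output file name are made up for illustration and are not part of vg. Header lines are left untouched and record order is preserved, so an interleaved paired-end file should stay interleaved.

```python
import gzip
from typing import Iterable, Iterator


def uppercase_fasta_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sequence lines uppercased.

    Header lines (those starting with '>') are passed through unchanged,
    so read names keep their original case.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            # str.upper() leaves the trailing newline intact.
            yield line.upper()


def uppercase_fasta_gz(src_path: str, dst_path: str) -> None:
    """Write an uppercased copy of a gzipped FASTA file (hypothetical helper)."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(uppercase_fasta_lines(src))
```

Usage would look like `uppercase_fasta_gz("HG00864.0.15x.rlen150.gap200.reads.fa.gz", "HG00864.upper.reads.fa.gz")`, where the second name is an arbitrary choice, after which the new file can be passed to `vg giraffe -f` in place of the original.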

joyeuxnoel8 commented 2 years ago

Thanks for spotting this @jeizenga. I did not expect to have lowercase nucleotides. I can close the issue now since it can be fixed on the user end. Or you can close it if you want to wait until the program is adapted to lowercase inputs. Let me know.