vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg augment forwardize_breakpoints error #4305

Open genesok opened 3 weeks ago

genesok commented 3 weeks ago

1. What were you trying to do? I wanted to augment my pangenome graph with a GAM file of reads mapped to it with vg giraffe.

2. What did you want to happen? A successfully augmented graph.

3. What actually happened? vg augment hit a forwardize_breakpoints error and crashed with Signal 6.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

forwardize_breakpoints error: failure, position 404963-29 is not inside node 404963
vg: src/augment.cpp:514: std::unordered_map<long long int, std::set<std::tuple<long long int, bool, long unsigned int> > > vg::forwardize_breakpoints(const HandleGraph*, const std::unordered_map<long long int, std::set<std::tuple<long long int, bool, long unsigned int> > >&): Assertion `false' failed.
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.49.0 "Peschici"
Stack trace (most recent call last) in thread 1825:
#14   Object "", at 0xffffffffffffffff, in
#13   Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x213f33f, in __clone3
#12   Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x20989ca, in start_thread
#11   Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x203b32d, in gomp_thread_start
#10   Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x203dc77, in gomp_team_barrier_wait_end
#9    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x203557a, in gomp_barrier_handle_tasks
#8    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0xce5815, in void vg::io::for_each_parallel_impl<vg::Alignment>(std::istream&, std::function<void (vg::Alignment&, vg::Alignment&)> const&, std::function<void (vg::Alignment&)> const&, std::function<bool ()> const&, unsigned long) [clone ._omp_fn.1]
#7    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0xec68bb, in vg::augment_impl(handlegraph::MutablePathMutableHandleGraph*, std::function<void (std::function<void (vg::Alignment&)>, bool, bool)>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<vg::Translation, std::allocator<vg::Translation> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool, bool, double, double, vg::Packer*, unsigned long, double, bool)::{lambda(vg::Alignment&)#1}::operator()(vg::Alignment&) const
#6    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0xec6589, in vg::find_packed_breakpoints(vg::Path const&, vg::Packer&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, double, double)
#5    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0xebf8d6, in vg::forwardize_breakpoints(handlegraph::HandleGraph const*, std::unordered_map<long long, std::set<std::tuple<long long, bool, unsigned long>, std::less<std::tuple<long long, bool, unsigned long> >, std::allocator<std::tuple<long long, bool, unsigned long> > >, std::hash<long long>, std::equal_to<long long>, std::allocator<std::pair<long long const, std::set<std::tuple<long long, bool, unsigned long>, std::less<std::tuple<long long, bool, unsigned long> >, std::allocator<std::tuple<long long, bool, unsigned long> > > > > > const&)
#4    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x20671d5, in __assert_fail
#3    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x5e42f3, in __assert_fail_base.cold
#2    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x5e43cb, in abort
#1    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x206d7e5, in raise
#0    Object "/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg", at 0x209a1ec, in __pthread_kill
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
━━━━━━━━━━━━━━━━━━━━

5. What data and command can the vg dev team use to make the problem happen? minipigs.assembly_only.gfa is a pangenome graph built with minigraph; it is the graph I want to augment.

#indexing
/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg autoindex -p minipigs.assembly_only --workflow giraffe -g minipigs.assembly_only.gfa -T autoindex_tmp --threads 15 &> autoindex.log
#mapping
/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg map -x /mss_ds/project/pangenome/results/minigraph/minipigs.assembly_only_map.xg -g /mss_ds/project/pangenome/results/minigraph/minipigs.assembly_only_map.gcsa -f /mss_ds/project/pangenome/data/trim/diannan_smallear/SRR12009402_1.fastq.gz_filtered.gz -f /mss_ds/project/pangenome/data/trim/diannan_smallear/SRR12009402_2.fastq.gz_filtered.gz --threads 10  > /mss_ds/project/pangenome/results/minigraph_update/vg/map_gam/diannan_smallear.SRR12009402.gam 2> /mss_ds/project/pangenome/results/minigraph_update/vg/map_gam/diannan_smallear.SRR12009402.log
#convert gfa to vg
/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg convert -g -p -t 15 minipigs.assembly_only.gfa > minipugs.assembly_only.packed.vg 2> vg_convert.log
#augment
/mss_dc/project/minipig_assembly/programs/vg-v1.49.0/vg augment -s -t 10 ../../minigraph/minipigs.assembly_only.packed.vg map_gam/diannan_smallear.SRR12009402.gam > dianna.augmented.vg 2> dianna.augment.log

6. What does running vg version say?

vg version v1.49.0 "Peschici"
Compiled with g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0 on Linux
Linked against libstd++ 20220421
Built by anovak@octagon
adamnovak commented 3 weeks ago

I think your problem is that autoindex imports the GFA, you map to that imported graph, and then you import the GFA a second time with vg convert to make a different graph and try to interpret the mappings against that second graph. The node IDs are not the same between the two graphs: vg has several ways of importing GFAs, and they don't always assign the same IDs to the resulting nodes.
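The crash message reports the offending position as 404963-29. vg appears to print positions as a node ID, then + or - for the strand, then an offset, so this is presumably offset 29 on the reverse strand of node 404963. A minimal shell sketch for decoding such strings, assuming that format; decode_vg_pos is a hypothetical helper, not part of vg:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a vg-style position such as "404963-29" into
# node ID, strand and offset. Assumes '+' = forward and '-' = reverse strand.
decode_vg_pos() {
  local out
  out=$(printf '%s\n' "$1" | sed -nE \
    -e 's/^([0-9]+)-([0-9]+)$/node=\1 strand=reverse offset=\2/p' \
    -e 's/^([0-9]+)\+([0-9]+)$/node=\1 strand=forward offset=\2/p')
  # Nothing printed means the string did not match either pattern.
  if [ -z "$out" ]; then
    echo "unrecognized position: $1" >&2
    return 1
  fi
  printf '%s\n' "$out"
}
```

Read that way, vg augment is saying that offset 29 does not fall inside node 404963 of the graph it was given, which is consistent with the GAM's node 404963 and the augment graph's node 404963 not being the same node.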

You should be able to use the same .xg file you used for mapping as the input to vg augment. If vg augment won't accept it, you can use vg convert to convert the .xg to .vg (which won't change the node IDs) and then give that file to vg augment.

You can check if vg thinks a file of reads is interpretable against a particular graph with vg validate [graph] -a [gam].
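The two suggestions above (augment the graph the reads were mapped to, and check the reads against it with vg validate first) can be combined into a short shell sketch. It takes the .xg-to-.vg conversion route directly instead of first trying the .xg as input. The helper names run, run_to and augment_on_mapping_graph, the .from_xg.vg output name and the DRY_RUN switch are hypothetical, and the sketch assumes vg validate exits non-zero when it finds an alignment that does not fit the graph.

```shell
#!/usr/bin/env bash
# Sketch: augment the same graph the reads were mapped to, so GAM node IDs match.
# Set DRY_RUN=1 to print the commands instead of running them.

# Run "$@" directly, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run "$@" with stdout redirected to the file $1, or only print it when DRY_RUN=1.
run_to() {
  local dest="$1"
  shift
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$* > $dest"
  else
    "$@" > "$dest"
  fi
}

augment_on_mapping_graph() {
  local xg="$1" gam="$2" out="$3" threads="${4:-10}"
  local graph="${xg%.xg}.from_xg.vg"
  # 1. Convert the mapping graph to a mutable PackedGraph .vg; node IDs are kept.
  run_to "$graph" vg convert -p "$xg" || return 1
  # 2. Check that the reads can be interpreted against that graph
  #    (assumption: vg validate returns non-zero on a problem).
  run vg validate "$graph" -a "$gam" || return 1
  # 3. Augment with the same -s / -t options used in the report.
  run_to "$out" vg augment -s -t "$threads" "$graph" "$gam"
}
```

For the files in this report that would be something like augment_on_mapping_graph /mss_ds/project/pangenome/results/minigraph/minipigs.assembly_only_map.xg map_gam/diannan_smallear.SRR12009402.gam dianna.augmented.vg 10, run with DRY_RUN=1 first to see the exact commands.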

The problem should also go away if you make sure vg doesn't need to modify any node IDs when it imports your GFA graph: give all the S lines numerical names starting at 1, make sure that the edges don't specify any overlaps, and make sure that none of the nodes are too long. 32 bp is definitely a safe limit, and I think 1024 bp might be the real upper limit.
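Those conditions can be checked on a GFA before importing it. Below is a rough awk-in-shell sketch; check_gfa_for_vg is a hypothetical name, it only inspects S and L lines, and its length limit defaults to the 1024 bp figure above, which is itself only a guess (pass 32 for the limit described as definitely safe). A clean result does not guarantee that vg will leave every node ID untouched.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report GFA features that, per the advice above,
# may make vg rename or chop nodes on import. Prints "OK" and returns 0 if none.
check_gfa_for_vg() {
  local gfa="$1" max_len="${2:-1024}"
  awk -v max_len="$max_len" '
    BEGIN { FS = "\t"; problems = 0; min_id = -1 }
    $1 == "S" {
      # Segment names should be plain integers.
      if ($2 !~ /^[0-9]+$/) {
        printf "non-numeric segment name: %s\n", $2; problems++
      } else if (min_id < 0 || $2 + 0 < min_id) {
        min_id = $2 + 0
      }
      # Sequences longer than the limit may be chopped into several nodes.
      if ($3 != "*" && length($3) > max_len) {
        printf "segment %s is %d bp (> %d)\n", $2, length($3), max_len; problems++
      }
    }
    # Edges should carry no overlap ("0M" or "*").
    $1 == "L" && $6 != "*" && $6 !~ /^0M$/ {
      printf "edge %s%s -> %s%s has overlap %s\n", $2, $3, $4, $5, $6; problems++
    }
    END {
      if (min_id > 1) { printf "numeric segment IDs start at %d, not 1\n", min_id; problems++ }
      if (problems == 0) print "OK"
      exit (problems > 0)
    }' "$gfa"
}
```

Usage: check_gfa_for_vg minipigs.assembly_only.gfa 32 lists each segment or edge that would need changing under the stricter 32 bp limit.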