vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.1k stars 194 forks source link

long reads augment: position not inside node #3717

Open jiadong324 opened 2 years ago

jiadong324 commented 2 years ago

Dear vg team,

I read the recent HPRC publication 'A draft human pangenome reference'. Inspired by this paper, I tried to call SVs only belongs to sample A but absent from sample B. The general steps are listed as follows:

  1. Construct pangenome graph with grch38 and assemblies of sample B created by hifiasm. I used the following command to build the graph:
## Create the graph
minigraph -cxggs -t16 referece.fa assembly.p_ctg.fa > ref.sample-B.gfa

## Convert to vg and index
vg view -F ref.sample-B.gfa -v > ref.sample-B.vg
vg index ref.sample-B.vg -x ref.sample-B.xg
  1. Mapping long read data of sample A to the graph ref.sample-B.gfa with GraphAligner.
GraphAligner -g ref.sample-B.gfa -f sample-A.reads.fa -a ref.sample-B.sample-A.gam -x vg
  1. Identifying potential novel SVs of sample A via graph augmentation.
vg augment ref.sample-B.vg ref.sample-B.sample-A.gam -A ref.sample-B.sample-A.aug.gam > ref.sample-B.sample-A.aug.vg

An error was raised at step 3, which shows forwardize_breakpoints error: failure, position 22790-208928 is not inside node 22790. The vg version is v1.41.0 "Salmour".

The stack trace file shows:

Crash report for vg v1.41.0
Stack trace (most recent call last) in thread 430195:
#15   Object "", at 0xffffffffffffffff, in 
#14   Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0x1f0a562, in __clone
#13   Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0x1459fc8, in start_thread
#12   Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0x1e3ef1d, in gomp_thread_start
#11   Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0xc7cf8a, in void vg::io::for_each_parallel_impl<vg::Alignment>(std::istream&, std::function<void (vg::Alignment&, vg::Alignment&)> const&, std::function<void (vg::Alignment&)> const&, std::function<bool ()> const&, unsigned long) [clone ._omp_fn.0]
#10   Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0x1e417f7, in gomp_team_barrier_wait_end
#9    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0x1e38ebb, in gomp_barrier_handle_tasks
#8    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0xc7c871, in void vg::io::for_each_parallel_impl<vg::Alignment>(std::istream&, std::function<void (vg::Alignment&, vg::Alignment&)> const&, std::function<void (vg::Alignment&)> const&, std::function<bool ()> const&, unsigned long) [clone ._omp_fn.1]
#7    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0xb0e8ff, in std::_Function_handler<void (vg::Alignment&, vg::Alignment&), vg::io::for_each_parallel<vg::Alignment>(std::istream&, std::function<void (vg::Alignment&)> const&, unsigned long)::{lambda(vg::Alignment&, vg::Alignment&)#1}>::_M_invoke(std::_Any_data const&, vg::Alignment&, vg::Alignment&)
#6    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0xd94825, in vg::augment_impl(handlegraph::MutablePathMutableHandleGraph*, std::function<void (std::function<void (vg::Alignment&)>, bool, bool)>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<vg::Translation, std::allocator<vg::Translation> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool, bool, double, double, vg::Packer*, unsigned long, double, bool)::{lambda(vg::Alignment&)#1}::operator()(vg::Alignment&) const
#5    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0xd94401, in vg::find_packed_breakpoints(vg::Path const&, vg::Packer&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, double, double)
#4    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0xd8dbb0, in vg::forwardize_breakpoints(handlegraph::HandleGraph const*, std::unordered_map<long long, std::set<std::tuple<long long, bool, unsigned long>, std::less<std::tuple<long long, bool, unsigned long> >, std::allocator<std::tuple<long long, bool, unsigned long> > >, std::hash<long long>, std::equal_to<long long>, std::allocator<std::pair<long long const, std::set<std::tuple<long long, bool, unsigned long>, std::less<std::tuple<long long, bool, unsigned long> >, std::allocator<std::tuple<long long, bool, unsigned long> > > > > > const&)
#3    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0x1e62475, in __assert_fail
#2    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0x5afcc7, in __assert_fail_base.cold
#1    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0x5afdf7, in abort
#0    Object "/home/jdlin/anaconda3/envs/vg-env/bin/vg", at 0x145d3ab, in raise

Thanks!

glennhickey commented 2 years ago

Can you try running

vg validate ref.sample-B.vg -a ref.sample-B.sample-A.gam

to check if the GraphAligner output is valid?

JD12138 commented 1 year ago

Can you try running

vg validate ref.sample-B.vg -a ref.sample-B.sample-A.gam

to check if the GraphAligner output is valid?

Hello ,I have the same question as @jiadong324 . And I also use GraphAligner to map assemble contigs to my.vg. Then I got a .gam file. Then I use vg augment to embed my alignment to the vg graph. And I got the same error. Then I use vg validate. There are 89 invalid alignments. But I use the same vg file in GraphAligner and vg augment. So what's the reason that it doesn't match ?