vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other

autoindex --workflow giraffe throws std::out_of_range error #3883

Closed: thomas-buechler-ulm closed this issue 1 year ago

thomas-buechler-ulm commented 1 year ago

Hello,

I tried to index a human pangenome for giraffe with 'autoindex' and the program crashed.

The command:

~/vg_akt/vg autoindex --workflow giraffe -r  hs37d5.fa -v chr1.vcf.gz -v chr2.vcf.gz -v chr3.vcf.gz -v chr4.vcf.gz -v chr5.vcf.gz -v chr6.vcf.gz -v chr7.vcf.gz -v chr8.vcf.gz -v chr10.vcf.gz -v chr11.vcf.gz -v chr12.vcf.gz -v chr13.vcf.gz -v chr14.vcf.gz -v chr15.vcf.gz -v chr16.vcf.gz -v chr17.vcf.gz -v chr18.vcf.gz -v chr19.vcf.gz -v chr20.vcf.gz -v chr21.vcf.gz -v chr22.vcf.gz -v chrX.vcf.gz -v chrY.vcf.gz -p human

The input I used:
FASTA: ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
VCFs: the 24 VCF files from ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
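Typing the long `-v` list by hand is error-prone; for instance, the command above contains no `chr9.vcf.gz`. A minimal bash sketch (bash 4+ for `mapfile`) that builds the arguments instead, assuming one VCF per chromosome named `chr<N>.vcf.gz` for 1-22, X and Y. The function `vcf_args` and the array `VCF_ARGS` are hypothetical names, not part of vg.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "-v" and "chr<N>.vcf.gz" on separate lines for
# chromosomes 1-22, X and Y (assumes that file-naming scheme).
vcf_args() {
    local chrom
    for chrom in {1..22} X Y; do
        printf '%s\n' -v "chr${chrom}.vcf.gz"
    done
}

# Read the lines into an array so every flag and file name stays one word.
mapfile -t VCF_ARGS < <(vcf_args)

# Usage (not executed here):
# ~/vg_akt/vg autoindex --workflow giraffe -r hs37d5.fa "${VCF_ARGS[@]}" -p human
```

Using an array rather than a plain string keeps each argument intact even if a file name were ever to contain spaces.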

The program output:

[IndexRegistry]: Checking for phasing in VCF(s).
[IndexRegistry]: Chunking inputs for parallelism.
[IndexRegistry]: Chunking FASTA(s).
[IndexRegistry]: Chunking VCF(s).
[IndexRegistry]: Constructing VG graph from FASTA and VCF input.
warning:[vg::Constructor] Skipping duplicate variant with hash 7c41b008d29662e4684e6a50d48166d129389c3f at 17:1144632
warning:[vg::Constructor] Skipping duplicate variant with hash 9628041c1009eb3fdcf82cc912cb65ae7c3d6420 at 17:1144632
warning:[vg::Constructor] Skipping duplicate variant with hash bd8c7ef1f6f167b432bd75f0c94cf223b725cb9f at 14:21649957
warning:[vg::Constructor] Skipping duplicate variant with hash 1c5597b7bccdcdc38f30c0f643eb17d3768c0926 at 14:21649957
warning:[vg::Constructor] Skipping duplicate variant with hash 0db860eda6b6b6b5e1147774ad0c8d709945fce6 at X:5375295
warning:[vg::Constructor] Skipping duplicate variant with hash ee5d8472f5ef5c1ecf62bf4885610e9fc8c95787 at X:5375295
warning:[vg::Constructor] Skipping duplicate variant with hash 9fd9c47b17ebd27d4ddf065afd38efb02d6e9607 at 12:8400000
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_mAoMqe/stacktrace.txt
Please include the stack trace file in your bug report!

The stack trace:

Crash report for vg v1.46.0 "Altamura"
Stack trace (most recent call last) in thread 177052:
#18   Object "", at 0xffffffffffffffff, in 
#17   Object "/home/buechler/vg_akt/vg", at 0x20c4632, in __clone
#16   Object "/home/buechler/vg_akt/vg", at 0x15b0f68, in start_thread
#15   Object "/home/buechler/vg_akt/vg", at 0x1fc92f3, in execute_native_thread_routine
#14   Object "/home/buechler/vg_akt/vg", at 0x13de0ea, in std::thread::_State_impl<std::thread::_Invoker<std::tuple<vg::JobSchedule::execute(long)::{lambda()#1}> > >::_M_run()
#13   Object "/home/buechler/vg_akt/vg", at 0x1307950, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, bool, bool)#13}::operator()(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, bool, bool) const::{lambda(long)#1}::operator()(long) const
#12   Object "/home/buechler/vg_akt/vg", at 0xec2c51, in vg::Constructor::construct_graph(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, handlegraph::MutablePathMutableHandleGraph*)
#11   Object "/home/buechler/vg_akt/vg", at 0x14f1e2f, in vg::io::load_proto_to_graph(handlegraph::MutablePathMutableHandleGraph*, std::function<void (std::function<void (vg::Graph&)> const&)> const&)
#10   Object "/home/buechler/vg_akt/vg", at 0xed0ea2, in vg::Constructor::construct_graph(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::function<void (vg::Graph&)> const&)
#9    Object "/home/buechler/vg_akt/vg", at 0xed0138, in vg::Constructor::construct_graph(std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::vector<vcflib::VariantCallFile*, std::allocator<vcflib::VariantCallFile*> > const&, std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::function<void (vg::Graph&)> const&)
#8    Object "/home/buechler/vg_akt/vg", at 0xeced9a, in vg::Constructor::construct_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, FastaReference&, vg::VcfBuffer&, std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::function<void (vg::Graph&)> const&)
#7    Object "/home/buechler/vg_akt/vg", at 0xecb607, in vg::Constructor::construct_chunk(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<vcflib::Variant, std::allocator<vcflib::Variant> >, unsigned long) const
#6    Object "/home/buechler/vg_akt/vg", at 0x5e3a64, in std::__throw_out_of_range(char const*)
#5    Object "/home/buechler/vg_akt/vg", at 0x1f47b58, in __cxa_throw
#4    Object "/home/buechler/vg_akt/vg", at 0x1f479f6, in std::terminate()
#3    Object "/home/buechler/vg_akt/vg", at 0x1f4798b, in __cxxabiv1::__terminate(void (*)())
#2    Object "/home/buechler/vg_akt/vg", at 0x5e1d06, in __gnu_cxx::__verbose_terminate_handler() [clone .cold]
#1    Object "/home/buechler/vg_akt/vg", at 0x5e455f, in abort
#0    Object "/home/buechler/vg_akt/vg", at 0x15b47eb, in raise

Version:

vg version v1.46.0 "Altamura"
Compiled with g++ (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0 on Linux
Linked against libstd++ 20210408
Built by xian@octo

I also tried the following version, and the same error occurred:

vg version v1.37.0-11-g2f6837d33 "Monchio"
Compiled with g++ (Ubuntu 11.1.0-1ubuntu1~20.04) 11.1.0 on Linux
Linked against libstd++ 20210427
Built by buechler@luna

I then tried 'autoindex' with the GFA graph constructed by vg as input, and this worked. Does this second workflow generate the same index, or are there any drawbacks I am not aware of?

~/vg_akt/vg construct -r hs37d5.fa -v chr1.vcf.gz -v chr2.vcf.gz -v chr3.vcf.gz -v chr4.vcf.gz -v chr5.vcf.gz -v chr6.vcf.gz -v chr7.vcf.gz -v chr8.vcf.gz -v chr10.vcf.gz -v chr11.vcf.gz -v chr12.vcf.gz -v chr13.vcf.gz -v chr14.vcf.gz -v chr15.vcf.gz -v chr16.vcf.gz -v chr17.vcf.gz -v chr18.vcf.gz -v chr19.vcf.gz -v chr20.vcf.gz -v chr21.vcf.gz -v chr22.vcf.gz -v chrX.vcf.gz -v chrY.vcf.gz > human.vg
~/vg_akt/vg view human.vg > human.gfa
~/vg_akt/vg index -x human.xg human.vg
~/vg_akt/vg autoindex --workflow giraffe -g human.gfa -p human

Thanks for your help!

jeizenga commented 1 year ago

I've reproduced this error, and I'm now working on getting this fixed. Thanks for reporting the bug!

jeizenga commented 1 year ago

This should run without crashing now if you rebuild off of the current master branch.

thomas-buechler-ulm commented 1 year ago

Hi,

thank you for your effort in fixing this bug.

I tried the new version, but I still get an error.

I think that before the fix the error already occurred at chromosome 1; now it occurs at chromosome 2. (I could not test all chromosomes individually.)

Below I provide the version, the commands to reproduce the error, and the report.

Version

vg: variation graph tool, version v1.47.0-4-g915d3bc2b "Ostuni"

Commands to reproduce the error

wget ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz -O hs37d5.fa.gz
gzip -dc hs37d5.fa.gz >  hs37d5.fa
samtools faidx hs37d5.fa 2 > chr2.fa
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr2.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz -O chr2.vcf.gz
tabix -p vcf chr2.vcf.gz
~/vg/bin/vg autoindex --workflow giraffe -r chr2.fa   -v chr2.vcf.gz -p chr2
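Since not every chromosome could be tested individually, the reproduction steps above can be wrapped in a per-chromosome function and run in a loop. This is a hedged bash sketch, not part of vg: `check_chrom`, the `VG`/`SAMTOOLS` override variables and the `chr<N>.autoindex.log` file names are hypothetical, and it assumes `hs37d5.fa` and each `chr<N>.vcf.gz` (with its `.tbi` index) are already in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical per-chromosome reproduction helper.
# Override VG / SAMTOOLS to point at other binaries if needed.
VG=${VG:-"$HOME/vg/bin/vg"}
SAMTOOLS=${SAMTOOLS:-samtools}

check_chrom() {
    local chrom=$1
    # hs37d5 names chromosomes without a "chr" prefix (1, 2, ..., X, Y),
    # as in the samtools faidx command above.
    "$SAMTOOLS" faidx hs37d5.fa "$chrom" > "chr${chrom}.fa" || return 2
    # vg writes progress messages and crash reports to stderr; keep them per chromosome.
    if "$VG" autoindex --workflow giraffe -r "chr${chrom}.fa" \
            -v "chr${chrom}.vcf.gz" -p "chr${chrom}" 2> "chr${chrom}.autoindex.log"; then
        echo "chr${chrom}: ok"
    else
        echo "chr${chrom}: FAILED (see chr${chrom}.autoindex.log)"
        return 1
    fi
}

# Usage (not executed here):
# for c in {1..22} X Y; do check_chrom "$c"; done
```

The function returns 2 if extracting the FASTA fails and 1 if `vg autoindex` fails, so the two kinds of failure can be told apart in a loop.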

Report

~/vg/bin/vg autoindex --workflow giraffe -r chr2.fa   -v chr2.vcf.gz -p chr2
[vg autoindex] Executing command: /home/buechler/vg/bin/vg autoindex --workflow giraffe -r chr2.fa -v chr2.vcf.gz -p chr2
[IndexRegistry]: Checking for phasing in VCF(s).
[IndexRegistry]: Chunking inputs for parallelism.
index file chr2.fa.fai not found, generating...
[IndexRegistry]: Constructing VG graph from FASTA and VCF input.
warning:[vg::Constructor] vcflib could not canonicalize some SVs to base-level sequence; skipping variants like: 2  23163   .   C   <CN2>   100 PASS    AC=3;AF=0.000599042;AFR_AF=0.0008;AMR_AF=0;AN=5008;CS=DUP_gs;DP=19012;EAS_AF=0;END=99614;EUR_AF=0;NS=2504;SAS_AF=0.002;SVTYPE=DUP;VT=SV;EX_TARGET
warning:[vg::Constructor] Multiallelic SVs cannot be canonicalized by vcflib; skipping variants like: 2 117322  .   C   <CN0>,<CN2> 100 PASS    AC=2,3;AF=0.000399361,0.000599042;AFR_AF=0.0008,0.0008;AMR_AF=0,0;AN=5008;CS=DUP_gs;DP=18028;EAS_AF=0,0;END=178829;EUR_AF=0,0.001;NS=2504;SAS_AF=0.001,0.001;SVTYPE=CNV;VT=SV
vg: src/io/load_proto_to_graph.cpp:254: vg::io::load_proto_to_graph(vg::MutablePathMutableHandleGraph*, const std::function<void(const std::function<void(vg::Graph&)>&)>&)::<lambda(vg::Graph&)>: Assertion `get_handle(m.position().node_id(), m.position().is_reverse(), visited)' failed.
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.47.0-4-g915d3bc2b "Ostuni"
Stack trace (most recent call last) in thread 848395:
#15   Object "", at 0xffffffffffffffff, in 
#14   Object "/usr/lib/x86_64-linux-gnu/libc-2.31.so", at 0x7fc92068f132, in __clone
      Source "../sysdeps/unix/sysv/linux/x86_64/clone.S", line 95, in __clone [0x7fc92068f132]
#13   Object "/usr/lib/x86_64-linux-gnu/libpthread-2.31.so", at 0x7fc920d6b608, in start_thread
      Source "/build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c", line 477, in start_thread [0x7fc920d6b608]
#12   Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.29", at 0x7fc92089e6b3, in 
#11   Object "/home/buechler/vg/bin/vg", at 0x55dd207414d7, in std::thread::_State_impl<std::thread::_Invoker<std::tuple<vg::JobSchedule::execute(long)::{lambda()#1}> > >::_M_run()
    | Source "/usr/include/c++/11/bits/std_thread.h", line 211, in operator()
    |   210:    void
    | > 211:    _M_run() { _M_func(); }
    |   212:       };
    | Source "/usr/include/c++/11/bits/std_thread.h", line 260, in _M_invoke<0>
    |   258:      using _Indices
    |   259:        = typename _Build_index_tuple<tuple_size<_Tuple>::value>::__type;
    | > 260:      return _M_invoke(_Indices());
    |   261:    }
    |   262:       };
    | Source "/usr/include/c++/11/bits/std_thread.h", line 253, in __invoke<vg::JobSchedule::execute(int64_t)::<lambda()> >
    |   251:      typename __result<_Tuple>::type
    |   252:      _M_invoke(_Index_tuple<_Ind...>)
    | > 253:      { return std::__invoke(std::get<_Ind>(std::move(_M_t))...); }
    |   254: 
    |   255:    typename __result<_Tuple>::type
    | Source "/usr/include/c++/11/bits/invoke.h", line 96, in __invoke_impl<void, vg::JobSchedule::execute(int64_t)::<lambda()> >
    |    94:       using __type = typename __result::type;
    |    95:       using __tag = typename __result::__invoke_type;
    | >  96:       return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
    |    97:                    std::forward<_Args>(__args)...);
    |    98:     }
    | Source "/usr/include/c++/11/bits/invoke.h", line 61, in operator()
    |    59:     constexpr _Res
    |    60:     __invoke_impl(__invoke_other, _Fn&& __f, _Args&&... __args)
    | >  61:     { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
    |    62: 
    |    63:   template<typename _Res, typename _MemFun, typename _Tp, typename... _Args>
    | Source "src/job_schedule.cpp", line 77, in operator()
      Source "/usr/include/c++/11/bits/std_function.h", line 560, in _M_run [0x55dd207414d7]
        557:       {
        558:    if (_M_empty())
        559:      __throw_bad_function_call();
      > 560:    return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        561:       }
        562: 
        563: #if __cpp_rtti
#10   Object "/home/buechler/vg/bin/vg", at 0x55dd206b1b78, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, bool, bool)#14}::operator()(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, bool, bool) const::{lambda(long)#1}::operator()(long) const
      Source "src/index_registry.cpp", line 2000, in operator() [0x55dd206b1b78]
#9    Object "/home/buechler/vg/bin/vg", at 0x55dd2029fe0e, in vg::Constructor::construct_graph(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, handlegraph::MutablePathMutableHandleGraph*)
      Source "src/constructor.cpp", line 2565, in construct_graph [0x55dd2029fe0e]
#8    Object "/home/buechler/vg/bin/vg", at 0x55dd208bd80b, in vg::io::load_proto_to_graph(handlegraph::MutablePathMutableHandleGraph*, std::function<void (std::function<void (vg::Graph&)> const&)> const&)
    | Source "src/io/load_proto_to_graph.cpp", line 71, in operator()
      Source "/usr/include/c++/11/bits/std_function.h", line 560, in load_proto_to_graph [0x55dd208bd80b]
        557:       {
        558:    if (_M_empty())
        559:      __throw_bad_function_call();
      > 560:    return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
        561:       }
        562: 
        563: #if __cpp_rtti
#7    Object "/home/buechler/vg/bin/vg", at 0x55dd202aef36, in vg::Constructor::construct_graph(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::function<void (vg::Graph&)> const&)
      Source "src/constructor.cpp", line 2545, in construct_graph [0x55dd202aef36]
#6    Object "/home/buechler/vg/bin/vg", at 0x55dd202ae104, in vg::Constructor::construct_graph(std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::vector<vcflib::VariantCallFile*, std::allocator<vcflib::VariantCallFile*> > const&, std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::function<void (vg::Graph&)> const&)
      Source "src/constructor.cpp", line 2444, in construct_graph [0x55dd202ae104]
#5    Object "/home/buechler/vg/bin/vg", at 0x55dd202accc3, in vg::Constructor::construct_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, FastaReference&, vg::VcfBuffer&, std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::function<void (vg::Graph&)> const&)
      Source "src/constructor.cpp", line 2237, in construct_graph [0x55dd202accc3]
#4    Object "/home/buechler/vg/bin/vg", at 0x55dd208c29bf, in vg::io::load_proto_to_graph(handlegraph::MutablePathMutableHandleGraph*, std::function<void (std::function<void (vg::Graph&)> const&)> const&)::{lambda(vg::Graph&)#1}::operator()(vg::Graph&) const
      Source "src/io/load_proto_to_graph.cpp", line 254, in operator() [0x55dd208c29bf]
#3    Object "/usr/lib/x86_64-linux-gnu/libc-2.31.so", at 0x7fc9205a3fd5, in __assert_fail
      Source "/build/glibc-SzIz7B/glibc-2.31/assert/assert.c", line 101, in __assert_fail [0x7fc9205a3fd5]
#2    Object "/usr/lib/x86_64-linux-gnu/libc-2.31.so", at 0x7fc920592728, in __assert_fail_base.cold
      Source "/build/glibc-SzIz7B/glibc-2.31/assert/assert.c", line 92, in __assert_fail_base [0x7fc920592728]
#1    Object "/usr/lib/x86_64-linux-gnu/libc-2.31.so", at 0x7fc920592858, in abort
      Source "/build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c", line 79, in abort [0x7fc920592858]
#0    Object "/usr/lib/x86_64-linux-gnu/libc-2.31.so", at 0x7fc9205b300b, in raise
      Source "../sysdeps/unix/sysv/linux/raise.c", line 51, in raise [0x7fc9205b300b]
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
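The two warnings printed before the assertion refer to VCF records with symbolic ALT alleles such as `<CN2>` and `<CN0>,<CN2>`, which vg reports skipping. To see how many such records an input VCF contains, a small diagnostic sketch like the following could be used. `count_symbolic_alts` is a hypothetical helper, and the count alone does not show that these records are what trigger the assertion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count VCF data lines whose ALT column (field 5) contains
# a symbolic allele such as <CN2> or <DEL>. bgzip output is gzip-compatible,
# so gzip -dc can read the compressed VCF directly.
count_symbolic_alts() {
    gzip -dc "$1" | awk -F'\t' '!/^#/ && $5 ~ /</ { n++ } END { print n + 0 }'
}

# Usage (not executed here):
# count_symbolic_alts chr2.vcf.gz
```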