vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg genotype error in subset of chunks Assertion `vg.paths.has_path(path_name)' failed. #1399

Open wheaton5 opened 6 years ago

wheaton5 commented 6 years ago

My vg genotype command looks like this:

subprocess.check_call(['vg','genotype','-E','-v', args.vg, args.gam_index],stdout=tmp)

where args.vg and args.gam_index are the .vg file and the GAM index for a single chunk, produced by vg chunk and vg index as follows:

subprocess.check_call(['vg','chunk','-t',str(args.__threads),'-n','250','-o',str(args.overlap),'-x',args.xg,'-a',gam_index,'-A','-g','-E',args.gam[0:-4]+".bed"])
outs.gams = sorted(glob.glob(directory+"/*.gam"))
for gam in outs.gams:
    subprocess.check_call(['vg','index','-t',str(args.__threads),'-d',gam+'.index','-N',gam])

The genotyping step fails in 5 of the 250 chunks; the other chunks produce valid VCFs. I am testing this on a heavily downsampled data set with <1x average coverage, so that might be a factor, but I did confirm that some of the VCFs produced are not empty. The failing chunks give this error:

vg: path_index.cpp:254: vg::PathIndex::PathIndex(vg::VG&, const string&, bool): Assertion `vg.paths.has_path(path_name)' failed.
Got signal 6
Manual stack trace:
Stack trace from backtrace() for signal 6:
vg(vg::stacktrace_with_backtrace_and_exit(int)+0x27) [0x9e0927]
=================
vg(vg::emit_stacktrace(int, siginfo*, void*)+0x81) [0x9e0c71]
=================
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x2b446f26fcb0]
=================
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x2b44716be035]
=================
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x2b44716c179b]
=================
/lib/x86_64-linux-gnu/libc.so.6(+0x2ee1e) [0x2b44716b6e1e]
=================
/lib/x86_64-linux-gnu/libc.so.6(+0x2eec2) [0x2b44716b6ec2]
=================
vg(vg::PathIndex::PathIndex(vg::VG&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)+0xa7b) [0xae6edb]
=================
vg(vg::Genotyper::run(vg::AugmentedGraph&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, bool, bool, int, int)+0xe69) [0xa21549]
=================
vg(main_genotype(int, char**)+0x2366) [0x929c86]
=================
vg(vg::subcommand::Subcommand::operator()(int, char**) const+0x28) [0x98a578]
=================
vg(main+0xa7) [0x5ad6e7]
=================
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x2b44716a97ed]
=================
vg() [0x659901]
=================
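Because check_call raises on the first failing chunk, it can be easier to find all affected chunks by running every chunk and recording the ones that fail. The following is a minimal sketch of that idea, assuming Python 3; the function names, the (vg_path, gam_index, vcf_path) tuple layout, and the injectable run parameter are hypothetical, and only the vg genotype arguments are taken from the command above.

```python
import subprocess

# Substring of the assertion message reported in this issue.
ASSERTION_MARKER = "has_path(path_name)"


def genotype_chunks(chunks, run=subprocess.run):
    """Run vg genotype on every chunk and collect the failures.

    chunks: iterable of (vg_path, gam_index, vcf_path) tuples (hypothetical layout).
    run: callable with the subprocess.run interface, injectable for testing.
    Returns a list of (vg_path, returncode, stderr) for chunks that failed.
    """
    failures = []
    for vg_path, gam_index, vcf_path in chunks:
        with open(vcf_path, "w") as vcf:
            # Same arguments as the check_call in the post; stderr is
            # captured so the assertion text can be inspected afterwards.
            result = run(
                ["vg", "genotype", "-E", "-v", vg_path, gam_index],
                stdout=vcf,
                stderr=subprocess.PIPE,
                universal_newlines=True,
            )
        if result.returncode != 0:
            failures.append((vg_path, result.returncode, result.stderr))
    return failures


def missing_path_failures(failures):
    """Return the vg files whose failure output contains the has_path assertion."""
    return [vg_path for vg_path, _, stderr in failures
            if ASSERTION_MARKER in (stderr or "")]
```

Calling missing_path_failures(genotype_chunks(chunks)) would then list only the chunks that hit this particular assertion, which makes it easier to pick a small example chunk to share.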

wheaton5 commented 6 years ago

@glennhickey Any thoughts on this? I can provide the data for an example chunk that fails which should be pretty small.

wheaton5 commented 6 years ago

Update: chunking by region instead of by nodes fixed this problem:

subprocess.check_call(['vg','chunk','-p',key,'-t',str(args.__threads),'-A','-c','5','-s','1000000','-o',str(args.overlap),'-x',args.xg,'-a',gam_index,'-g','-E',args.gam[0:-4]+".bed"])

where key is the chromosome name.
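Since the region-based command has to be issued once per chromosome, it may help to build the argument list in one place. This is only a sketch: the function and parameter names are hypothetical, and the flags are copied verbatim from the command above rather than checked against the vg chunk documentation.

```python
def region_chunk_command(key, xg, gam_index, bed_out, threads, overlap,
                         chunk_size=1000000, context=5):
    """Build the vg chunk argument list used in the region-based workaround.

    key is the chromosome (path) name. The defaults for chunk_size and
    context are the values '1000000' and '5' from the original command.
    """
    return [
        "vg", "chunk",
        "-p", key,
        "-t", str(threads),
        "-A",
        "-c", str(context),
        "-s", str(chunk_size),
        "-o", str(overlap),
        "-x", xg,
        "-a", gam_index,
        "-g",
        "-E", bed_out,
    ]
```

Each returned list can be passed to subprocess.check_call as before, once per chromosome name.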

Still, if chunking by nodes does not work with vg genotype, the reason should be identified and documented.