vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Error in 'vg autoindex' on a GFA file derived from PGGB #3712

Closed: fangbohao closed this issue 1 year ago

fangbohao commented 2 years ago

1. What were you trying to do? I am trying to index a GFA graph file (a chromosome) derived from PGGB.

2. What did you want to happen? The index to be built successfully.

3. What actually happened? vg autoindex crashed; the crash report is below.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Crash report for vg v1.41.0 "Salmour"
Stack trace (most recent call last):
#24   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5df43d, in _start
#23   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1e520cf, in __libc_start_main
#22   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5b08ce, in main
#21   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xd3347b, in vg::subcommand::Subcommand::operator()(int, char**) const
#20   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xc1237c, in main_autoindex(int, char**)
#19   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xf41d48, in vg::IndexRegistry::make_indexes(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
#18   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xf2dde8, in vg::IndexRegistry::execute_recipe(std::pair<std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, unsigned long> const&, vg::IndexingPlan const*, vg::AliasGraph&)
#17   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xf2d7fd, in std::_Function_handler<std::vector<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > (std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&), vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#15}>::_M_invoke(std::_Any_data const&, std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*&&, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
#16   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xf2d346, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#11}::operator()(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const [clone .isra.0]
#15   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1318ce0, in vg::algorithms::gfa_to_path_handle_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, handlegraph::MutablePathMutableHandleGraph*, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#14   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1316855, in vg::algorithms::gfa_to_path_handle_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, handlegraph::MutablePathMutableHandleGraph*, vg::algorithms::GFAIDMapInfo*, long)
#13   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x126cb90, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)
#12   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x131f4ac, in std::_Function_handler<void (std::istream&), vg::algorithms::gfa_to_path_handle_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, handlegraph::MutablePathMutableHandleGraph*, vg::algorithms::GFAIDMapInfo*, long)::{lambda(std::istream&)#1}>::_M_invoke(std::_Any_data const&, std::istream&)
#11   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x131dfc9, in vg::algorithms::GFAParser::parse(std::istream&)
#10   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x131c367, in vg::algorithms::GFAParser::parse(std::istream&)::{lambda()#3}::operator()() const
#9    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1318f12, in vg::algorithms::add_path_listeners(vg::algorithms::GFAParser&, handlegraph::MutablePathMutableHandleGraph*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::pair<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::pair<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#2}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::pair<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::pair<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const [clone .isra.0]
#8    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x17ee688, in handlegraph::PathMetadata::parse_path_name(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, handlegraph::PathSense&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&, unsigned long&, std::pair<unsigned long, unsigned long>&)
#7    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x17eea10, in long long __gnu_cxx::__stoa<long long, long long, char, int>(long long (*)(char const*, char**, int), char const*, char const*, unsigned long*, int)
#6    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5af280, in std::__throw_invalid_argument(char const*)
#5    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1d8e148, in __cxa_throw
#4    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1d8dfe6, in std::terminate()
#3    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1d8df7b, in __cxxabiv1::__terminate(void (*)())
#2    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5ad45a, in __gnu_cxx::__verbose_terminate_handler() [clone .cold]
#1    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5afdf7, in abort
#0    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x145d3ab, in raise
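The bottom frames show how the crash happens: parse_path_name (#8) converts a field of a path name with std::stoll (#7, the __gnu_cxx::__stoa helper), which throws std::invalid_argument (#6); nothing catches it, so std::terminate() and abort() run (#4 to #0). The following is a minimal sketch, not vg code: a hypothetical helper, stoll_accepts, that isolates the same conversion so you can see which inputs throw. It assumes the failing field is non-numeric, such as the hap2 haplotype fields that turn up later in this thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of vg): returns true only if std::stoll can
// convert the entire field. std::stoll is what reaches __gnu_cxx::__stoa in
// frame #7; for a field with no leading digits, such as "hap2", it throws
// std::invalid_argument, which vg v1.41.0 did not catch.
bool stoll_accepts(const std::string& field) {
    try {
        std::size_t consumed = 0;
        std::stoll(field, &consumed);
        // stoll stops at the first non-digit, so "2x" converts with
        // consumed == 1; require that the whole field was used.
        return consumed == field.size();
    } catch (const std::invalid_argument&) {
        return false;  // no digits at all: the case in this crash report
    } catch (const std::out_of_range&) {
        return false;  // digits, but too large for long long
    }
}
```

Note that "2x" does not throw at all; std::stoll silently ignores trailing characters, which is why the sketch also checks how much of the field was consumed.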

5. What data and command can the vg dev team use to make the problem happen?

6. What does running vg version say?

vg v1.41.0
fangbohao commented 2 years ago

Some big chromosomes work fine with 'vg autoindex', but small chromosomes fail with the crash above.

jeizenga commented 2 years ago

Can you provide the command line call that you ran into this error on?

fangbohao commented 2 years ago

Thanks for your reply. Here you go:

vg autoindex --workflow giraffe \
    -g $gfa_chr37 -t 23 \
    --target-mem 90G

fangbohao commented 2 years ago

By the way, here is the GFA file I used; it is a small chromosome, 52 MB.

Please let me know if the GFA file is malformed or was not produced properly.

Thank you! Bohao Fang

VGP#prim#SUPER_37.pan.fa.gz.3051141.04f1c29.ecb... https://drive.google.com/file/d/1nLpGPHSlZs4h1hmfuJHcI3hOyIFIDvXY/view?usp=drive_web

jeizenga commented 2 years ago

@adamnovak This looks to me like it's running into a problem in the named-node stuff you implemented. Could you take a look?

ASLeonard commented 1 year ago

I came across this issue when using panSN-spec named input like

ARS_UCD12#hap0#6

but there is a stoll call on the haplotype field, so it should just be numeric (i.e. "ARS_UCD12#0#6"). I'm not sure this is the same issue, but I got a very similar crash log.
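To make the "haplotype must be numeric" rule concrete, here is a minimal sketch that validates the field before converting it, so a name like ARS_UCD12#hap0#6 is reported as "not PanSN" instead of throwing. It assumes the three-part SAMPLE#HAPLOTYPE#CONTIG form; the PanSNName struct and parse_pansn function are hypothetical and are not vg's handlegraph::PathMetadata::parse_path_name.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical result type for a SAMPLE#HAPLOTYPE#CONTIG name.
struct PanSNName {
    std::string sample;
    long long haplotype = 0;
    std::string contig;
};

// Minimal sketch (C++17): split on the first two '#' and require a purely
// numeric haplotype field. Returns std::nullopt instead of throwing.
std::optional<PanSNName> parse_pansn(const std::string& name) {
    std::size_t first = name.find('#');
    if (first == std::string::npos) return std::nullopt;
    std::size_t second = name.find('#', first + 1);
    if (second == std::string::npos) return std::nullopt;

    std::string hap = name.substr(first + 1, second - first - 1);
    if (hap.empty()) return std::nullopt;
    for (char c : hap) {
        if (c < '0' || c > '9') return std::nullopt;  // digits only: "hap0" fails here
    }

    PanSNName parsed;
    parsed.sample = name.substr(0, first);
    parsed.contig = name.substr(second + 1);  // everything after the second '#'
    try {
        parsed.haplotype = std::stoll(hap);  // safe: hap is all digits
    } catch (const std::out_of_range&) {
        return std::nullopt;  // more digits than fit in long long
    }
    return parsed;
}
```

With this check, parse_pansn("ARS_UCD12#0#6") yields sample ARS_UCD12, haplotype 0, contig 6, while parse_pansn("ARS_UCD12#hap0#6") simply returns no value.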

I couldn't find clear documentation on the PathSense API, but from vg paths -Mv it looks like it expects more groupings than panSN-spec provides? Is it possible to denote, e.g., a primary assembly path vs. a haplotype-resolved path, or will everything need the sample ploidy to work?

Best, Alex

ASLeonard commented 1 year ago

Found the path metadata model (https://github.com/vgteam/vg/wiki/Path-Metadata-Model); I knew I had stumbled on it before, so I will experiment with it a bit further.

adamnovak commented 1 year ago

Unfortunately I can't get @fangbohao's file; it looks like it's a Google Drive upload shared with a specific list of people that I'm not on.

But it does seem like a path like ARS_UCD12#hap0#6 might be able to cause a crash in __gnu_cxx::__stoa (which is the string-to-number converter) inside path name parsing.

By my reading of the panSN spec when I wrote the path name parsing, that isn't valid panSN, because the haplotype piece hap0 is a string; I thought only numbers were allowed there. Maybe that isn't really true?

Whether that's true or not, we should produce a more useful error when we can't parse the path name.
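One way to get a more useful error, sketched under assumptions: wrap the numeric conversion so that a failure throws an exception naming the path and the field, rather than letting a bare std::invalid_argument reach std::terminate(). The helper parse_numeric_field and its message format are hypothetical and not necessarily what #4010 does; the vg paths output later in this thread suggests the fix instead falls back to treating such names as generic paths.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert one numeric field of a path name; on failure,
// throw a std::runtime_error that names the offending path and field.
long long parse_numeric_field(const std::string& path_name,
                              const std::string& field_label,
                              const std::string& field) {
    bool ok = false;
    long long value = 0;
    try {
        std::size_t consumed = 0;
        value = std::stoll(field, &consumed);
        ok = (consumed == field.size());  // reject trailing junk like "2x"
    } catch (const std::invalid_argument&) {
        ok = false;  // no leading digits, e.g. "hap0"
    } catch (const std::out_of_range&) {
        ok = false;  // too large for long long
    }
    if (!ok) {
        throw std::runtime_error("cannot parse path name \"" + path_name +
                                 "\": " + field_label + " \"" + field +
                                 "\" is not an integer");
    }
    return value;
}
```

A top-level caller could then catch std::runtime_error, print e.what(), and exit with a nonzero status, so the user sees which path name is the problem instead of a stack trace.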

jeizenga commented 1 year ago

FWIW, the spec does indeed say here that haplotype ID is a number.

adamnovak commented 1 year ago

OK, @fangbohao shared the file with me, and I tested my fix, and I now have vg interpreting it like this:

[anovak@swords vg]% vg paths --metadata -x ~/Downloads/VGP\#prim\#SUPER_37.pan.fa.gz.3051141.04f1c29.ecbf8cf.smooth.final.gfa
#NAME   SENSE   SAMPLE  HAPLOTYPE   LOCUS   PHASE_BLOCK SUBRANGE
MA_2#hap2#h2tg000495l   GENERIC NO_SAMPLE_NAME  NO_HAPLOTYPE    MA_2#hap2#h2tg000495l   NO_PHASE_BLOCK  NO_SUBRANGE
WA_2#hap1#h1tg000618l   GENERIC NO_SAMPLE_NAME  NO_HAPLOTYPE    WA_2#hap1#h1tg000618l   NO_PHASE_BLOCK  NO_SUBRANGE
NM_1#hap2#h2tg000401l   GENERIC NO_SAMPLE_NAME  NO_HAPLOTYPE    NM_1#hap2#h2tg000401l   NO_PHASE_BLOCK  NO_SUBRANGE
AZ_2#hap2#h2tg000020l   GENERIC NO_SAMPLE_NAME  NO_HAPLOTYPE    AZ_2#hap2#h2tg000020l   NO_PHASE_BLOCK  NO_SUBRANGE
CA_1#hap1#h1tg001701l   GENERIC NO_SAMPLE_NAME  NO_HAPLOTYPE    CA_1#hap1#h1tg001701l   NO_PHASE_BLOCK  NO_SUBRANGE
CA_1#hap2#h2tg004194l   GENERIC NO_SAMPLE_NAME  NO_HAPLOTYPE    CA_1#hap2#h2tg004194l   NO_PHASE_BLOCK  NO_SUBRANGE
CA_2#hap2#h2tg002977l   GENERIC NO_SAMPLE_NAME  NO_HAPLOTYPE    CA_2#hap2#h2tg002977l   NO_PHASE_BLOCK  NO_SUBRANGE
...

I don't think it's parsing the names as the file's writer intended, but it is parsing them into something we can represent. For the file to really work properly (and not produce a possibly unmanageable number of named paths), hap1 and hap2 need to be changed to just 1 and 2. But with #4010 we should at least no longer crash like this.
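Following that suggestion, a pre-processing pass could rewrite the haplotype field before indexing. This is a minimal sketch assuming the names follow SAMPLE#hapN#CONTIG and are stored in GFA 1.0 P lines (P, name, segments, overlaps, tab-separated); W lines are not handled. The helpers numeric_haplotype_name and rewrite_gfa_line are hypothetical, not part of vg or PGGB.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turn "SAMPLE#hapN#CONTIG" into "SAMPLE#N#CONTIG".
// Names without a "hap<digits>" second field are returned unchanged.
std::string numeric_haplotype_name(const std::string& name) {
    std::size_t first = name.find('#');
    if (first == std::string::npos) return name;
    std::size_t second = name.find('#', first + 1);
    if (second == std::string::npos) return name;

    const std::string field = name.substr(first + 1, second - first - 1);
    const std::string prefix = "hap";
    if (field.size() <= prefix.size() ||
        field.compare(0, prefix.size(), prefix) != 0) {
        return name;  // not of the form hap<something>
    }
    for (std::size_t i = prefix.size(); i < field.size(); ++i) {
        if (field[i] < '0' || field[i] > '9') return name;  // hap<non-digits>
    }
    // Keep "SAMPLE#", drop the "hap" prefix, keep "#CONTIG..." unchanged.
    return name.substr(0, first + 1) + field.substr(prefix.size()) +
           name.substr(second);
}

// Apply the rename to a GFA path line ("P<TAB>name<TAB>..."); other record
// types (S, L, W, ...) are passed through untouched.
std::string rewrite_gfa_line(const std::string& line) {
    if (line.size() < 2 || line[0] != 'P' || line[1] != '\t') return line;
    std::size_t name_end = line.find('\t', 2);
    if (name_end == std::string::npos) return line;
    return "P\t" + numeric_haplotype_name(line.substr(2, name_end - 2)) +
           line.substr(name_end);
}
```

Feeding each line of the GFA through rewrite_gfa_line (for example in a std::getline loop) and writing the result to a new file would turn MA_2#hap2#h2tg000495l into MA_2#2#h2tg000495l; the contig part (h2tg000495l) is left as-is.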