vgteam / vg

tools for working with genome variation graphs

Issue in giraffe autoindex #4025

Closed. Hendricks27 closed this issue 1 year ago

Hendricks27 commented 1 year ago

1. What were you trying to do? Build giraffe index

2. What actually happened? I got an error

3. What data and command can the vg dev team use to make the problem happen? I tried both a home-built GFA and a GFA downloaded from the paper "A draft human pangenome reference".

./vg autoindex -g ./x.gfa --prefix x --workflow giraffe -M 160G -t 20 > ./x.log 2>&1

4. What does running vg version say?

v1.49.0 "Peschici"

5. Output

[vg autoindex] Executing command: ./vg autoindex -g ./hprc.gfa --prefix hprc --workflow giraffe -M 160G -t 20
[IndexRegistry]: Checking for haplotype lines in GFA.
[IndexRegistry]: Constructing a GBZ from GFA input.
Error: Operation not permitted
warning:[IndexRegistry] Child process 10 failed with status 256 representing exit code 1
[IndexRegistry]: Exceeded GBWT insert buffer size, expanding and reattempting.
vg: src/index_registry.cpp:5198: std::vector<std::vector<std::__cxx11::basic_string<char> > > vg::IndexRegistry::execute_recipe(const RecipeName&, const vg::IndexingPlan*, vg::AliasGraph&): Assertion `input->is_finished()' failed.
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.49.0 "Peschici"
Stack trace (most recent call last):
#11   Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x615534, in _start
#10   Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x2056606, in __libc_start_main
#9    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x2054da9, in __libc_start_call_main
#8    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0xdf08ab, in vg::subcommand::Subcommand::operator()(int, char**) const
#7    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0xc6a10e, in main_autoindex(int, char**)
#6    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x10c46f5, in vg::IndexRegistry::make_indexes(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
#5    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x10b5e2f, in vg::IndexRegistry::execute_recipe(std::pair<std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, unsigned long> const&, vg::IndexingPlan const*, vg::AliasGraph&)
#4    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x20671d5, in __assert_fail
#3    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x5e42f3, in __assert_fail_base.cold
#2    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x5e43cb, in abort
#1    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x206d7e5, in raise
#0    Object "/storage1/fs1/hprc/Active/wenjin/genome_graph/vg", at 0x209a1ec, in __pthread_kill
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
━━━━━━━━━━━━━━━━━━━━

jltsiren commented 1 year ago

This looks like an mmap() error. Maybe the vg process does not have permission to read the GFA file.

Try converting the GFA to GBZ manually with the following command:

vg gbwt -p --num-jobs 14 -g graph.gbz --gbz-format -G graph.gfa

It will probably fail for the same reason, but you will get error messages from the right process.
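
The read-permission hypothesis can also be checked directly from the shell before rerunning anything. The following is a minimal sketch, assuming a POSIX shell; `check_readable` is a hypothetical helper, not a vg command, and it only tests ordinary read access for the current user, so it says nothing about other possible causes of the error.

```sh
#!/bin/sh
# check_readable is a hypothetical helper, not part of vg.
# It reports whether the current user can read the given file.
check_readable() {
    file="$1"
    if [ ! -e "$file" ]; then
        # The path does not exist at all.
        echo "missing: $file"
        return 2
    fi
    if [ ! -r "$file" ]; then
        # The file exists but the current user cannot read it.
        echo "not readable: $file"
        return 1
    fi
    echo "readable: $file"
}
```

For example, `check_readable graph.gfa` prints one line describing the file's status and returns a nonzero exit code if the file is missing or unreadable.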

Hendricks27 commented 1 year ago

Thank you for your timely reply! I tried the command you suggested, and the GBZ graph was actually built successfully.

Building input GBWTs
Input type: GFA
Opening GFA file hprc.gfa
Validating GFA file hprc.gfa
Found 34407114 segments, 47280701 links, 0 paths, and 1379 walks in 16.6829 seconds
GBWT insertion batch size: 100000000 nodes
Parsing segments
Breaking segments into 1024 bp nodes
...
...
...
Finished job 14 in 12.3696 seconds
Finished job 15 in 13.7681 seconds
Finished job 16 in 10.9984 seconds
Finished job 19 in 8.54661 seconds
Finished job 21 in 7.4406 seconds
Finished job 18 in 10.2888 seconds
Finished job 17 in 11.9977 seconds
Finished job 20 in 9.88862 seconds
Finished job 1 in 31.8691 seconds
Finished job 0 in 33.0857 seconds
Merging partial indexes
Indexed 0 paths and 1379 walks in 45.1209 seconds
Parsing GFA header tags
Parsed header tags in 3.27453e-05 seconds
GBWTs built in 125.411 seconds, 32.6866 GiB

Building GBWTGraph
Saving GBWT and GBWTGraph to graph.gbz
GBWTGraph built in 48.0159 seconds, 32.6866 GiB

jltsiren commented 1 year ago

In that case, you can build the other indexes with:

vg index -j graph.dist graph.gbz
vg minimizer -p -t 16 -o graph.min -d graph.dist graph.gbz
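
Taken together, the manual workflow from this thread (GFA to GBZ, then the distance index, then the minimizer index) can be sketched as a single shell function. This is only a sketch under assumptions: `build_giraffe_indexes` and the `VG` variable are hypothetical names introduced here, and the flags and job/thread counts are copied unchanged from the commands above rather than tuned.

```sh
#!/bin/sh
# build_giraffe_indexes is a hypothetical wrapper, not part of vg.
# VG can be overridden, e.g. VG=./vg for a local binary,
# or VG=echo to print the commands instead of running them.
VG="${VG:-vg}"

build_giraffe_indexes() {
    gfa="$1"       # input GFA file
    prefix="$2"    # output prefix, e.g. "graph"

    # Step 1: convert the GFA to GBZ (command from earlier in the thread).
    "$VG" gbwt -p --num-jobs 14 -g "$prefix.gbz" --gbz-format -G "$gfa" || return 1

    # Step 2: build the distance index from the GBZ.
    "$VG" index -j "$prefix.dist" "$prefix.gbz" || return 1

    # Step 3: build the minimizer index using the GBZ and the distance index.
    "$VG" minimizer -p -t 16 -o "$prefix.min" -d "$prefix.dist" "$prefix.gbz" || return 1
}
```

Calling `build_giraffe_indexes graph.gfa graph` runs the three steps in order and stops at the first one that fails, so any error message comes from the step that produced it.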
Hendricks27 commented 1 year ago

Thank you so much! I will try it soon!

Hendricks27 commented 1 year ago

Thank you! It is resolved now.