pangenome / pggb

the pangenome graph builder
https://doi.org/10.1038/s41592-024-02430-3
MIT License

Command terminated by signal 4 #144

Closed: trickovicmatija closed this issue 2 years ago

trickovicmatija commented 2 years ago

Hi all, really nice tool, and I am pretty happy with the conda installation. Just a heads-up for everyone: I am trying to run this tool on quite a lot of metagenome-assembled genomes (MAGs). Within the first second of running, in the wfmash step, it just prints "Illegal instruction" and stops. After some experimentation, at least in my case, it turned out to be related to an out-of-memory problem (I was running it on a cluster). When I provided more memory, it worked. Just keep in mind that this could be a problem, even though the error message doesn't mention memory.

Best, Matija

AndreaGuarracino commented 2 years ago

Hi @trickovicmatija, thank you for sharing your experience.

Would you please share a bit more information:

ekg commented 2 years ago

This error indicates an incompatibility in the binary build. As a solution, I would suggest building the tools locally. Did you install via conda?
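"Illegal instruction" (SIGILL, signal 4 on Linux) means the CPU hit an instruction it does not implement, which is what happens when a prebuilt binary was compiled for instruction-set extensions the host lacks. A minimal sketch for checking this on Linux is to read the CPU flags from /proc/cpuinfo. Which extensions the conda builds actually require is not stated in this thread, so the FEATURES_OF_INTEREST list below is an assumption, not a known requirement.

```python
from pathlib import Path

# Assumption: example extensions a prebuilt binary *might* require;
# the thread does not say which ones the conda builds actually use.
FEATURES_OF_INTEREST = ("sse4_1", "sse4_2", "avx", "avx2", "bmi2", "avx512f")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return set(value.split())
    return set()  # no x86 'flags' line found (e.g. a non-x86 CPU)


def missing_features(flags: set[str], wanted=FEATURES_OF_INTEREST) -> list[str]:
    """Return the wanted extensions the CPU does not report, in order."""
    return [f for f in wanted if f not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_features(cpu_flags(cpuinfo.read_text()))
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print("/proc/cpuinfo not found; this check is Linux-only")
```

For instance, a first-generation Xeon E5 (Sandy Bridge) reports avx but not avx2, since AVX2 arrived with Haswell. Building locally can avoid such a mismatch, assuming the build targets the host CPU.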

agolicz commented 2 years ago

Just wanted to report that I am also getting 'Command terminated by signal 4' after installing with conda.

Command terminated by signal 4
wfmash -X -s 100000 -p 95 -n 2 -t 40 input.fa input.fa
0.15s user 0.08s system 107% cpu 0.23s total 12224Kb max memory

I'll try a manual installation to see if that helps.
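The "Command terminated by signal N" line reports which signal killed the process. On Linux, signal 4 is SIGILL, the same condition the first post saw as "Illegal instruction", whereas the kernel's out-of-memory killer sends SIGKILL (9). A small hypothetical helper (not part of pggb) that turns either a subprocess-style return code (negative signal number) or a shell exit status (128 + signal number) into a signal name:

```python
import signal


def describe_exit(status: int) -> str:
    """Describe a process exit status.

    Accepts a subprocess-style return code (negative signal number, e.g. -4)
    or a shell-style status (128 + signal number, e.g. 132).
    """
    if status < 0:
        signum = -status
    elif status > 128:
        signum = status - 128
    else:
        return f"exited normally with code {status}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"unknown signal {signum}"
    return f"killed by {name} (signal {signum})"


print(describe_exit(-4))   # killed by SIGILL (signal 4)
print(describe_exit(137))  # killed by SIGKILL (signal 9)
```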

AndreaGuarracino commented 2 years ago

Unfortunately, installing via conda is currently not recommended. We are working on fixing it, but there are still compilation problems with one of the pggb components.

agolicz commented 2 years ago

Just confirming that a manual install:

  1. wfmash, seqwish, smoothxg, odgi built from source
  2. gfaffix installed with conda
  3. vg as a prebuilt binary

seems to work fine.
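When mixing tools from different sources like this, it helps to confirm which copy of each binary comes first on PATH, since a leftover conda build can shadow one compiled from source. A minimal sketch using Python's shutil.which; the tool names come from the list above, and the script is illustrative, not part of pggb:

```python
import shutil
from typing import Optional

# Tool names taken from the manual-install list above.
PGGB_TOOLS = ("wfmash", "seqwish", "smoothxg", "odgi", "gfaffix", "vg")


def locate_tools(names=PGGB_TOOLS) -> dict[str, Optional[str]]:
    """Map each tool to the path a PATH lookup resolves it to, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools().items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```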

AndreaGuarracino commented 2 years ago

Hi @trickovicmatija and @agolicz, if you have time to spare, would you please try installing wfmash/seqwish/smoothxg/odgi with conda again? I don't mean the conda installation of PGGB itself, but of the individual tools I've just mentioned.

I released a new version for each of them yesterday. Since they didn't work on your system before, feedback from you would be greatly appreciated.

agolicz commented 2 years ago

Hi, sorry it took a while; I was away for the Easter weekend. Trying on my system in the same environment:

source activate pggbt
conda install -c conda-forge -c bioconda wfmash
conda install -c conda-forge -c bioconda seqwish
conda install -c conda-forge -c bioconda smoothxg

Up to this point, everything installs OK.

conda install -c conda-forge -c bioconda odgi
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.

ResolvePackageNotFound:
  - python=3.1

It looks like wfmash/seqwish/smoothxg now install correctly: their help text prints as expected, whereas before I was getting a 'core dumped' error with wfmash (if I remember correctly).

ggautreau commented 2 years ago

Hi @AndreaGuarracino,

I still have the same issue mentioned by @agolicz with pggb installed via conda (Command terminated by signal 4 at the wfmash step).

wfmash -h prints the help text followed by "core dumped".

Best regards

AndreaGuarracino commented 2 years ago

This hurts. I had recently made some changes to avoid this type of problem. @ggautreau, could you please share more information? How did you install wfmash? Are you sure you are using the latest version on bioconda (0.9.1)?

ggautreau commented 2 years ago

Yes, I used the latest version available on bioconda (0.9.1), so wfmash was installed automatically as a dependency of pggb. Afterwards I tried installing wfmash on its own via conda, but it made no difference. I didn't try building it manually as suggested by @agolicz.

Finally, I reinstalled pggb with mamba instead of conda. Now the wfmash step works perfectly, but I hit an issue similar to the one discussed in #214:

[wfmash::map] Reference = [pggb/data/LPA/LPA.fa.gz]
[wfmash::map] Query = [pggb/data/LPA/LPA.fa.gz]
[wfmash::map] Kmer size = 19
[wfmash::map] Window size = 136
[wfmash::map] Segment length = 5000 (read split allowed)
[wfmash::map] Block length min = 25000
[wfmash::map] Chaining gap max = 100000
[wfmash::map] Percentage identity threshold = 90%
[wfmash::map] Skip self mappings
[wfmash::map] Mapping output file = output2/wfmash-P6TEhr
[wfmash::map] Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
[wfmash::map] Execution threads  = 1
[wfmash::skch::Sketch::build] minimizers picked from reference = 56563
[wfmash::skch::Sketch::index] unique minimizers = 3001
[wfmash::skch::Sketch::computeFreqHist] Frequency histogram of minimizers = (1, 156) ... (331, 1)
[wfmash::skch::Sketch::computeFreqHist] With threshold 0.001%, consider all minimizers during lookup.
[wfmash::map] time spent computing the reference index: 0.628583 sec
[wfmash::skch::Map::mapQuery] mapped 100.00% @ 1.02e+05 bp/s elapsed: 00:00:00:39 remain: 00:00:00:00
[wfmash::skch::Map::mapQuery] count of mapped reads = 8, reads qualified for mapping = 14, total input reads = 14, total input bp = 3984669
[wfmash::map] time spent mapping the query: 3.90e+01 sec
[wfmash::map] mapping results saved in: output2/wfmash-P6TEhr
[wfmash::align] Reference = [pggb/data/LPA/LPA.fa.gz]
[wfmash::align] Query = [pggb/data/LPA/LPA.fa.gz]
[wfmash::align] Mapping file = output2/wfmash-P6TEhr
[wfmash::align] Alignment identity cutoff = 7.20e-01%
[wfmash::align] Alignment output file = /dev/stdout
[wfmash::align] time spent loading the reference index: 8.24e-04 sec
[wfmash::align::computeAlignments] aligned 100.00% @ 3.17e+05 bp/s elapsed: 00:00:00:09 remain: 00:00:00:00
[wfmash::align::computeAlignments] count of mapped reads = 14, total aligned bp = 2852219
[wfmash::align] time spent computing the alignment: 9.00e+00 sec
[wfmash::align] alignment results saved in: /dev/stdout
wfmash -X -s 5000 -p 90 -n 1 -B output2 -t 1 pggb/data/LPA/LPA.fa.gz pggb/data/LPA/LPA.fa.gz
50.12s user 2.08s system 107% cpu 48.66s total 54612Kb max memory
[seqwish::seqidx] 0.001 indexing sequences
[seqwish::seqidx] 0.095 index built
[seqwish::alignments] 0.095 processing alignments
[seqwish::alignments] 0.167 indexing
[seqwish::alignments] 0.175 index built
[seqwish::transclosure] 0.182 computing transitive closures
[seqwish::transclosure] 0.194 0.00% 0-3984669 overlap_collect
[seqwish::transclosure] 0.364 0.00% 0-3984669 rank_build
[seqwish::transclosure] 0.460 0.00% 0-3984669 parallel_union_find
[seqwish::transclosure] 0.611 0.00% 0-3984669 dset_write
[seqwish::transclosure] 0.733 0.00% 0-3984669 dset_compression
[seqwish::transclosure] 0.782 0.00% 0-3984669 dset_sort
[seqwish::transclosure] 0.816 0.00% 0-3984669 dset_invert
[seqwish::transclosure] 0.854 0.00% 0-3984669 graph_emission
[seqwish::transclosure] 1.417 100.00% building node_iitree and path_iitree indexes
[seqwish::transclosure] 1.434 100.00% done
[seqwish::transclosure] 1.434 done with transitive closures
[seqwish::compact] 1.434 compacting nodes
[seqwish::compact] 1.439 done compacting
[seqwish::compact] 1.440 built node index
[seqwish::links] 1.440 finding graph links
[seqwish::links] 1.464 links derived
[seqwish::gfa] 1.464 writing graph
[seqwish::gfa] 1.652 done
seqwish -t 1 -s pggb/data/LPA/LPA.fa.gz -p output2/LPA.fa.gz.d3b273e.wfmash.paf -k 19 -f 0 -g output2/LPA.fa.gz.d3b273e.417fcdf.seqwish.gfa -B 10000000 --temp-dir output2 -P
2.26s user 0.24s system 150% cpu 1.66s total 326020Kb max memory
[smoothxg::main] loading graph
[smoothxg::main] prepping graph for smoothing
[odgi::gfa_to_handle] building nodes: 100.00% @ 1.97e+04/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::gfa_to_handle] building edges: 100.00% @ 2.76e+04/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::gfa_to_handle] building paths: 100.00% @ 5.47e+01/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::prep] building path index
[smoothxg::prep] sorting graph
[odgi::path_linear_sgd] calculating linear SGD schedule (2.58e-07 1.00e+00 100 0 1.00e-02)
[odgi::path_linear_sgd] calculating zetas for 1967 zipf distributions
[odgi::path_linear_sgd] 1D path-guided SGD: 100.00% @ 2.29e+06/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::groom] grooming: 100.00% @ 1.98e+04/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::groom] organizing handles: 100.00% @ 1.98e+04/s elapsed: 00:00:00:00 remain: 00:00:00:00
[odgi::groom] flipped 0 handles
[odgi::topological_order] sorting nodes: 100.00% @ 1.98e+04/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::prep] chopping graph to 100
[odgi::chop] 1046 node(s) to chop.
[smoothxg::prep] writing graph output2/LPA.fa.gz.d3b273e.417fcdf.seqwish.gfa.prep.gfa
[smoothxg::main] building xg index
[smoothxg::smoothable_blocks] computing blocks
[smoothxg::smoothable_blocks] computing blocks for 18105 handles: 100.00% @ 3.62e+04/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::break_and_split_blocks] cutting blocks that contain sequences longer than max-poa-length (1400) and depth >= 0
[smoothxg::break_and_split_blocks] splitting 4229 blocks at identity 0.900 (WFA-based clustering) and at estimated-identity 0.900 (mash-based clustering)
[smoothxg::break_and_split_blocks] cutting and splitting 4229 blocks: 100.00% @ 8.36e+03/s elapsed: 00:00:00:00 remain: 00:00:00:00
[smoothxg::break_and_split_blocks] cut 0 blocks of which 0 had repeats
[smoothxg::break_and_split_blocks] split 0 blocks
[smoothxg::smooth_and_lace] applying local SPOA to 4229 blocks:  0.02% @ 7.95e+01/s elapsed: 00:00:00:00 remain: 00:00:00:53
Command terminated by signal 4
smoothxg -t 1 -T 1 -g output2/LPA.fa.gz.d3b273e.417fcdf.seqwish.gfa -w 1400 -b output2 -X 100 -I .9000 -R 0 -j 0 -e 0 -l 700 -P 1,19,39,3,81,1 -O 0.001 -Y 200 -d 0 -D 0 -S -V -o output2/LPA.fa.gz.d3b273e.417fcdf.db7e83b.smooth.1.gfa
3.45s user 0.29s system 67% cpu 5.55s total 50316Kb max memory

AndreaGuarracino commented 2 years ago

@ggautreau, could you please share details of the system where pggb fails (CPU, operating system, etc.)?

ggautreau commented 2 years ago

Here is my info:

CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz (16 cores, 32 threads)
RAM: 326 GB
OS: Linux XXXXX 5.4.204-1.el7.elrepo.x86_64 #1 SMP Tue Jul 5 16:32:13 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

AndreaGuarracino commented 2 years ago

Hi @ggautreau, thanks for the information. Would you please add --run-abpoa to the failing pggb command and check whether it still fails?

ggautreau commented 2 years ago

Hi @AndreaGuarracino,

With this flag, it works perfectly now :+1:
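Since --run-abpoa avoided the crash in the SPOA step here, a wrapper that retries once with that flag when a run dies from SIGILL is one possible stopgap. This is a hypothetical sketch, not something pggb provides; it relies only on Python's subprocess reporting death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def run_with_fallback(cmd: list[str], fallback_args: list[str]) -> int:
    """Run cmd; if it is killed by SIGILL, rerun it once with fallback_args.

    Hypothetical helper. subprocess reports death by signal as a negative
    return code, so SIGILL (signal 4 on Linux) shows up as -4.
    Returns the return code of the last attempt.
    """
    result = subprocess.run(cmd)
    if result.returncode == -signal.SIGILL:
        print("SIGILL; retrying with:", " ".join(fallback_args), file=sys.stderr)
        result = subprocess.run(cmd + fallback_args)
    return result.returncode
```

For example, run_with_fallback(["pggb", "-i", "input.fa", "-o", "output"], ["--run-abpoa"]), where the -i/-o arguments are placeholders for your usual pggb invocation. Retrying blindly hides the underlying build incompatibility, so it is still worth reporting the failure.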

AndreaGuarracino commented 2 years ago

@ggautreau, thank you for your feedback! I have one last request, if I may: could you please try the latest Docker/ image (from today, 08/17/2022) without the --run-abpoa flag? I hope my latest fix solves the problem.

AndreaGuarracino commented 2 years ago

Feel free to reopen this if the problem pops up again.