nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

The default beta version of iqtree leads to segmentation fault #780

Closed george-githinji closed 2 years ago

george-githinji commented 2 years ago

Current Behavior
augur tree leads to a segmentation fault. It appears that installing augur from conda bundles a beta version of iqtree2 (iqtree-2.1.4_beta). This version segfaults with the following error:

ERROR: Shell exited from fatal signal SIGSEGV when running: iqtree2 -ninit 2 -n 2 -me 0.05 -nt 10 -s results/filtered-delim.fasta -m GTR -ninit 10 -n 4 > results/filtered-delim.iqtree.log
Command output was:
  OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
  OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
  OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

Possible solution

  1. install a stable version of iqtree2 by default
  2. Let the user install specific versions of iqtree2 rather than bundling it with the install.

tsibley commented 2 years ago

@george-githinji Thanks for the report. Hmm. We recently had another report of a segfault in IQ-TREE on a slightly earlier (and non-beta) version. That report led to this issue in IQ-TREE, but so far there's been no resolution. Perhaps you could contribute additional information (logs, input data, etc) that's been requested on the IQ-TREE issue so that the developers there might make progress?

george-githinji commented 2 years ago

Thank you @tsibley for the comments and suggestion.

huddlej commented 2 years ago

Hi @george-githinji, following up on an email I just sent you about this issue, here are my first thoughts for working around it (without getting specific help from the IQ-TREE team). These are two versions of what you recommend in your possible solutions above.

  1. Downgrade IQ-TREE to an earlier version, in the hopes that the older version does not have the bug affecting your analysis. You can install the previous version from Bioconda (2.1.2) with conda install -c conda-forge -c bioconda iqtree=2.1.2. We actually use this slightly older version in both our ncov Conda environment and our Nextstrain Docker image.
  2. Upgrade IQ-TREE to the latest version, in the hopes that the bug has been fixed. There have been three additional IQ-TREE versions released since the one we use in the ncov workflow. None of these have been published to Bioconda yet, but you could manually install the latest version to see if it fixes the issue.

Our ncov installation instructions don't pin a specific version of IQ-TREE, so whenever you install augur from Conda, you will get the latest version of IQ-TREE. For our live Nextstrain builds, we've been using version 2.1.2 (from the Docker image mentioned above) and have not run into this issue. If you are able to downgrade to this version and it fixes the error, we could update our documentation to pin IQ-TREE to this more stable version.
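
To make the pinning idea concrete, here is a minimal sketch of a check that flags an IQ-TREE build newer than the 2.1.2 release mentioned above. The function names and the banner wording are assumptions made for illustration; the sketch only parses text it is given and does not call IQ-TREE itself.

```python
import re

# Version reported above as used in the Nextstrain Docker image and ncov environment.
KNOWN_GOOD = (2, 1, 2)


def parse_iqtree_version(banner):
    """Return (major, minor, patch) from text containing 'version X.Y.Z'.

    The banner wording is an assumption; returns None if nothing matches.
    """
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_newer_than_known_good(banner):
    """True when the parsed version is strictly newer than KNOWN_GOOD."""
    version = parse_iqtree_version(banner)
    return version is not None and version > KNOWN_GOOD


# Hypothetical banner text, for illustration only.
example_banner = "IQ-TREE multicore version 2.1.4-beta for Linux 64-bit"
print(parse_iqtree_version(example_banner))
print(is_newer_than_known_good(example_banner))
```

A check like this could run before a build and print a warning, which is a lighter-weight alternative to pinning the version in the installation instructions.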

george-githinji commented 2 years ago

Thanks @huddlej. Posting the details of our email discussion here:

Running the following augur-generated command with IQ-TREE version 2.1.4

iqtree2 -s results/filtered-delim.fasta -ninit 2 -n 2 -me 0.05 -nt 16 -m GTR -ninit 10 -n 4

fails with the following error message

NOTE: 1181 MB RAM (1 GB) is required!
CHECKPOINT: Model parameters restored, LogL: -458015.529
Computing ML distances based on estimated model parameters...
Computing ML distances took 47.668815 sec (of wall-clock time) 703.398900 sec(of CPU time)
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Computing RapidNJ tree took 1.404447 sec (of wall-clock time) 14.600168 sec (of CPU time)
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
ERROR: ERROR:
ERROR:
STACK TRACE FOR DEBUGGING:
ERROR: *** IQ-TREE CRASHES WIERROR:
TH SIGNAL *** IQ-TREE CRASHES WITH SIGNAL SEGMENTATION FAULT***
IQ-TREE CRASHES WITH SIGNAL
ERROR: SEGMENTATION FAULT
ERROR: SEGMENTATION FAULT
ERROR:
ERROR:
ERROR:
ERROR: ***ERROR:
IQ-TREE CRASHES WITH SIGNAL ERROR: **** IQ-TREE SEGMC** IENTRAQ-TREE C
TIRON FAUASALTSHES
WITH SIGNAL HES WITH SIGNAL SEGMENTATION FAULT*** IQ-TREE CRASHES WITH SIGNAL SEGMENTATION FAULT*** For bug report please send to develo*** IQ-TREE CRAS**Hp* ISEGQES WME
ITe-HTr SI
NTATRE
IsE CR*:O*N*ASHESG*N**  FAo rL  IbQu-gW
FAULIT TREE CRTrASHES H SEGMeportE WITHpSIGN SI
NTATGNAlease *s*eAL *nLION FASEdULT GME For b
ug rN tSEGMoe ENTdevepATorT*ATl** IOtIN F Qo-AU*** ITpRl
eIaOsNe  LsTe
Qndp*e*r*s :For*** EE  b-T RtEoE CdR
ASFH*eIu*vg e lCr
eR****ES WITH p  AFUoLrT
FAoorQopeSIG***    bugNrs Logr tr ep
SAHLEp:olease fil -TRbSe :E ug seEn dW  rC
I troT Hd St RASHES**evelopers**epIo*r*t* GN:* *p lWeIaTsHp lFo rF ob
uSeE G SIGMENTALresultea ssr*eFnAs/efiltered sde ntdo   bdueg vSeEloorp- *beu*N A Ldg*e**t*** orGMENTA**  s:*  lim.fast  LorTTg rereport eIO  pa  FoIpOoNr tF ApUlLT.l  r bug rog fi
devegLo
eaoprltpSLogg fliel:e Eseeo
Ne:   rFAre  stl prpefile:* *ssulULTlGMoEu
lltesa/sfei eNsaelnpers:eansddt sste/ f *itloe* t*o   treersueldt-**dser*ee*n dd- dse/fiT* For bug dAFor**  developeeli lilmt.efr absumtg.aerT
epL   rs:faoor .ld-delig to deIOsta.fliolge:  tr evoe
mseAelnidgpnlvelo.falomgN Fpresspers: ers:ulta.l
oetesaport pl
A
***nto deve t /ULf*** seTeag    Al
lopers:se  sfeilnilteignd sendmen
re
es (if p
t ftoiols*** deesv e lFope( sto**if***  dr***  pdoessviebl-sib: Lo*dellei)mlo.pfe
reg ) o sfi:****  leA a L*o*:l     A*
i g n
L**s tm o g   fiLloeg:l eintg gf iflae s  Lor
fi *nileg l(**  .r bug :  Lresmeesullfioe: n ol[1]    74707 segmentation fault  iqtree2 -s results/filtered-delim.fasta -ninit 2 -n 2 -me 0.05 -nt 16 -m GTR

However, removing the last command-line arguments ("-ninit 10 -n 4") lets the command proceed without a segmentation fault, suggesting that the augur-generated command string might be contributing to the error. Nonetheless, this command requires substantial memory (>17 GB) to run ModelFinder.

Create initial parsimony tree by phylogenetic likelihood library (PLL)... 16.687 seconds
Perform fast likelihood tree search using GTR+I+G model...
Estimate model parameters (epsilon = 5.000)
Perform nearest neighbor interchange...
Estimate model parameters (epsilon = 1.000)
1. Initial log-likelihood: -129464.014
2. Current log-likelihood: -129459.210
3. Current log-likelihood: -129454.602
4. Current log-likelihood: -129450.146
5. Current log-likelihood: -129444.802
6. Current log-likelihood: -129440.633
7. Current log-likelihood: -129436.673
8. Current log-likelihood: -129431.885
9. Current log-likelihood: -129428.235
10. Current log-likelihood: -129424.822
11. Current log-likelihood: -129420.975
12. Current log-likelihood: -129417.610
13. Current log-likelihood: -129414.799
14. Current log-likelihood: -129411.370
15. Current log-likelihood: -129408.686
16. Current log-likelihood: -129406.224
17. Current log-likelihood: -129403.478
18. Current log-likelihood: -129401.144
19. Current log-likelihood: -129399.231
20. Current log-likelihood: -129396.974
21. Current log-likelihood: -129395.140
22. Current log-likelihood: -129393.591
23. Current log-likelihood: -129391.824
24. Current log-likelihood: -129390.344
25. Current log-likelihood: -129389.204
26. Current log-likelihood: -129388.148
27. Current log-likelihood: -129386.772
28. Current log-likelihood: -129385.749
Optimal log-likelihood: -129384.811
Rate parameters:  A-C: 0.13498  A-G: 0.62337  A-T: 0.10900  C-G: 0.11681  C-T: 2.19624  G-T: 1.00000
Base frequencies:  A: 0.299  C: 0.183  G: 0.196  T: 0.321
Proportion of invariable sites: 0.593
Gamma shape alpha: 0.808
Parameters optimization took 28 rounds (1813.677 sec)
Time for fast ML tree search: 2548.571 seconds

NOTE: ModelFinder requires 17243 MB RAM!
ERROR: Memory required exceeds your computer RAM size!

Then, as you suggested, I ran the command below to test whether the issue is caused by the redundant -ninit and -n flags or by the values of these flags themselves:

iqtree2 -s results/filtered-delim.fasta -ninit 10 -n 4 -me 0.05 -nt 16 -m GTR

This command runs successfully.

Create initial parsimony tree by phylogenetic likelihood library (PLL)... 15.911 seconds
NOTE: 1724 MB RAM (1 GB) is required!
Estimate model parameters (epsilon = 0.500)
1. Initial log-likelihood: -138104.344
2. Current log-likelihood: -133482.216
3. Current log-likelihood: -133480.424
Optimal log-likelihood: -133480.359
Rate parameters:  A-C: 0.18760  A-G: 0.80959  A-T: 0.10121  C-G: 0.17670  C-T: 2.45449  G-T: 1.00000
Base frequencies:  A: 0.299  C: 0.183  G: 0.196  T: 0.321
Parameters optimization took 3 rounds (100.658 sec)
Computing ML distances based on estimated model parameters...
Computing ML distances took 45.558402 sec (of wall-clock time) 678.504236 sec(of CPU time)
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Computing RapidNJ tree took 1.402336 sec (of wall-clock time) 14.180043 sec (of CPU time)
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Log-likelihood of RapidNJ tree: -136868.296
--------------------------------------------------------------------
|             INITIALIZING CANDIDATE TREE SET                      |
--------------------------------------------------------------------
Generating 8 parsimony trees... 141.943 second
Computing log-likelihood of 8 initial trees ... 22.316 seconds
Current best score: -133345.871

Do NNI search on 4 best initial trees
Estimate model parameters (epsilon = 0.500)
BETTER TREE FOUND at iteration 1: -133097.147
Finish initializing candidate tree set (14)
Current best tree score: -133097.147 / CPU time: 558.016
Number of iterations: 4
TREE SEARCH COMPLETED AFTER 4 ITERATIONS / Time: 0h:13m:16s

--------------------------------------------------------------------
|                    FINALIZING TREE SEARCH                        |
--------------------------------------------------------------------
Performs final model parameters optimization
Estimate model parameters (epsilon = 0.050)
1. Initial log-likelihood: -133097.147
Optimal log-likelihood: -133097.128
Rate parameters:  A-C: 0.19109  A-G: 0.80774  A-T: 0.10135  C-G: 0.17720  C-T: 2.46728  G-T: 1.00000
Base frequencies:  A: 0.299  C: 0.183  G: 0.196  T: 0.321
Parameters optimization took 1 rounds (10.447 sec)
BEST SCORE FOUND : -133097.128
Total tree length: 0.315

Total number of iterations: 4
CPU time used for tree search: 6085.087 sec (1h:41m:25s)
Wall-clock time used for tree search: 558.411 sec (0h:9m:18s)
Total CPU time used: 9264.122 sec (2h:34m:24s)
Total wall-clock time used: 807.336 sec (0h:13m:27s)

Analysis results written to:
  IQ-TREE report:                filtered-delim.fasta.iqtree
  Maximum-likelihood tree:       filtered-delim.fasta.treefile
  Likelihood distances:          filtered-delim.fasta.mldist
  Screen log file:               filtered-delim.fasta.log

Date and Time: Thu Nov 18 22:10:01 2021

Digging a little into the command-line definition for Augur's IQ-TREE call (at https://github.com/nextstrain/augur/blob/master/augur/tree.py), I noted some comments at lines 164-174.

It appears that both the fast-opts argument and tree_builder_args get parsed in lines 194-198, leading to a potentially degenerate argument string for IQ-TREE.
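
As a rough illustration of the problem and one possible way around it, the sketch below detects flags that appear more than once in an argument string and merges two argument strings so that each flag is emitted once, with the later value winning. This is not Augur's actual implementation. The helper names and the `VALUE_FLAGS` set are hypothetical, and treating the leading `-ninit 2 -n 2 -me 0.05` as defaults and the trailing `-ninit 10 -n 4` as user-supplied is an assumption based on the commands quoted above.

```python
import shlex
from collections import Counter

# Flags assumed to take exactly one value. Illustrative only, not exhaustive.
VALUE_FLAGS = {"-ninit", "-n", "-me", "-nt", "-s", "-m"}


def flag_pairs(arg_string):
    """Split an argument string into ordered (flag, value) pairs."""
    tokens = shlex.split(arg_string)
    pairs = []
    i = 0
    while i < len(tokens):
        if tokens[i] in VALUE_FLAGS and i + 1 < len(tokens):
            pairs.append((tokens[i], tokens[i + 1]))
            i += 2
        else:
            # Unknown or value-less token: keep it as a bare flag.
            pairs.append((tokens[i], None))
            i += 1
    return pairs


def repeated_flags(arg_string):
    """List flags that occur more than once, in order of first appearance."""
    counts = Counter(flag for flag, _ in flag_pairs(arg_string))
    return [flag for flag, count in counts.items() if count > 1]


def merge_args(defaults, overrides):
    """Combine two argument strings; values in `overrides` replace defaults."""
    merged = dict(flag_pairs(defaults))
    merged.update(flag_pairs(overrides))
    return " ".join(
        flag if value is None else f"{flag} {value}"
        for flag, value in merged.items()
    )


failing = "-ninit 2 -n 2 -me 0.05 -nt 16 -m GTR -ninit 10 -n 4"
print(repeated_flags(failing))
print(merge_args("-ninit 2 -n 2 -me 0.05", "-ninit 10 -n 4"))
```

Letting the later value win keeps user-supplied settings in force while still applying any default the user did not override. Whether that is the right policy is the kind of question that needs consensus.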

huddlej commented 2 years ago

Thank you for sharing these details here, @george-githinji! Based on your testing, this issue appears to be caused by the same problem that led to #778. The same solution should address both of these issues.

george-githinji commented 2 years ago

Thank you @huddlej for picking this issue up. Your suggestion in #778 sounds like the way forward, but it might need a consensus or a best-practice approach.

jameshadfield commented 2 years ago

Here is another report of this bug from discussion.nextstrain.org