Open huddlej opened 2 years ago
This needs testing - I ran this on a dataset today using tree-builder-args: "'-ninit 100 -n 40 -blmin 0.0000000000001'"
and it failed (leaving out -blmin
works, as expected).
Command output was:
OMP: Info #270: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
ERROR: Numerical underflow (lh-derivative). Run again with the safe likelihood kernel via `-safe` option
ERROR: TREE BUILDING FAILED
Thanks for trying this out, @jameshadfield! I ran into the same issue with a Nebraska-focused tree (N=~1000 strains). Adding -safe
initially worked as expected, although IQ-TREE printed a bunch of warnings like:
WARNING: Numerical underflow for lh-derivative
But when I ran IQ-TREE again on the same data with the same parameters, it errored out like this:
ERROR: Shell exited from fatal signal SIGABRT when running: iqtree2 -ninit 2 -n 2 -me 0.05 -nt 4 -s results/Nebraska/aligned-delim.fasta -m GTR -ninit 10 -n 4 -blmin 0.0000000000001 -safe > results/Nebraska/aligned-delim.iqtree.log
Command output was:
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
OMP: Info #271: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
ERROR: phylokernelnew.h:3268: double PhyloTree::computeLikelihoodFromBufferSIMD() [VectorClass = Vec4d, nstates = 4, FMA = true, SITE_MODEL = false]: Assertion `std::isfinite(tree_lh) && "Numerical underflow for lh-from-buffer"' failed.
ERROR: STACK TRACE FOR DEBUGGING:
ERROR: 2 _sigtramp()
ERROR: 5 double PhyloTree::computeLikelihoodFromBufferSIMD<Vec4d, 4, true, false>()
ERROR: 6 PhyloTree::getBestNNIForBran(PhyloNode*, PhyloNode*, NNIMove*)
ERROR: 7 IQTree::evaluateNNIs(std::__1::map<int, std::__1::pair<Node*, Node*>, std::__1::less<int>, std::__1::allocator<std::__1::pair<int const, std::__1::pair<Node*, Node*> > > >&, std::__1::vector<NNIMove, std::__1::allocator<NNIMove> >&)
ERROR: 8 IQTree::optimizeNNI(bool)
ERROR: 9 IQTree::doNNISearch(bool)
ERROR: 10 IQTree::doTreeSearch()
ERROR: 11 runTreeReconstruction(Params&, IQTree*&)
ERROR: 12 runPhyloAnalysis(Params&, Checkpoint*)
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: *** Log file: results/Nebraska/aligned-delim.fasta.log
ERROR: *** Alignment files (if possible)
/bin/bash: line 1: 1825 Abort trap: 6 iqtree2 -ninit 2 -n 2 -me 0.05 -nt 4 -s results/Nebraska/aligned-delim.fasta -m GTR -ninit 10 -n 4 -blmin 0.0000000000001 -safe > results/Nebraska/aligned-delim.iqtree.log
There's clearly some stochastic behavior with that recommended value. I tried reducing the precision to 0.000000001
(from 13 to 9 decimal places) and that worked much more consistently. Interestingly, though, the run-time is widely variable with the same data; some runs take 35 seconds and others take 5 minutes to get to the same tree!
Here is a tangle-tree of the current params (left) vs. the min-length param above (right):
The two trees are qualitatively the same. There are a couple of additional mutations in the original tree that don't appear in the min-length tree. Here is a zoomed in view of a specific clade:
I plotted the number of mutations per branch for all nodes in the two trees.
There isn't much difference between the two trees. Maybe the difference would be more pronounced for much larger trees or if we were looking at the overall tree likelihoods, but I don't know that this change is worth it...
My suggestion here would be to set blmin
to 0.003/alignment_length
, which is about 0.000001
. That should have less effect on underflows etc, but should fix the problem because the largest number of consecutive 0-length branches is like less than 100.
FWIW, I stumbled on Rob's analysis that led him to suggest this very small minimum branch length. His context (analyzing 40,000+ sequences) and Nextstrain's (analyzing ~4,000 sequences) are pretty different. We might have higher priority issues to address right now than marginally optimizing the tree likelihoods...
Context
As part of a separate issue related to IQ-TREE and this workflow, we learned that the minimum branch length for IQ-TREE with SARS-CoV-2 data should be much lower—
-blmin 0.0000000000001
is suggested—than its default value.Rob Lanfear says:
Description
We should modify the default tree args in
defaults/parameters.yaml
to read:We might also want to test the run time and topology of the corresponding output compared to a tree built from the same data with a different min branch length parameter.