nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

ENH(tree): See whether we can add Usher and/or Maple as experimental tree builders for augur tree #1233

Open corneliusroemer opened 1 year ago

corneliusroemer commented 1 year ago

Context

IQ-TREE can struggle with large trees and take a long time to run. We may want to experiment with Usher and/or Maple as alternatives. They are probably significantly faster and may be good enough for some use cases, maybe even better than IQ-TREE.

bqminh commented 4 months ago

We have also released the CMAPLE tool, https://github.com/iqtree/cmaple, which is 3 times more efficient than MAPLE (https://doi.org/10.1101/2024.05.15.594295). Is there any plan to integrate such a tool into the pipeline? We, with @trongnhanuit, can volunteer to integrate CMAPLE.

huddlej commented 4 months ago

Thanks, @bqminh! Based on the IQ-TREE docs, it looks like Augur might already support CMAPLE by passing custom arguments to IQ-TREE like augur tree --tree-builder-args="--pathogen-force" (where IQ-TREE is already the default tree-builder). Another option to surface CMAPLE, which would require a change to Augur, would be to wrap that custom IQ-TREE command in a new "method" option like augur tree --method cmaple.
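The first option above (no change to Augur) could be sketched as follows. This is only an illustration: the helper name build_augur_tree_command and the file paths are hypothetical, and it assumes the installed IQ-TREE was built with CMAPLE support so that --pathogen-force is accepted.

```python
import shlex


def build_augur_tree_command(alignment, output, extra_iqtree_args="--pathogen-force"):
    """Return an `augur tree` command that forwards extra arguments to IQ-TREE.

    Hypothetical helper for illustration only; it is not part of Augur.
    """
    return [
        "augur", "tree",
        "--alignment", alignment,
        "--output", output,
        "--method", "iqtree",
        # Written as "--flag=value" so the leading "--" of the value is not
        # mistaken for a separate option by the argument parser.
        f"--tree-builder-args={extra_iqtree_args}",
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute the files from your own workflow.
    cmd = build_augur_tree_command("results/aligned.fasta", "results/tree_raw.nwk")
    print(shlex.join(cmd))
    # To execute: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list keeps the quoting of the nested IQ-TREE arguments explicit, which is the part most likely to go wrong when the same call is typed into a shell or a Snakemake rule.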

The other technical consideration is that we bundle IQ-TREE with Augur in our Nextstrain runtimes for Conda and Docker. For the Conda runtime, we pull the IQ-TREE package from Bioconda. For the Docker runtime, we download a (slightly out-of-date) binary from GitHub. For CMAPLE to work with augur tree across our various runtimes, we'd just need the Bioconda package and GitHub binaries to reflect the CMAPLE branch of the code. Separately, we are eager to include the latest version of IQ-TREE that supports ARM64 CPUs, but it looks like that development is happening in a separate branch from the CMAPLE work. Is there a plan to have a single release with both CMAPLE and ARM64 support or will these remain as separate development paths for a while?

bqminh commented 3 months ago

Thank you for this information! We'll prioritise getting this IQ-TREE/CMAPLE version to work on ARM. It's good to know that it might already work with these tree builder arguments, but we'll also consider other options.

corneliusroemer commented 2 months ago

I've managed to build iqtree2+cmaple on my local machine (osx-arm64 macOS 14.6) with a few workarounds, see iqtree2 issue:

Per the logs, this time it really worked (I tried with the bioconda version, but that lacks cmaple support, see https://github.com/iqtree/iqtree2/issues/274):

IQ-TREE multicore version 2.3.5 for MacOS ARM 64-bit built Jul 17 2024
Developed by Bui Quang Minh, Nguyen Lam Tung, Olga Chernomor, Heiko Schmidt,
Dominik Schrempf, Michael Woodhams, Ly Trong Nhan, Thomas Wong

Host:    dyn-3-4-29.mobile.unibas.ch (SSE4.2, 32 GB RAM)
Command: iqtree2 -ntmax 4 -s results/hmpxv1/masked_masked-delim.fasta -m GTR -ninit 2 -n 2 -me 0.05 -nt AUTO -redo --pathogen-force
Seed:    131082 (Using SPRNG - Scalable Parallel Random Number Generator)
Time:    Wed Jul 17 16:05:07 2024
Kernel:  SSE2 - auto-detect threads (10 CPU cores detected)

Reading an alignment
Running [C]MAPLE algorithm...
Performing placement
243 sequences have been added to the tree.
Applying a normal tree search
Optimizing branch lengths
Tree log likelihood: -272539.7511423723

MODEL: GTR

ROOT FREQUENCIES
A           C           G           T
0.365181    0.157898    0.157473    0.319448

MUTATION MATRIX
        A           C           G           T
A       -2552.16    317.651     1864.83     369.68
C       734.649     -5636.53    455.989     4445.89
G       4324.56     457.223     -5524.77    742.987
T       422.604     2197.54     366.257     -2986.4

Analysis results written to:
Maximum-likelihood tree:       results/hmpxv1/masked_masked-delim.fasta.treefile
Screen log file:               results/hmpxv1/masked_masked-delim.fasta.log

CMAPLE Runtime: 0.9459710121s
Date and Time: Wed Jul 17 16:05:08 2024

On a small build (240 sequences, mpox clade IIb) things look good:

[Screenshot: Brave Browser 2024-07-17 16 09 06]

I'll keep exploring. I think I can edit the bioconda recipe to add cmaple so we can use it broadly across workflows. See:

corneliusroemer commented 2 months ago

I've managed to build iqtree with the cmaple feature enabled in bioconda! There's thus no need to change augur code: one can simply pass the tree builder argument --pathogen-force and cmaple should be used automatically.
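Since an earlier bioconda build lacked cmaple support, it may be worth confirming from the IQ-TREE log that CMAPLE actually ran. A minimal sketch, assuming the log contains the same two lines as the log quoted earlier in this thread ("Running [C]MAPLE algorithm..." and "CMAPLE Runtime: ...s"); the function name cmaple_summary is hypothetical, and other IQ-TREE versions may word these lines differently.

```python
import re


def cmaple_summary(log_text):
    """Return (cmaple_used, runtime_seconds) parsed from IQ-TREE log text.

    Hypothetical helper; it relies only on the log lines quoted in this thread.
    runtime_seconds is None when no "CMAPLE Runtime" line is found.
    """
    used = "Running [C]MAPLE algorithm" in log_text
    match = re.search(r"CMAPLE Runtime:\s*([0-9.]+)s", log_text)
    runtime = float(match.group(1)) if match else None
    return used, runtime


if __name__ == "__main__":
    # Excerpt of the log posted above.
    example_log = (
        "Reading an alignment\n"
        "Running [C]MAPLE algorithm...\n"
        "Performing placement\n"
        "CMAPLE Runtime: 0.9459710121s\n"
    )
    print(cmaple_summary(example_log))
```

A check like this could run as a sanity step after tree building, so a runtime without CMAPLE support is noticed rather than quietly producing a standard IQ-TREE run.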