veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

2.5.45 speed improvements #1552

Closed spond closed 1 year ago

spond commented 1 year ago

Significant speed improvements from:

  1. Cache-aware blocked matrix multiplication, with detailed SIMD support for operations on the (small) blocks.
  2. Improved SIMD implementations of several core "hot" routines (better memory management).
  3. Refactoring of several matrix-related functions to improve performance, especially by using compressedIndex sparse matrices.
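
As an illustration of the first item, here is a minimal sketch of blocked (tiled) matrix multiplication. It is not HyPhy's implementation: the function names, the flat row-major layout, and the tile size are assumptions made for the example, and pure Python only shows the loop structure. In a compiled kernel the innermost tile loops are where SIMD instructions would be applied, and the tile size would be chosen so that the working tiles stay in cache.

```python
def matmul_naive(a, b, n):
    """Reference product of two n x n row-major matrices stored as flat lists."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i * n + k] * b[k * n + j]
            c[i * n + j] = total
    return c


def matmul_blocked(a, b, n, block=4):
    """Same product, computed tile by tile.

    The three outer loops walk over block x block tiles; the three inner
    loops multiply one tile of `a` by one tile of `b` and accumulate into
    the matching tile of `c`. `block` is a hypothetical tuning parameter.
    """
    c = [0.0] * (n * n)
    for i0 in range(0, n, block):
        i_max = min(i0 + block, n)
        for k0 in range(0, n, block):
            k_max = min(k0 + block, n)
            for j0 in range(0, n, block):
                j_max = min(j0 + block, n)
                for i in range(i0, i_max):
                    row_c = i * n
                    for k in range(k0, k_max):
                        a_ik = a[i * n + k]
                        row_b = k * n
                        # Contiguous inner loop over j: the part a SIMD
                        # kernel would vectorize in a compiled language.
                        for j in range(j0, j_max):
                            c[row_c + j] += a_ik * b[row_b + j]
    return c


if __name__ == "__main__":
    n = 5
    a = [float((7 * i + 3) % 11) for i in range(n * n)]
    b = [float((5 * i + 1) % 13) for i in range(n * n)]
    print(matmul_blocked(a, b, n, block=2) == matmul_naive(a, b, n))
```

The tile size does not have to divide the matrix dimension; the `min(...)` bounds handle the ragged edge tiles.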

Roughly 10-20% faster across the board, with larger gains of ~25-50% for dense transition matrices (multi-hit models).

Standard test suite

make test

2.5.44

hyphy-2.5.44 % make test
Running tests...
Test project /Users/sergei/Development/hyphy-2.5.44
      Start  1: UNIT-TESTS
 1/20 Test  #1: UNIT-TESTS .......................   Passed    1.75 sec
      Start  2: CODON
 2/20 Test  #2: CODON ............................   Passed    0.52 sec
      Start  3: PROTEIN
 3/20 Test  #3: PROTEIN ..........................   Passed    4.64 sec
      Start  4: MTCODON
 4/20 Test  #4: MTCODON ..........................   Passed   10.61 sec
      Start  5: ALGAE
 5/20 Test  #5: ALGAE ............................   Passed    3.89 sec
      Start  6: CILIATES
 6/20 Test  #6: CILIATES .........................   Passed    5.97 sec
      Start  7: SLAC
 7/20 Test  #7: SLAC .............................   Passed    2.13 sec
      Start  8: SLAC-PARTITIONED
 8/20 Test  #8: SLAC-PARTITIONED .................   Passed    7.08 sec
      Start  9: FEL
 9/20 Test  #9: FEL ..............................   Passed    9.13 sec
      Start 10: MEME
10/20 Test #10: MEME .............................   Passed   23.28 sec
      Start 11: MEME-PARTITIONED
11/20 Test #11: MEME-PARTITIONED .................   Passed   19.56 sec
      Start 12: BUSTED
12/20 Test #12: BUSTED ...........................   Passed    8.76 sec
      Start 13: BUSTED-SRV
13/20 Test #13: BUSTED-SRV .......................   Passed    9.33 sec
      Start 14: RELAX
14/20 Test #14: RELAX ............................   Passed   17.78 sec
      Start 15: FUBAR
15/20 Test #15: FUBAR ............................   Passed    1.34 sec
      Start 16: BGM
16/20 Test #16: BGM ..............................   Passed    1.48 sec
      Start 17: CONTRAST-FEL
17/20 Test #17: CONTRAST-FEL .....................   Passed   21.81 sec
      Start 18: GARD
18/20 Test #18: GARD .............................   Passed   11.87 sec
      Start 19: FADE
19/20 Test #19: FADE .............................   Passed   15.24 sec
      Start 20: ABSREL
20/20 Test #20: ABSREL ...........................   Passed   17.82 sec

100% tests passed, 0 tests failed out of 20

Total Test time (real) = 194.04 sec

2.5.45

make test
Running tests...
Test project /Users/sergei/Development/hyphy
      Start  1: UNIT-TESTS
 1/20 Test  #1: UNIT-TESTS .......................   Passed    1.73 sec
      Start  2: CODON
 2/20 Test  #2: CODON ............................   Passed    0.47 sec
      Start  3: PROTEIN
 3/20 Test  #3: PROTEIN ..........................   Passed    4.48 sec
      Start  4: MTCODON
 4/20 Test  #4: MTCODON ..........................   Passed    9.02 sec
      Start  5: ALGAE
 5/20 Test  #5: ALGAE ............................   Passed    3.81 sec
      Start  6: CILIATES
 6/20 Test  #6: CILIATES .........................   Passed    4.80 sec
      Start  7: SLAC
 7/20 Test  #7: SLAC .............................   Passed    2.04 sec
      Start  8: SLAC-PARTITIONED
 8/20 Test  #8: SLAC-PARTITIONED .................   Passed    6.72 sec
      Start  9: FEL
 9/20 Test  #9: FEL ..............................   Passed    7.34 sec
      Start 10: MEME
10/20 Test #10: MEME .............................   Passed   19.20 sec
      Start 11: MEME-PARTITIONED
11/20 Test #11: MEME-PARTITIONED .................   Passed   16.02 sec
      Start 12: BUSTED
12/20 Test #12: BUSTED ...........................   Passed    8.72 sec
      Start 13: BUSTED-SRV
13/20 Test #13: BUSTED-SRV .......................   Passed   10.29 sec
      Start 14: RELAX
14/20 Test #14: RELAX ............................   Passed   16.47 sec
      Start 15: FUBAR
15/20 Test #15: FUBAR ............................   Passed    1.22 sec
      Start 16: BGM
16/20 Test #16: BGM ..............................   Passed    1.42 sec
      Start 17: CONTRAST-FEL
17/20 Test #17: CONTRAST-FEL .....................   Passed   17.74 sec
      Start 18: GARD
18/20 Test #18: GARD .............................   Passed   12.79 sec
      Start 19: FADE
19/20 Test #19: FADE .............................   Passed   14.73 sec
      Start 20: ABSREL
20/20 Test #20: ABSREL ...........................   Passed   15.72 sec

100% tests passed, 0 tests failed out of 20

Total Test time (real) = 174.77 sec
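
To compare the two runs test by test, the per-test times can be pulled out of the `make test` output with a small script. The sketch below is only an illustration: the regular expression, the helper names, and the three-line excerpts (copied from the logs above) are choices made for this example, and a positive percentage means the 2.5.45 run was faster.

```python
import re

# Matches lines such as
# " 9/20 Test  #9: FEL ..............................   Passed    9.13 sec"
LINE = re.compile(r"Test\s+#\d+:\s+(\S+)\s+\.+\s+Passed\s+([0-9.]+)\s+sec")


def parse_ctest(log):
    """Return {test name: seconds} for every 'Passed' line in a ctest log."""
    return {m.group(1): float(m.group(2)) for m in LINE.finditer(log)}


def percent_faster(old, new):
    """Relative reduction in run time, in percent (negative means slower)."""
    return 100.0 * (old - new) / old


# Excerpts from the 2.5.44 and 2.5.45 logs above.
OLD_LOG = """
 9/20 Test  #9: FEL ..............................   Passed    9.13 sec
10/20 Test #10: MEME .............................   Passed   23.28 sec
13/20 Test #13: BUSTED-SRV .......................   Passed    9.33 sec
"""
NEW_LOG = """
 9/20 Test  #9: FEL ..............................   Passed    7.34 sec
10/20 Test #10: MEME .............................   Passed   19.20 sec
13/20 Test #13: BUSTED-SRV .......................   Passed   10.29 sec
"""

if __name__ == "__main__":
    old, new = parse_ctest(OLD_LOG), parse_ctest(NEW_LOG)
    for name in old:
        if name in new:
            change = percent_faster(old[name], new[name])
            print(f"{name:12s} {old[name]:7.2f} -> {new[name]:7.2f}  {change:+6.1f}%")
```

Single runs of short tests are noisy, so small differences in either direction (for example BUSTED-SRV and GARD, which were slightly slower in the 2.5.45 run) should not be over-interpreted.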

FEL and MEME

2.5.44

$ time mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/bglobin.nex
mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/bglobin.nex  55.40s user 3.14s system 452% cpu 12.949 total

$ time mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/yokoyama.rh1.cds.mod.1-990.nex
mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/InfluenzaA.nex  822.79s user 33.89s system 476% cpu 2:59.77 total

2.5.45

$ time mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/bglobin.nex
mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/bglobin.nex  45.20s user 2.79s system 454% cpu 10.554 total

$ time mpirun -np 6 ./HYPHYMPI meme --alignment tests/data/yokoyama.rh1.cds.mod.1-990.nex
mpirun -np 6 ./HYPHYMPI meme --alignment   739.67s user 21.85s system 493% cpu 2:34.28 total

Multi-hit

2.5.44

$ time ./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment tests/data/yokoyama.rh1.cds.mod.1-990.nex
./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment   546.38s user 17.10s system 469% cpu 2:00.13 total

2.5.45

$ time ./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment tests/data/yokoyama.rh1.cds.mod.1-990.nex
./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment   382.47s user 15.97s system 428% cpu 1:32.93 total
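
The `time` summaries above mix plain seconds (`12.949`) with `minutes:seconds` stamps (`2:00.13`), so a small helper makes the before/after arithmetic explicit. This is a sketch with hypothetical helper names; the figures are copied from the bglobin FEL runs and the multi-hit runs above. The larger-alignment MPI runs are left out because the two commands listed for them differ (`fel` versus `meme`), so they are not a like-for-like pair.

```python
def to_seconds(stamp):
    """Convert 'SS.sss', 'M:SS.ss' or 'H:MM:SS' into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60.0 + float(part)
    return seconds


def percent_faster(old, new):
    """Relative reduction in run time, in percent."""
    return 100.0 * (old - new) / old


# (label, 2.5.44 value, 2.5.45 value), copied from the timings above.
RUNS = [
    ("FEL bglobin, wall clock", "12.949", "10.554"),
    ("Multi-hit, wall clock", "2:00.13", "1:32.93"),
    ("Multi-hit, user CPU", "546.38", "382.47"),
]

if __name__ == "__main__":
    for label, old, new in RUNS:
        old_s, new_s = to_seconds(old), to_seconds(new)
        print(f"{label:26s} {old_s:8.2f}s -> {new_s:8.2f}s  "
              f"{percent_faster(old_s, new_s):5.1f}% less time")
```

Wall-clock and user CPU time can tell different stories for multi-threaded runs, because the reported CPU utilisation also changed between the two multi-hit runs (469% versus 428%).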