Cache-aware blocked matrix multiplication, with dedicated SIMD support for small (blocked) matrix operations
Improved SIMD implementations of several core "hot" routines, with better memory management
Refactored several matrix-related functions for performance, notably by using compressedIndex sparse matrices.
Roughly 10-20% speedups across the board. More significant speedups for dense transition matrices (multi-hit models), ~25-50%.
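The first item above refers to cache-aware blocking. The idea can be sketched as a minimal pure-Python illustration, assuming square matrices stored as lists of lists and an illustrative tile size `block`. It is not HyPhy's C++ implementation, and Python itself gains no cache or SIMD benefit from it. The three outer loops walk over tiles so that each tile of `a`, `b` and `c` is reused while it is likely still in cache. The innermost loop over contiguous `j` is the part a SIMD kernel would vectorize.

```python
def blocked_matmul(a, b, block=4):
    """Multiply square matrices a and b (lists of lists) tile by tile.

    `block` is an illustrative tile size, not a value taken from HyPhy.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):            # tile row of c
        for kk in range(0, n, block):        # tile of the shared dimension
            for jj in range(0, n, block):    # tile column of c
                # Accumulate the (ii, kk) tile of a times the (kk, jj) tile of b
                # into the (ii, jj) tile of c.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        a_ik, row_b = row_a[k], b[k]
                        # Contiguous inner loop: the part a SIMD kernel vectorizes.
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def naive_matmul(a, b):
    """Reference triple-loop product used to check the blocked version."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Usage: n = 10 is deliberately not a multiple of the tile size.
n = 10
a = [[float(i + 2 * j) for j in range(n)] for i in range(n)]
b = [[float((i * j) % 7) for j in range(n)] for i in range(n)]
assert blocked_matmul(a, b, block=4) == naive_matmul(a, b)
```

Tiling does not change the arithmetic, only the order in which memory is visited, so the result matches the naive product.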
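These notes do not describe the compressedIndex sparse matrices in detail. The sketch below assumes a conventional compressed sparse row (CSR) layout, which stores only the non-zero entries plus per-row offsets. It shows why such a layout can help: a matrix-vector product touches only the stored non-zeros, so its cost scales with their number rather than with n². The `CompressedRowMatrix` class is hypothetical and is not HyPhy's data structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompressedRowMatrix:
    """Hypothetical CSR-style matrix: only non-zero entries are stored."""
    n_cols: int
    values: List[float]   # non-zero values, row by row
    col_index: List[int]  # column of each stored value
    row_start: List[int]  # row r occupies values[row_start[r]:row_start[r + 1]]

    @classmethod
    def from_dense(cls, dense):
        values, col_index, row_start = [], [], [0]
        for row in dense:
            for j, v in enumerate(row):
                if v != 0:
                    values.append(v)
                    col_index.append(j)
            row_start.append(len(values))
        return cls(len(dense[0]), values, col_index, row_start)

    def matvec(self, x):
        """Return self @ x, visiting only the stored non-zeros."""
        out = []
        for r in range(len(self.row_start) - 1):
            acc = 0.0
            for p in range(self.row_start[r], self.row_start[r + 1]):
                acc += self.values[p] * x[self.col_index[p]]
            out.append(acc)
        return out


# Example: a small rate-matrix-like array (rows sum to zero) with zero entries.
dense = [[-1.0, 1.0, 0.0], [0.0, -2.0, 2.0], [0.5, 0.0, -0.5]]
m = CompressedRowMatrix.from_dense(dense)
print(m.matvec([1.0, 2.0, 3.0]))  # [1.0, 2.0, -1.0]
```

The saving depends on sparsity: as a matrix fills in, the number of stored entries approaches n² and the layout offers no advantage over a dense one.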
Standard test suite
make test
2.5.44
hyphy-2.5.44 % make test
Running tests...
Test project /Users/sergei/Development/hyphy-2.5.44
Start 1: UNIT-TESTS
1/20 Test #1: UNIT-TESTS ....................... Passed 1.75 sec
Start 2: CODON
2/20 Test #2: CODON ............................ Passed 0.52 sec
Start 3: PROTEIN
3/20 Test #3: PROTEIN .......................... Passed 4.64 sec
Start 4: MTCODON
4/20 Test #4: MTCODON .......................... Passed 10.61 sec
Start 5: ALGAE
5/20 Test #5: ALGAE ............................ Passed 3.89 sec
Start 6: CILIATES
6/20 Test #6: CILIATES ......................... Passed 5.97 sec
Start 7: SLAC
7/20 Test #7: SLAC ............................. Passed 2.13 sec
Start 8: SLAC-PARTITIONED
8/20 Test #8: SLAC-PARTITIONED ................. Passed 7.08 sec
Start 9: FEL
9/20 Test #9: FEL .............................. Passed 9.13 sec
Start 10: MEME
10/20 Test #10: MEME ............................. Passed 23.28 sec
Start 11: MEME-PARTITIONED
11/20 Test #11: MEME-PARTITIONED ................. Passed 19.56 sec
Start 12: BUSTED
12/20 Test #12: BUSTED ........................... Passed 8.76 sec
Start 13: BUSTED-SRV
13/20 Test #13: BUSTED-SRV ....................... Passed 9.33 sec
Start 14: RELAX
14/20 Test #14: RELAX ............................ Passed 17.78 sec
Start 15: FUBAR
15/20 Test #15: FUBAR ............................ Passed 1.34 sec
Start 16: BGM
16/20 Test #16: BGM .............................. Passed 1.48 sec
Start 17: CONTRAST-FEL
17/20 Test #17: CONTRAST-FEL ..................... Passed 21.81 sec
Start 18: GARD
18/20 Test #18: GARD ............................. Passed 11.87 sec
Start 19: FADE
19/20 Test #19: FADE ............................. Passed 15.24 sec
Start 20: ABSREL
20/20 Test #20: ABSREL ........................... Passed 17.82 sec
100% tests passed, 0 tests failed out of 20
Total Test time (real) = 194.04 sec
2.5.45
make test
Running tests...
Test project /Users/sergei/Development/hyphy
Start 1: UNIT-TESTS
1/20 Test #1: UNIT-TESTS ....................... Passed 1.73 sec
Start 2: CODON
2/20 Test #2: CODON ............................ Passed 0.47 sec
Start 3: PROTEIN
3/20 Test #3: PROTEIN .......................... Passed 4.48 sec
Start 4: MTCODON
4/20 Test #4: MTCODON .......................... Passed 9.02 sec
Start 5: ALGAE
5/20 Test #5: ALGAE ............................ Passed 3.81 sec
Start 6: CILIATES
6/20 Test #6: CILIATES ......................... Passed 4.80 sec
Start 7: SLAC
7/20 Test #7: SLAC ............................. Passed 2.04 sec
Start 8: SLAC-PARTITIONED
8/20 Test #8: SLAC-PARTITIONED ................. Passed 6.72 sec
Start 9: FEL
9/20 Test #9: FEL .............................. Passed 7.34 sec
Start 10: MEME
10/20 Test #10: MEME ............................. Passed 19.20 sec
Start 11: MEME-PARTITIONED
11/20 Test #11: MEME-PARTITIONED ................. Passed 16.02 sec
Start 12: BUSTED
12/20 Test #12: BUSTED ........................... Passed 8.72 sec
Start 13: BUSTED-SRV
13/20 Test #13: BUSTED-SRV ....................... Passed 10.29 sec
Start 14: RELAX
14/20 Test #14: RELAX ............................ Passed 16.47 sec
Start 15: FUBAR
15/20 Test #15: FUBAR ............................ Passed 1.22 sec
Start 16: BGM
16/20 Test #16: BGM .............................. Passed 1.42 sec
Start 17: CONTRAST-FEL
17/20 Test #17: CONTRAST-FEL ..................... Passed 17.74 sec
Start 18: GARD
18/20 Test #18: GARD ............................. Passed 12.79 sec
Start 19: FADE
19/20 Test #19: FADE ............................. Passed 14.73 sec
Start 20: ABSREL
20/20 Test #20: ABSREL ........................... Passed 15.72 sec
100% tests passed, 0 tests failed out of 20
Total Test time (real) = 174.77 sec
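The per-test timings from the two runs can be compared directly. The short Python sketch below copies the numbers from both `make test` transcripts and reports each test's change in wall-clock time. The dictionary names and the `pct_change` helper are illustrative and not part of HyPhy.

```python
# Wall-clock seconds per test, copied from the two `make test` runs.
V2544 = {
    "UNIT-TESTS": 1.75, "CODON": 0.52, "PROTEIN": 4.64, "MTCODON": 10.61,
    "ALGAE": 3.89, "CILIATES": 5.97, "SLAC": 2.13, "SLAC-PARTITIONED": 7.08,
    "FEL": 9.13, "MEME": 23.28, "MEME-PARTITIONED": 19.56, "BUSTED": 8.76,
    "BUSTED-SRV": 9.33, "RELAX": 17.78, "FUBAR": 1.34, "BGM": 1.48,
    "CONTRAST-FEL": 21.81, "GARD": 11.87, "FADE": 15.24, "ABSREL": 17.82,
}
V2545 = {
    "UNIT-TESTS": 1.73, "CODON": 0.47, "PROTEIN": 4.48, "MTCODON": 9.02,
    "ALGAE": 3.81, "CILIATES": 4.80, "SLAC": 2.04, "SLAC-PARTITIONED": 6.72,
    "FEL": 7.34, "MEME": 19.20, "MEME-PARTITIONED": 16.02, "BUSTED": 8.72,
    "BUSTED-SRV": 10.29, "RELAX": 16.47, "FUBAR": 1.22, "BGM": 1.42,
    "CONTRAST-FEL": 17.74, "GARD": 12.79, "FADE": 14.73, "ABSREL": 15.72,
}
TOTAL_2544, TOTAL_2545 = 194.04, 174.77  # "Total Test time (real)" lines


def pct_change(old, new):
    """Percent change in run time; negative values mean 2.5.45 was faster."""
    return 100.0 * (new - old) / old


for name, old in V2544.items():
    new = V2545[name]
    print(f"{name:<18} {old:6.2f} s -> {new:6.2f} s  {pct_change(old, new):+6.1f}%")
print(f"{'TOTAL':<18} {TOTAL_2544:6.2f} s -> {TOTAL_2545:6.2f} s  "
      f"{pct_change(TOTAL_2544, TOTAL_2545):+6.1f}%")
```

By the "Total Test time (real)" lines, the suite went from 194.04 s to 174.77 s, about 9.9% less wall time (roughly a 1.11x speedup). These are single runs, so small per-test differences, including the two tests that ran slower in 2.5.45 (BUSTED-SRV and GARD), may be within run-to-run variation.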
FEL and MEME
2.5.44
$ time mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/bglobin.nex
mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/bglobin.nex 55.40s user 3.14s system 452% cpu 12.949 total
$ time mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/yokoyama.rh1.cds.mod.1-990.nex
mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/InfluenzaA.nex 822.79s user 33.89s system 476% cpu 2:59.77 total
2.5.45
$ time mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/bglobin.nex
mpirun -np 6 ./HYPHYMPI fel --alignment tests/data/bglobin.nex 45.20s user 2.79s system 454% cpu 10.554 total
$ time mpirun -np 6 ./HYPHYMPI meme --alignment tests/data/yokoyama.rh1.cds.mod.1-990.nex
mpirun -np 6 ./HYPHYMPI meme --alignment 739.67s user 21.85s system 493% cpu 2:34.28 total
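The FEL timings can be turned into a speedup figure by converting the "total" field that zsh's `time` prints (either seconds, or minutes:seconds) into seconds. A minimal sketch, assuming that output format; the helper names are illustrative.

```python
def zsh_total_seconds(field):
    """Convert the 'total' field of zsh's `time` output ('12.949', '2:59.77') to seconds."""
    seconds = 0.0
    for part in field.split(":"):  # handles s, m:s and h:m:s forms
        seconds = seconds * 60 + float(part)
    return seconds


def compare(old_total, new_total):
    """Return wall-clock seconds, percent time saved and speedup factor."""
    old, new = zsh_total_seconds(old_total), zsh_total_seconds(new_total)
    return {
        "old_s": old,
        "new_s": new,
        "time_saved_pct": 100.0 * (old - new) / old,
        "speedup": old / new,
    }


# FEL on bglobin.nex with `mpirun -np 6`, 2.5.44 vs 2.5.45.
bglobin = compare("12.949", "10.554")
print(f"bglobin FEL: {bglobin['time_saved_pct']:.1f}% less wall time, "
      f"{bglobin['speedup']:.2f}x")
```

For the bglobin FEL runs this works out to about 18.5% less wall time, roughly a 1.23x speedup. The other pair is not like-for-like (the 2.5.44 line reports FEL on InfluenzaA.nex, the 2.5.45 line MEME on yokoyama.rh1.cds.mod.1-990.nex), so it is left out of the comparison.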
Multi-hit
2.5.44
$ time ./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment tests/data/yokoyama.rh1.cds.mod.1-990.nex
./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment 546.38s user 17.10s system 469% cpu 2:00.13 total
2.5.45
$ time ./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment tests/data/yokoyama.rh1.cds.mod.1-990.nex
./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment 382.47s user 15.97s system 428% cpu 1:32.93 total
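The multi-hit runs can be compared on both wall-clock time and total CPU time (user + system) by parsing zsh's `time` summary lines. A minimal sketch, assuming the line format shown in the transcripts; the regex and the `parse_zsh_time` helper are illustrative.

```python
import re

# Matches the summary zsh's `time` prints, e.g. "546.38s user 17.10s system 469% cpu 2:00.13 total".
TIME_RE = re.compile(r"([\d.]+)s user ([\d.]+)s system (\d+)% cpu ([\d:.]+) total")


def parse_zsh_time(line):
    """Return total CPU seconds (user + system) and wall-clock seconds."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError("no zsh time summary found")
    user, system, _cpu_pct, total = m.groups()
    wall = 0.0
    for part in total.split(":"):  # "2:00.13" -> 120.13
        wall = wall * 60 + float(part)
    return {"cpu_s": float(user) + float(system), "wall_s": wall}


old = parse_zsh_time("./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment "
                     "546.38s user 17.10s system 469% cpu 2:00.13 total")
new = parse_zsh_time("./hyphy ../hyphy-analyses/FitMultiModel/FitMultiModel.bf --alignment "
                     "382.47s user 15.97s system 428% cpu 1:32.93 total")
for key in ("wall_s", "cpu_s"):
    saved = 100.0 * (old[key] - new[key]) / old[key]
    print(f"{key}: {old[key]:.2f} -> {new[key]:.2f}  "
          f"({saved:.1f}% less, {old[key] / new[key]:.2f}x)")
```

For the FitMultiModel runs this gives about 22.6% less wall time (roughly 1.29x) and about 29.3% less CPU time (roughly 1.41x).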