Closed ddarriba closed 8 years ago
The local/neighboring branch-length BFGS would be really useful for several projects and for my curiosity. It would also be interesting to see how BFGS with analytic derivatives performs against the standard gradient approach.
a) Parallel Newton-Raphson optimization
Instead of iteratively optimizing and applying the branch lengths, optimize each of them and store the optimized value (a proposal) in a buffer. Once all branches have been optimized, apply the proposals and perform a full traversal.
The advantage of this approach is that the branches can be optimized in parallel and in arbitrary order, always leading to the same branch lengths (the result is reproducible regardless of how they are computed). There is no need to update CLVs/scalers between branches, so optimizing the whole set of branches on a single core is faster per iteration. However, convergence of the branch-length optimization will probably take longer, and the process may be more prone to getting stuck in local optima or ending with a lower overall likelihood.
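The proposal-buffer scheme could be sketched as below. This is a toy illustration, not pll code: `newton_step` uses finite differences where the library would use analytic derivatives, the clamp to a small positive value stands in for proper branch-length bounds, and the separable surrogate likelihood is hypothetical (in a real tree the branches are coupled through the CLVs, which is exactly why buffering the proposals matters for reproducibility).

```python
def newton_step(f, x, h=1e-5):
    """One Newton-Raphson step towards a stationary point of f,
    using central finite differences for f' and f''."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)
    if d2 == 0:
        return x
    return x - d1 / d2

def optimize_all_branches(loglik, branches, iters=20):
    """Proposal-buffer scheme: every branch is optimized against the
    *current* values of all the others, so the inner loop could run in
    parallel and in any order. Proposals are buffered in `proposals`
    and applied together at the end of each sweep."""
    for _ in range(iters):
        proposals = {}
        for b in branches:  # independent per-branch work: parallelizable
            f = lambda x, b=b: loglik({**branches, b: x})
            # clamp to keep branch lengths positive (stand-in for bounds)
            proposals[b] = max(1e-8, newton_step(f, branches[b]))
        branches.update(proposals)  # apply all proposals at once
    return branches

# Hypothetical separable surrogate for the log-likelihood:
targets = {"b1": 0.1, "b2": 0.05, "b3": 0.3}
toy_loglik = lambda br: -sum((br[k] - targets[k]) ** 2 for k in br)
result = optimize_all_branches(toy_loglik, {"b1": 1.0, "b2": 1.0, "b3": 1.0})
```

Because the buffer decouples the per-branch subproblems within a sweep, the result cannot depend on thread scheduling, which is the reproducibility property described above.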
b) Multiple-branch BFGS
Optimize small subsets of branches (e.g., 5 to 10) simultaneously using BFGS. Instead of approximating the gradients numerically, they can be computed from the analytic first derivative.
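A minimal sketch of the idea using SciPy, with a hypothetical stand-in for the subset's negative log-likelihood (the real objective and its analytic gradient would come from the library's likelihood derivatives). L-BFGS-B is used here rather than plain BFGS only so that bounds can keep the branch lengths positive; the `jac` argument is what supplies the exact gradient instead of a finite-difference approximation.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 5-branch subset; in practice the objective would be the
# negative log-likelihood of the tree restricted to these branches.
targets = np.array([0.12, 0.05, 0.30, 0.08, 0.01])

def neg_loglik(x):
    """Toy objective with its minimum at x == targets."""
    return np.sum((np.log(x) - np.log(targets)) ** 2)

def grad(x):
    """Analytic gradient of the toy objective (chain rule on log(x))."""
    return 2.0 * (np.log(x) - np.log(targets)) / x

x0 = np.full(5, 0.1)  # initial branch lengths
res = minimize(neg_loglik, x0, jac=grad, method="L-BFGS-B",
               bounds=[(1e-8, 10.0)] * 5)
```

Passing the exact gradient both saves the n extra function evaluations per iteration that a finite-difference gradient would cost and gives BFGS a cleaner curvature estimate, which is the comparison proposed above.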