mattapow / dodonaphy

hyperbolic embeddings for variational approximation of phylogenetic posteriors

Max NJ Likelihood of Similarity Matrix #2

Closed mattapow closed 2 years ago

mattapow commented 2 years ago

As a subproblem of Dodonaphy:

  1. initialise a similarity matrix with noise,
  2. find its ML (or MAP) estimate using BFGS (sketched below)
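
A minimal sketch of that loop in plain PyTorch, assuming a hypothetical `loglik(dists)` callable standing in for the tree likelihood computed in ml.py:

```python
import torch

def optimise_dists(dists_true, loglik, noise=0.1, steps=100):
    """Perturb a distance matrix with noise, then maximise loglik via L-BFGS.

    `loglik` is a hypothetical callable mapping a distance matrix to a
    scalar log-likelihood (a stand-in for the tree likelihood in ml.py).
    """
    # symmetric noise with zero diagonal, so the input stays a valid matrix
    n = dists_true.shape[0]
    eps = noise * torch.rand(n, n)
    eps = (eps + eps.T) / 2
    eps.fill_diagonal_(0.0)

    # a fresh leaf tensor the optimiser can own
    dists = (dists_true + eps).clone().detach().requires_grad_(True)
    opt = torch.optim.LBFGS([dists], lr=0.01, max_iter=steps)

    def closure():
        opt.zero_grad()
        loss = -loglik(dists)  # minimise the negative log-likelihood
        loss.backward()
        return loss

    opt.step(closure)
    return dists.detach()
```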
mattapow commented 2 years ago

When initialising the optimiser at ml.py line 13, self.dists_data is reported as a non-leaf tensor with grad_fn = UnbindBackward0.
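
For reference, the usual cause: a tensor produced by an op like unbind is a view carrying a grad_fn, not a leaf, so torch.optim can't treat it as a parameter. A minimal reproduction and the standard fix, assuming that is what happens to self.dists_data:

```python
import torch

base = torch.rand(3, 3, requires_grad=True)
dists = base.unbind(0)[0]            # a view of base
print(dists.is_leaf, dists.grad_fn)  # False <UnbindBackward0 ...>

# standard fix: detach from the graph and re-mark as a leaf parameter,
# so the optimiser can update it directly
dists = dists.detach().clone().requires_grad_(True)
print(dists.is_leaf)                 # True
```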

mattapow commented 2 years ago

Implemented in ml.py. Invoke it as, e.g., dodo --infer ml --taxa 17 --epochs 1000 --learn 0.01

The likelihood is jumpy. Check the autodiff graph with torchviz.
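
A sketch of that check with torchviz's make_dot; the `closure` and `dists` names are assumed from ml.py:

```python
from torchviz import make_dot

loss = closure()                         # the optimiser closure from ml.py
# graph every op from the distance matrix through to the likelihood
dot = make_dot(loss, params={"dists": dists})
dot.render("torchviz.gv", format="pdf")  # writes torchviz.gv.pdf
```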

mattapow commented 2 years ago

Here's the resulting gradient graph of ml.closure(): torchviz.gv.pdf. Most of it is the peeler.nj() algorithm, with the likelihood calculation at the end.

mattapow commented 2 years ago

The likelihood is jumpy because the soft neighbour joining peeler.nj(pdm, 0.0001) sometimes outputs an incomplete tree (as seen in the saved .tree file). Further testing via test_soft_nj() is required.
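
A sketch of such a completeness check; the (peel, blens) return layout assumed below, one (child, child, parent) row per internal node, is a guess and should be matched to peeler.nj()'s actual interface:

```python
import torch
from dodonaphy import peeler  # assumed import path

def test_soft_nj_complete(n_taxa=17, temp=1e-4):
    # random symmetric dissimilarity matrix with zero diagonal
    d = torch.rand(n_taxa, n_taxa)
    pdm = (d + d.T) / 2
    pdm.fill_diagonal_(0.0)

    peel, blens = peeler.nj(pdm, temp)

    # a complete tree joins every taxon exactly once: each leaf index
    # 0..n-1 must appear exactly once as a child in the peel
    children = torch.as_tensor(peel)[:, :2].flatten()
    leaves = sorted(children[children < n_taxa].tolist())
    assert leaves == list(range(n_taxa)), "soft NJ returned an incomplete tree"
```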

mattapow commented 2 years ago

Soft neighbour joining now outputs a complete tree and the likelihood is less jumpy. Questions:

  1. How much noise can we typically add to the dissimilarity matrix before we jump basins? Plot noise vs. maximum likelihood.

  2. Does initialising with hydra on the true distances land in the basin of the global optimum?

  3. Should we introduce basin hopping (a global algorithm with local maximisation)? See the sketch after this list.

  4. For benchmarks, how does it compare to IQ-TREE or RAxML (as well as MrBayes)?

  5. How does it compare to using pairwise similarities or Dasgupta's cost with Chami's geodesic connection?

  6. Alternatives to NJ: family joining, ...
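
On question 3, scipy already ships basin hopping: random perturbations of the starting point with a local minimisation at each hop. A sketch over a flattened distance matrix, where neg_loglik is a dummy stand-in for soft NJ plus the negative tree log-likelihood:

```python
import numpy as np
from scipy.optimize import basinhopping

def neg_loglik(x):
    # dummy stand-in: reshape to a matrix and penalise asymmetry;
    # replace with soft NJ + negative tree log-likelihood
    n = int(np.sqrt(x.size))
    dists = x.reshape(n, n)
    return float(((dists - dists.T) ** 2).sum())

x0 = np.random.rand(17 * 17)
result = basinhopping(
    neg_loglik,
    x0,
    niter=50,                                 # number of hops
    stepsize=0.1,                             # perturbation scale
    minimizer_kwargs={"method": "L-BFGS-B"},  # local optimisation
)
print(result.fun)
```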

mattapow commented 2 years ago

Try annealing the likelihood.
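
One reading of this, given the soft NJ temperature above: cool the temperature over training so early epochs see a smoother likelihood surface. A sketch, where `tree_loglik(dists, temp)` is a stand-in for peeler.nj plus the tree likelihood:

```python
import torch

def annealed_fit(dists, tree_loglik, epochs=1000, lr=0.01,
                 temp_start=0.1, temp_end=1e-4):
    # geometric cooling schedule from temp_start down to temp_end
    opt = torch.optim.Adam([dists], lr=lr)
    decay = (temp_end / temp_start) ** (1.0 / max(epochs - 1, 1))
    temp = temp_start
    for _ in range(epochs):
        opt.zero_grad()
        loss = -tree_loglik(dists, temp)  # maximise the tempered likelihood
        loss.backward()
        opt.step()
        temp *= decay  # anneal soft NJ towards hard NJ
    return dists
```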

mattapow commented 2 years ago

Use soft NJ to optimise the distance matrix, with minimum evolution as the loss. Then sample nearby trees: use a Laplace approximation, with the Hessian from torch.autograd.functional.hessian, to get a covariance for sampling.
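
A sketch of that Laplace step; neg_loss is a stand-in for the minimum evolution objective over a flattened distance vector, and the jitter term is an added assumption to keep the covariance positive definite:

```python
import torch
from torch.autograd.functional import hessian

def laplace_samples(neg_loss, x_map, n_samples=100, jitter=1e-6):
    # covariance of the Laplace approximation = inverse Hessian at the optimum
    h = hessian(neg_loss, x_map)             # (d, d) Hessian of the loss
    h = h + jitter * torch.eye(h.shape[0])   # regularise for invertibility
    cov = torch.linalg.inv(h)
    mvn = torch.distributions.MultivariateNormal(x_map, covariance_matrix=cov)
    return mvn.sample((n_samples,))          # nearby points -> nearby trees
```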

mattapow commented 2 years ago

Not differentiable. Better to use the embedding instead.