Apparently something in today's release of the JAX package is broken, which is why the integration tests above failed on `import jax.numpy as np` (and why you might be getting the same error if doing a fresh install): https://github.com/google/jax/issues/5374

Workaround: `pip install jaxlib==0.1.57`
I'm not certain whether `prox_tv` is a needed import any more. Now that we have our own solver for trend filtering, I figure we can use it to solve the total variation case as well, right? That would make the code more self-contained, and maybe allow for `jit`.
This may be more of a feature request, but I think it makes sense to refactor the optimization code into its own module. It could end up being very useful for other people. We could also consider trying to get it pulled into a larger Python optimization library (not sure which at the moment).
@kamdh
> I'm not certain whether `prox_tv` is a needed import any more. Now that we have our own solver for trend filtering, I figure we can use it to solve the total variation case as well, right? That would make the code more self-contained, and maybe allow for `jit`.
The magic in the special ADMM (from the paper cited in my docstring) is that it recurses to the 0th-order case, and thus leverages those super fast solvers (like the TV prox). Boyd et al. have a trend filtering ADMM that does not do this, and it is apparently slower, but I have not tried to implement it. It's possible that the non-recursive ADMM would be faster with JIT (and it would be nice to lose the dependency so we can upgrade Python, harrispopgen/mushi#59), but unfortunately banded Cholesky solves aren't yet supported in JAX, and that's the most expensive part of the ADMM update.
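For concreteness, here is a minimal sketch of that structure, assuming the `prox_tv` package and SciPy's banded Cholesky routines. The function name, arguments, and defaults are hypothetical; this is illustrative, not the mushi implementation:

```python
# Illustrative only: a special-ADMM-style trend filter in which the z-update
# recurses to the 0th-order (total variation) prox via prox_tv, and the
# beta-update is a banded linear solve whose Cholesky factor is computed once
# and reused across iterations (the expensive step that JAX cannot yet do banded).
import numpy as np
import prox_tv as ptv
from scipy.linalg import cholesky_banded, cho_solve_banded
from scipy.sparse import diags, identity


def trend_filter_admm(y, k=2, lam=1.0, rho=1.0, n_iter=20):
    """Sketch of minimizing 0.5*||y - beta||^2 + lam*||D^(k+1) beta||_1 via ADMM."""
    n = len(y)
    # k-th order difference operator D, built by composing first differences
    D = identity(n, format="csc")
    for _ in range(k):
        m = D.shape[0]
        D = diags([-1.0, 1.0], [0, 1], shape=(m - 1, m), format="csc") @ D
    A = (identity(n) + rho * (D.T @ D)).toarray()

    # store A in lower banded form (bandwidth k) and factor it once (cached)
    ab = np.zeros((k + 1, n))
    for i in range(k + 1):
        ab[i, : n - i] = np.diagonal(A, -i)
    chol = cholesky_banded(ab, lower=True)

    z = np.zeros(n - k)
    u = np.zeros(n - k)
    for _ in range(n_iter):
        # beta-update: banded Cholesky solve against the cached factor
        beta = cho_solve_banded((chol, True), y + rho * (D.T @ (z - u)))
        # z-update: 1D total variation prox (the 0th-order recursion)
        z = ptv.tv1_1d(D @ beta + u, lam / rho)
        # dual update
        u = u + D @ beta - z
    return beta
```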
@kamdh
> This may be more of a feature request, but I think it makes sense to refactor the optimization code into its own module. It could end up being very useful for other people. We could also consider trying to get it pulled into a larger Python optimization library (not sure which at the moment).
I totally agree: the optimization module is pretty impressive in its own right, and shouldn't be chained to this use case. We can spin it out with `git filter-branch` in a way that preserves commit history/contributions. Maybe we don't need to do that for this PR merge, though; do you want to open an issue for this so we can discuss it separately (perhaps sometime after Feb 1)?
You might also want to play with the new (object-oriented) optimization module.
> I totally agree: the optimization module is pretty impressive in its own right, and shouldn't be chained to this use case. We can spin it out with `git filter-branch` in a way that preserves commit history/contributions. Maybe we don't need to do that for this PR merge, though; do you want to open an issue for this so we can discuss it separately (perhaps sometime after Feb 1)?
Sounds like a plan to me.
**Summary**

Note: ignore changes to files in the `docs/` directory; this is the documentation build.

**What to look at first**

- `kSFS.infer_eta` and `kSFS.infer_mush`. The API docs on the new interface can be viewed by opening `docs/index.html` in your clone.
- `docsrc/notebooks/simulation.ipynb`, but be careful about committing your changes, because this notebook drives docs content.

**Under the hood**
- `mushi.optimization` module with abstraction and inheritance to avoid a lot of duplicated code. Added a trend filtering optimizer class based on the recursive ADMM of Ramdas and Tibshirani (this serves as the prox operator in the outer optimization routine when fitting demography or mush). I was able to get this running quite fast by caching Cholesky decompositions and using the fast prox-tv module for dual variable updates. I find that about 20 iterations of ADMM are plenty (although the default is 100).
- `utils` model `r` that is a learned parameter.
- `pts` and `ta` when running `kSFS.infer_eta()`.
- `mushi.loss_functions`. Inference can be done using any loss function from this module (see the illustrative sketch after this list).
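For illustration, a loss in the style this module supports might look like the sketch below. The function name and signature are hypothetical (not necessarily what `mushi.loss_functions` exposes); it just shows a Poisson-type loss written with `jax.numpy` so the outer optimizer can differentiate and jit it:

```python
# Hypothetical example of a loss function (not necessarily mushi's API):
# the negative Poisson log-likelihood of observed counts given expected values,
# dropping the log(observed!) term, which is constant in the parameters.
import jax.numpy as jnp


def poisson_loss(expected, observed):
    """Negative Poisson log-likelihood, up to an additive constant."""
    return jnp.sum(expected - observed * jnp.log(expected))
```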