Aitchison geometry - Githubissues

wsdewitt commented 4 years ago

This branch formulates the MUSH optimization problem using compositional data analysis for hard constraint to the simplex (constant total mutation rate).

The magic

We avoid identifiability issues where total mutation rate is modified to fit the total SFS better. Instead all the optimization goes into fitting the relative rates.
I use the isometric log-ratio transform (with Gram-Schmidt orthogonalization of the simplex) to transform compositions to a real-valued vector space (losing one degree of freedom). Optimization is done in this space. The answer here has a nice walkthrough of how this works and why it's needed.
In the transformed space we no longer need to constrain optimization to the non-negative orthant (we use the inverse transform to evaluate the Poisson loss). This enables us to use the accelerated proximal gradient code for all the previous regularization functionality (either tv or rank regularization), which is blazing fast!
Without the non-negativity constraint, we can consider using 3-operator splitting to combine rank and tv proximal operators.

@kharris, I've attempted to generalize the 3-operator splitting routine to an arbitrary pair of non-smooth terms and their prox operators (previously it assumed one of them was non-negativity). Please take a look at the new function if you can. The line where the second prox gets used replaces the previous non-negativity clip: https://github.com/harrispopgen/mushi/blob/4d579f3e78cb89ac22e168c413f0b6a8f6ffde1a/utils.py#L243-L244

wsdewitt commented 4 years ago

Three operator splitting (combining rank and total variation penalties) is converging very fast. A complete optimization of both demography and mutation spectrum history on a grid of 200 epochs takes about 25 seconds for CEU (n=198).

(mushi) ➜  1KG git:(aitchison-geometry) ✗ time python mushi.py CEU.3-SFS.tsv masked_size.tsv 3mer.cfg mushi

sequential inference of η(t) and μ(t)

inferring η(t)
initial cost -1.033239e+08
iteration 272, cost -1.036994e+08        
relative change in objective function 4.8e-11 is within tolerance 1e-10 after 272 iterations
inferring μ(t) conditioned on η(t)
initial cost -6.428418e+07
iteration 152, cost -6.429320e+07        
relative change in objective function 4.4e-11 is within tolerance 1e-10 after 152 iterations
python ../mushi.py scons_output/CEU/3-SFS.tsv scons_output/masked_size.tsv    50.85s user 3.75s system 206% cpu 26.472 total

kamdh commented 4 years ago

Was the only change you made to replace the clip with a prox? If so, that code should be ok.

kamdh commented 4 years ago

Question: When you do this log-ratio transform thing, are you doing it for the Z variables, i.e. mutation rates? Then there's an overall mutation rate parameter that multiplies them? Or is the "scale" set in just the pop size?

wsdewitt commented 4 years ago

Was the only change you made to replace the clip with a prox? If so, that code should be ok.

Yeah just replaced the clip with an arbitrary prox. Should we reconsider the difference from the 3-operator paper noted here? Was this to ensure the iterates were always non-negative, or is it correcting a typo in their algo?

Question: When you do this log-ratio transform thing, are you doing it for the Z variables, i.e. mutation rates? Then there's an overall mutation rate parameter that multiplies them? Or is the "scale" set in just the pop size?

Yes. There is an overall rate which we input as a free parameter. We use a recent estimate of de novo mutation rate from trio sequencing, where you compare mom, dad, and kid genomes (Turner et al.). The optimization can only modify compositions such that the total mutation rate is equal to this parameter. You can think of it as part of the "scale" built into the population size, but if we want to get time measured in generation (and then years), we need an independent estimate of total mutation rate.

harrispopgen / mushi

Aitchison geometry #46

The magic