Closed wsdewitt closed 4 years ago
Three operator splitting (combining rank and total variation penalties) is converging very fast. A complete optimization of both demography and mutation spectrum history on a grid of 200 epochs takes about 25 seconds for CEU (n=198).
(mushi) ➜ 1KG git:(aitchison-geometry) ✗ time python mushi.py CEU.3-SFS.tsv masked_size.tsv 3mer.cfg mushi
sequential inference of η(t) and μ(t)
inferring η(t)
initial cost -1.033239e+08
iteration 272, cost -1.036994e+08
relative change in objective function 4.8e-11 is within tolerance 1e-10 after 272 iterations
inferring μ(t) conditioned on η(t)
initial cost -6.428418e+07
iteration 152, cost -6.429320e+07
relative change in objective function 4.4e-11 is within tolerance 1e-10 after 152 iterations
python ../mushi.py scons_output/CEU/3-SFS.tsv scons_output/masked_size.tsv 50.85s user 3.75s system 206% cpu 26.472 total
Was the only change you made to replace the clip with a prox? If so, that code should be ok.
Question: When you do this log-ratio transform thing, are you doing it for the Z variables, i.e. mutation rates? Then there's an overall mutation rate parameter that multiplies them? Or is the "scale" set in just the pop size?
Was the only change you made to replace the clip with a prox? If so, that code should be ok.
Yeah just replaced the clip with an arbitrary prox. Should we reconsider the difference from the 3-operator paper noted here? Was this to ensure the iterates were always non-negative, or is it correcting a typo in their algo?
Question: When you do this log-ratio transform thing, are you doing it for the Z variables, i.e. mutation rates? Then there's an overall mutation rate parameter that multiplies them? Or is the "scale" set in just the pop size?
Yes. There is an overall rate which we input as a free parameter. We use a recent estimate of de novo mutation rate from trio sequencing, where you compare mom, dad, and kid genomes (Turner et al.). The optimization can only modify compositions such that the total mutation rate is equal to this parameter. You can think of it as part of the "scale" built into the population size, but if we want to get time measured in generation (and then years), we need an independent estimate of total mutation rate.
This branch formulates the MUSH optimization problem using compositional data analysis for hard constraint to the simplex (constant total mutation rate).
The magic
@kharris, I've attempted to generalize the 3-operator splitting routine to an arbitrary pair of non-smooth terms and their prox operators (previously it assumed one of them was non-negativity). Please take a look at the new function if you can. The line where the second prox gets used replaces the previous non-negativity clip: https://github.com/harrispopgen/mushi/blob/4d579f3e78cb89ac22e168c413f0b6a8f6ffde1a/utils.py#L243-L244