harrispopgen / mushi

[mu]tation [s]pectrum [h]istory [i]nference
https://harrispopgen.github.io/mushi/
MIT License

three operator splitting allows negativity #37

Closed · wsdewitt closed this 4 years ago

wsdewitt commented 5 years ago

The branch `three-op-splitting` introduces a function `three_op_prox_grad_descent()` intended to combine an arbitrary non-smooth term (and its associated proximal operator) with a non-negativity constraint.

  1. I get iterates that leave the nonnegative orthant. Looking at the code, I see only one proximal operator, but I expected two (i.e., a second one performing the clip to the nonnegative orthant; a sketch of the two-prox update follows this list). I'm guessing something is missing around here: https://github.com/harrispopgen/mushi/blob/012370a7bc848b72a4e805449ba0acedd6de5227/utils.py#L229-L231

  2. I think the reference for the method should be Davis & Yin (2017). It currently reads: https://github.com/harrispopgen/mushi/blob/012370a7bc848b72a4e805449ba0acedd6de5227/utils.py#L182
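For reference, here's a minimal sketch of the Davis & Yin update with the two proximal steps I expected to see. This is not mushi's implementation; `grad_f`, `prox_h`, and `gamma` are hypothetical names:

```python
import numpy as np

def davis_yin_sketch(grad_f, prox_h, y0, gamma, max_iter=1000):
    """Minimal sketch (not mushi's code) of Davis-Yin three-operator
    splitting for min f(x) + g(x) + h(x), with f smooth (the Poisson
    loss here), g the indicator of the nonnegative orthant, and h the
    other non-smooth term with proximal operator prox_h."""
    y = np.asarray(y0, dtype=float).copy()
    for _ in range(max_iter):
        # prox of the orthant indicator: projection by clipping
        x = np.clip(y, 0, None)
        # prox of the other non-smooth term, at a gradient step from x
        z = prox_h(2 * x - y - gamma * grad_f(x), gamma)
        # z and y may leave the orthant; only x is feasible by construction
        y = y + z - x
    return x
```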

wsdewitt commented 5 years ago

Now I see where the clip happens (via the max function): https://github.com/harrispopgen/mushi/blob/012370a7bc848b72a4e805449ba0acedd6de5227/utils.py#L218

But yeah, I still get negatives. I'll try to come up with a minimal reproducible example.

wsdewitt commented 4 years ago

I think this is behaving as expected; there's simply no guarantee that the nonnegativity constraint is met at every iterate. This is a problem for us because our Poisson loss can be undefined at negative iterates. In fact, we don't even have a guarantee that this algorithm converges:

[screenshot: quoted excerpt from Davis & Yin on line search and convergence]

We can still iterate the algorithm when the cost is undefined and hope to eventually reach a region where the iterates stay in the nonnegative orthant. This seems to work sometimes; it's just a matter of getting to the happy place before hitting `max_iter`.
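For concreteness, a generic Poisson negative log-likelihood (a stand-in, not necessarily mushi's exact cost) shows why a negative iterate breaks the cost evaluation:

```python
import numpy as np

def poisson_nll(counts, rates):
    """Generic Poisson negative log-likelihood, up to a constant:
    sum(rates - counts * log(rates)). The log is undefined for
    rates <= 0, so any iterate outside the orthant makes it NaN."""
    return np.sum(rates - counts * np.log(rates))

poisson_nll(np.array([3.0, 1.0]), np.array([2.5, -0.1]))  # nan (with a warning)
```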

kamdh commented 4 years ago

There are two references for this kind of algorithm. Davis & Yin (2015, arXiv) is the one that originally introduced the method, and it contains more detail than the 2017 journal article. The other is Pedregosa & Gidel (2018, ICML), which guarantees that a formulation with line search converges for convex costs. It is not the same as the line-search method proposed by Davis & Yin, which you quoted above.

kamdh commented 4 years ago

Do you have a minimal example that produces negatives? I never observed any. If you do, there's probably a bug, and I'd like to track it down.

These papers aren't actually super clear about which variable to output as the result (x versus z), which could be the issue.

wsdewitt commented 4 years ago

The code in `utils.three_op_prox_grad_descent()` looked a lot like the Davis & Yin version.

Yes, I'll drop an MWE here in a bit; I'm setting up config-file parsing to handle the parameters.

kamdh commented 4 years ago

Yeah, it's Davis & Yin (the first author's given name is "Damek"). I had the other one implemented as well, and both were behaving similarly.

wsdewitt commented 4 years ago

So does the guarantee provided in Pedregosa & Gidel apply to the Davis & Yin algo we have?

kamdh commented 4 years ago

No, but I can re-implement their version easily enough.

kamdh commented 4 years ago

Doing that now, stand by

wsdewitt commented 4 years ago

Word. Note I committed to this branch recently, so you may need to pull.

kamdh commented 4 years ago

```
(mushi) [kameron@fiddlehead mushi]$ git pull
Already up-to-date.
```

kamdh commented 4 years ago

I am looking at the Pedregosa & Gidel paper. I believe there may be a typo in the algorithm: it should be returning z. Note that z in their algorithm is always nonnegative in our formulation. Theorem 1 shows that the output of the algorithm, (z, u), is an optimum for the primal-dual problem. What's weird is that the rest of the results are stated in terms of the iterates x and averaged versions of them. It's kind of fishy.
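Whatever naming convention each paper uses, the point is that the variable produced directly by the nonnegativity prox is feasible by construction, so it's the safe one to return. A trivial illustration, with hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=5)   # an auxiliary iterate; may have negative entries
z = np.clip(y, 0, None)  # the output of the orthant projection
assert (z >= 0).all()    # holds at every iterate, no matter where y wanders
```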

kamdh commented 4 years ago

Interesting: if you look at the poster Gidel made for this method, he makes a point that the iterates may not be feasible: http://fa.bianp.net/uploads/2018/AdaTOS_poster.pdf

I think you just carry on in that case.

kamdh commented 4 years ago

Running with z as the output variable, the routine was exiting after only 3 or 4 iterates and the cost was not decreasing.

EDIT: This was a bug, now fixed.

wsdewitt commented 4 years ago

We're encountering some pretty sharp edges 🔪 :hurtrealbad:

wsdewitt commented 4 years ago

**Minimal working example**

With 5133f1d4, you can run `mushi.py` with a config file containing the parameters. Here's a zip with two files: a 5-SFS and a config file that reproduces the issue of iterates leaving the nonnegative orthant. Run with this command:

```
python mushi.py 5-SFS.tsv 3mer.cfg output_basename
```

kamdh commented 4 years ago

Ok, I didn't test this, but I can tomorrow. You could try my new version and see if it works, using z as the output... z will never be negative, so we can't run into that problem.

wsdewitt commented 4 years ago

Working pretty well in `test.ipynb` 😛

[screenshot: results from test.ipynb]

wsdewitt commented 4 years ago

It seems to be doing reasonable things on real data too, although convergence is rather slow. Do you think it's worthwhile to experiment with acceleration? Pedregosa & Gidel note:

[screenshot: excerpt from Pedregosa & Gidel on acceleration]

So perhaps thar be dragons.

kamdh commented 4 years ago

One thing we haven't tried is averaging the iterates, which is a basic acceleration scheme. The convergence-rate proof in P&G is for the average of the iterates, suitably weighted by step sizes.

wsdewitt commented 4 years ago

Cool, I could try copying in the acceleration code from the older Davis & Yin 3-op code, unless you're already playing with it.

kamdh commented 4 years ago

Does it return good enough results for your slides? If so, let me play with it myself. If not, try implementing the running average from the P&G paper. You'll have to keep track of the numerator and denominator separately, I think.
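A minimal sketch of that running average, with the numerator and denominator tracked separately so it can be updated inside the solver loop (hypothetical names; the weights are the step sizes, per the P&G convergence-rate statement):

```python
import numpy as np

class RunningAverage:
    """Step-size-weighted running average of iterates:
    xbar_t = sum_s(gamma_s * x_s) / sum_s(gamma_s)."""

    def __init__(self, shape):
        self.num = np.zeros(shape)  # running numerator
        self.den = 0.0              # running denominator

    def update(self, x, gamma):
        self.num += gamma * x
        self.den += gamma
        return self.num / self.den  # current weighted average
```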

wsdewitt commented 4 years ago

I think I have what I need to make decent slides. Have at it if you like!