Closed wsdewitt closed 4 years ago
Now I see where the clip happens (with max function) https://github.com/harrispopgen/mushi/blob/012370a7bc848b72a4e805449ba0acedd6de5227/utils.py#L218 But yeah, still got negatives. I'll try to come up with a minimal reproducible example.
I think this is behaving as expected, it's just that there's no reason to expect that the nonnegativity constraint will be met at every iterate. This is a problem for us because our Poisson loss can be undefined. In fact, we don't even have a guarantee that this algorithm converges:
We can still iterate this algorithm when the cost is undefined, and hope to eventually get to a region where the iterates stay in the nonnegative orthant. This seems to work sometimes; it's just a matter of getting to the happy place before hitting max_iter.
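To make the "no reason to expect nonnegativity at every iterate" point concrete, here's a minimal sketch of plain Davis–Yin three-operator splitting on a toy quadratic (not the mushi implementation; all names here are hypothetical). One of the two prox sequences is the clip to the orthant and is feasible by construction; the other sequence carries no such guarantee.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def davis_yin(grad_f, prox_g, prox_h, z0, step, n_iter):
    """Plain Davis-Yin splitting for min f + g + h (toy sketch, not mushi's code)."""
    z = z0.copy()
    for _ in range(n_iter):
        x_g = prox_g(z, step)                                   # feasible if g is the orthant indicator
        x_h = prox_h(2.0 * x_g - z - step * grad_f(x_g), step)  # no feasibility guarantee
        z = z + x_h - x_g
    return x_g, x_h

# Toy problem: f(x) = 0.5 * ||x - c||^2, g = indicator of the nonnegative
# orthant, h = 0.1 * ||x||_1; coordinate-wise solution is max(c_i - 0.1, 0).
c = np.array([1.0, -2.0, 3.0])
x_g, x_h = davis_yin(
    grad_f=lambda x: x - c,
    prox_g=lambda v, t: np.maximum(v, 0.0),         # clip to the orthant
    prox_h=lambda v, t: soft_threshold(v, 0.1 * t),
    z0=np.zeros(3), step=0.5, n_iter=500)

print(x_g)  # in the nonnegative orthant by construction
print(x_h)  # this sequence can (transiently) leave the orthant
```

The takeaway: whichever sequence the cost is evaluated on had better be the clipped one, or the Poisson term can hit a negative argument mid-run.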
There are two references for this kind of algorithm. Davis & Yin (2015, arXiv) is the one that originally introduced the method and contains more details than the 2017 journal article. The other is Pedregosa & Gidel (2018, ICML), which guarantees that a formulation with line search converges for convex costs. It is not the same as the line search method proposed by Davis & Yin, which you've quoted above.
Do you have a minimal example that throws negatives? Because I never observed it. There's probably a bug, then, and I'd like to track it down.
These papers aren't actually super clear about which variables to output as the result (x versus z) which could be the issue.
The code in utils.three_op_prox_grad_descent() looked a lot like the Davis and Yin one.
Yes, I'll drop a MWE here in a bit, I'm setting up config file parsing for handling the parameters.
Yeah, it's Davis & Yin (the first name is "Damek"). I had the other one implemented as well, and both were behaving similarly.
So does the guarantee provided in Pedregosa & Gidel apply to the Davis & Yin algo we have?
No but I can re-implement their version easily enough.
Doing that now, stand by
Word. Note I committed to this branch recently, so you may need to pull.
(mushi) [kameron@fiddlehead mushi]$ git pull
Already up-to-date.
I am looking at the Pedregosa & Gidel paper. I believe there may be a typo in the algorithm and it should be returning z. Note that z in their algorithm is always nonnegative in our formulation. Theorem 1 shows that the output of the algorithm (z, u) is an optimum for the primal-dual problem. What's weird is that the rest of the results are stated in terms of the iterates x and averaged versions of them. It's kind of fishy.
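For concreteness, a toy illustration of why outputting the clipped sequence matters for us (hypothetical names, not mushi's variables): the output of the nonnegativity prox is feasible by construction, so the log terms in a Poisson loss stay defined there even when the raw iterate has left the orthant.

```python
import numpy as np

x = np.array([0.5, -0.3, 1.2])  # raw iterate; has left the orthant
z = np.maximum(x, 0.0)          # nonnegativity prox output: always feasible
k = np.array([2.0, 0.0, 1.0])   # toy Poisson counts
eps = 1e-12                     # small offset to avoid log(0)

# Poisson negative log-likelihood (up to constants); finite at z,
# whereas np.log(x) would produce nan for the negative entry.
loss_at_z = -np.sum(k * np.log(z + eps) - z)
print(loss_at_z)
```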
Interesting, if you look at the poster that author Gidel made for this method, he makes a point that the iterates may not be feasible: http://fa.bianp.net/uploads/2018/AdaTOS_poster.pdf
I think you just carry on in that case
Running with z as the output variable, the routine was exiting after only 3 or 4 iterates; the cost was not decreasing.
EDIT: This was a bug, now fixed.
we're encountering some pretty sharp edges 🔪 :hurtrealbad:
With 5133f1d4 you run mushi.py with a config file containing the parameters. Here's a zip with two files: a 5-SFS and a config file that reproduces the issue of iterates leaving the nonnegative orthant. Run with this command:
python mushi.py 5-SFS.tsv 3mer.cfg output_basename
Ok, I didn't test this, but I can tomorrow. You could try my new version and see if it works, using z as the output... z will never be negative, so we shouldn't get any problems.
Working pretty well in test.ipynb
😛
It seems to be doing reasonable things on real data too, although convergence is rather slow. Do you think it's worthwhile to experiment with acceleration? Pedregosa & Gidel note:
So perhaps thar be dragons.
One thing we haven't tried is averaging the iterates, which is a basic acceleration scheme. The proof of convergence rates in P&G is for the average of the iterates, suitably weighted by step sizes.
Cool, I could try copying the acceleration code in from the older Davis–Yin 3-op code, unless you're already playing with it.
Does it return good enough results for your slides? If so, let me play on it myself. If not, try implementing the running averaging in the P&G paper. You'll have to keep track of the numerator and denominator separately, I think.
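In case it helps, here's a minimal sketch of step-size-weighted running averaging, keeping the numerator and denominator as separate running sums as suggested above (my reading of the scheme; the class name is hypothetical):

```python
import numpy as np

class WeightedIterateAverage:
    """Running average of iterates weighted by step sizes:
    xbar_k = sum_i(step_i * x_i) / sum_i(step_i), tracked incrementally."""

    def __init__(self, dim):
        self.numerator = np.zeros(dim)  # running sum of step_i * x_i
        self.denominator = 0.0          # running sum of step_i

    def update(self, x, step):
        self.numerator += step * x
        self.denominator += step

    @property
    def value(self):
        return self.numerator / self.denominator

avg = WeightedIterateAverage(2)
avg.update(np.array([1.0, 2.0]), step=0.5)
avg.update(np.array([3.0, 4.0]), step=1.5)
print(avg.value)  # (0.5*[1,2] + 1.5*[3,4]) / 2.0 = [2.5, 3.5]
```

This drops into the main loop with one `update` call per iterate, so it costs O(dim) extra memory regardless of iteration count.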
I think I have what I need to make decent slides. Have at it if you like!
The branch three-op-splitting introduces a function three_op_prox_grad_descent() intended to combine an arbitrary non-smooth term (and its associated proximal operator) with a non-negativity constraint. I get iterates that leave the nonnegative orthant. Looking at the code, I only see one proximal operator, but I expected there to be two (i.e. a second one performing the clip to the nonnegative orthant). I'm guessing there's something missing somewhere about here: https://github.com/harrispopgen/mushi/blob/012370a7bc848b72a4e805449ba0acedd6de5227/utils.py#L229-L231
I think the reference for the method should be Davis and Yin (2017). It currently reads https://github.com/harrispopgen/mushi/blob/012370a7bc848b72a4e805449ba0acedd6de5227/utils.py#L182