jeff-regier closed this issue 7 years ago.
Actually it isn't so easy to access `last_sf` outside of the scope of `maximize_f`. You might just declare your own variable like `last_sf` in your tests, and then more-or-less duplicate lines 44 and 47 in your test:
https://github.com/jeff-regier/Celeste.jl/blob/master/src/deterministic_vi/maximize_elbo.jl#L44-L47
On line 44, `f` is the function `DeterministicVI.elbo`. Line 47 loads the `last_sf` variable based on the contents of `f_res`, the `SensitiveFloat` returned by `DeterministicVI.elbo`. (The derivatives in `f_res` are for the constrained problem; the derivatives for `last_sf` are for the unconstrained problem.)
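To make the constrained/unconstrained distinction concrete, here's a hedged one-parameter sketch (in Python, with a made-up `exp` transform, not Celeste's actual parameterization): if the constrained parameter is `x = exp(theta)` for unconstrained `theta`, the two gradients are related by the chain rule, `df/dtheta = (df/dx) * exp(theta)`.

```python
import math

def constrained_grad(x):
    # gradient of a toy objective f(x) = x**2 with respect to the constrained x
    return 2.0 * x

def unconstrained_grad(theta):
    # chain rule: x = exp(theta), so df/dtheta = (df/dx) * dx/dtheta
    x = math.exp(theta)
    return constrained_grad(x) * x

theta = 0.5
# finite-difference check of the unconstrained gradient
eps = 1e-6
fd = (math.exp(theta + eps) ** 2 - math.exp(theta - eps) ** 2) / (2 * eps)
print(abs(unconstrained_grad(theta) - fd) < 1e-4)  # True
```

The same relationship explains why `f_res` and `last_sf` can hold different derivative values for the same point.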
Sorry for the late reply, I was a little out of it yesterday. Using the `last_sf` approach from `maximize_f` worked: on the 4-source test case the gradients do get very close to zero.
I've just tried it out on the Stripe 82 test case, and for some sources the gradients are not zero, failing the check. The gradients do get closer to zero with more iterations, though. I'm currently looking into how to improve this.
Maybe the algorithm still needs to be tweaked so that it converges with more iterations?
You might try implementing the serial version of the algorithm that Cyclades parallelizes. The serial version could iterate through the light sources in order, repeatedly, calling `maximize_f` on each with `max_iters=50` (or even `max_iters=100`), terminating after 10 passes or so through all the light sources.
If that doesn't find a stationary point, something is probably off with the serial algorithm. On the other hand, if the serial version does reach a 0 gradient, then I'd want to explore why the Cyclades implementation deviates from the output of the serial algorithm.
But that's just an idea, in case you're stuck on how to debug the non-zero gradient.
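A minimal sketch of that serial baseline (Python pseudocode over a toy coupled quadratic objective; `serial_joint_infer` and `toy_grad` are hypothetical stand-ins, not Celeste's `maximize_f`):

```python
# Hedged sketch of the serial baseline: repeated coordinate passes over the
# light sources, optimizing each source in turn while the others stay fixed.

def serial_joint_infer(sources, grad, step=0.1, max_iters=50, n_passes=10):
    """Cycle through the sources in order for n_passes full sweeps."""
    params = dict(sources)  # source id -> parameter value
    for _ in range(n_passes):
        for sid in params:
            for _ in range(max_iters):  # stands in for maximize_f(...) per source
                params[sid] -= step * grad(sid, params)
    return params

# Toy coupled objective: sum over i of (x_i - mean(x))^2 + x_i^2,
# with the mean held fixed within each coordinate update.
def toy_grad(sid, params):
    mean = sum(params.values()) / len(params)
    return 2 * (params[sid] - mean) + 2 * params[sid]

result = serial_joint_infer({"s1": 1.0, "s2": -2.0, "s3": 3.0}, toy_grad)
print(max(abs(v) for v in result.values()))  # should be near 0
```

With enough passes the per-source gradients shrink toward zero, which is the behavior to compare against the Cyclades run.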
Ah that's a good idea, I'll try that
Sorry to jump in late with an obvious observation, but since I don't see it above I'll say it just in case: Optim uses three different termination conditions: the size of the gradient, the change in the function value, and the change in the parameters themselves. Which condition terminated the optimization is available in the optimization results, which Celeste returns, and you can control the termination conditions with function arguments. The trace (available with verbose on) will let you inspect all these values and make sure they're sensible.
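The three termination tests could be sketched generically like this (a hedged sketch; the tolerance names only mirror Optim.jl's `g_tol`/`f_tol`/`x_tol` options, this is not Optim's actual implementation):

```python
# Generic sketch of the three termination criteria: gradient size,
# relative change in the objective, and change in the parameters.

def check_convergence(g_norm, f_old, f_new, x_old, x_new,
                      g_tol=1e-8, f_tol=1e-8, x_tol=1e-8):
    return {
        "g_converged": g_norm < g_tol,
        "f_converged": abs(f_new - f_old) < f_tol * abs(f_old),
        "x_converged": max(abs(a - b) for a, b in zip(x_old, x_new)) < x_tol,
    }

# An optimizer can stop even though the gradient is far from zero,
# if the objective or the parameters have stopped moving:
r = check_convergence(g_norm=0.5, f_old=100.0, f_new=99.9999999,
                      x_old=[1.0, 2.0], x_new=[1.0, 2.0])
print(r)  # g_converged is False; f_converged and x_converged are True
```

That's exactly why it's worth checking which condition fired: a run can "converge" by the f or x criteria while the gradient is still far from zero.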
Interesting, I didn't know that! I'll try poking around and see if most are terminating due to gradient being close to 0 and whether that matches what I'm seeing with the unconstrained gradient.
That's worth keeping in mind, though I think Max is seeing the gradient go to 0 with single infer but not joint infer. If the issue were how Optim is called, I'd expect it to affect both single and joint, though I can imagine exceptions to that.
Hmm, sorry about the confusion: with joint infer the gradient of each source does get closer to 0, but some never quite "reach" it (by reach I mean `norm(gradient of source i) < x`, where `x` is some constant; in my tests it's 1).
Joint infer does seem to get closer to 0 than single infer, though, as indicated by its better objective value.
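The per-source check described above amounts to something like this (a sketch with made-up gradient blocks; the threshold of 1 matches the tests described here):

```python
import math

# A source counts as "converged" when the Euclidean norm of its
# gradient block falls below the threshold (1.0 in these tests).

def source_converged(grad_block, tol=1.0):
    norm = math.sqrt(sum(g * g for g in grad_block))
    return norm < tol

grads = {"s1": [0.1, -0.2], "s2": [3.0, 4.0]}  # made-up gradient blocks
status = {sid: source_converged(g) for sid, g in grads.items()}
print(status)  # {'s1': True, 's2': False}
```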
OK, for the Stripe 82 test set, with no termination criteria, I ran for 1000 iterations (taking 24016.4 seconds, about 6.6 hours, with 8 threads), and the unconstrained gradient does get pretty much to 0 (within a norm of 1). I think with even more iterations it can get arbitrarily close to 0, but it just takes a very long time.
Should I even add this test to the long-running tests for joint_infer? It'll take more than 7 hours with a single thread.
Also, does https://github.com/jeff-regier/Celeste.jl/issues/523 affect anything here?
That's good to hear, that it gets very close to 0 after enough iterations on a realistic test case. I feel confident now that the joint inference code is doing what we want. Thanks for validating the code. We should now be able to claim in the paper that we're finding a stationary point, at least to some precision. 7 hours is a long time even for the long-running tests, so it's probably better not to include it; the existing unit/integration tests should suffice.
@agnusmaximus You can get the gradient for the unconstrained problem from the `last_sf` variable stored here: https://github.com/jeff-regier/Celeste.jl/blob/master/src/deterministic_vi/maximize_elbo.jl#L36-L37
I think it's going to be easier to use that one directly, rather than map `elbo.d` back to the unconstrained parameterization.