jeff-regier / Celeste.jl

Scalable inference for a generative model of astronomical images
MIT License

test that gradient is 0 for joint inference in `validate_on_stripe82.jl` #489

Closed jeff-regier closed 7 years ago

jeff-regier commented 7 years ago

@agnusmaximus You can get the gradient for the unconstrained problem from the last_sf variable stored here:

https://github.com/jeff-regier/Celeste.jl/blob/master/src/deterministic_vi/maximize_elbo.jl#L36-L37

I think it's going to be easier to use that one directly, rather than map elbo.d back to the unconstrained parameterization.

jeff-regier commented 7 years ago

Actually it isn't so easy to access last_sf outside of the scope of maximize_f. You might just declare your own variable like last_sf in your tests, and then more-or-less duplicate lines 44 and 47 in your test: https://github.com/jeff-regier/Celeste.jl/blob/master/src/deterministic_vi/maximize_elbo.jl#L44-L47

On line 44, f is the function DeterministicVI.elbo. Line 47 loads the last_sf variable based on the contents of f_res, the SensitiveFloat returned by DeterministicVI.elbo. (The derivatives in f_res are for the constrained problem; the derivatives in last_sf are for the unconstrained problem.)
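A rough sketch of what that test-side duplication might look like. This is not the actual Celeste code: `ea` and the transform call are hypothetical stand-ins for whatever lines 44 and 47 of maximize_elbo.jl actually do, so check that file for the real calls.

```julia
# Hedged sketch: mirror lines 44 and 47 of maximize_elbo.jl inside the test.
# `ea` stands for the ElboArgs of the source under test; the transform call
# below is a hypothetical placeholder for the real line-47 code.

f_res = DeterministicVI.elbo(ea)   # SensitiveFloat; f_res.d has constrained derivatives
last_sf = deepcopy(f_res)          # our own copy, like last_sf in maximize_f

# ...apply the constrained-to-unconstrained chain rule here, as line 47 does,
# e.g. something along the lines of transform_sensitive_float!(transform, last_sf, ...)

# last_sf.d now holds the unconstrained gradient; at a stationary point it is ~0
@assert norm(last_sf.d) < 1.0
```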

agnusmaximus commented 7 years ago

Sorry for the late reply; I was a little out of it yesterday. Using the last_sf method like in maximize_f worked. On the 4-source test case the gradients do get very close to zero.

I've just tried it out on the stripe 82 test case and for some sources the gradients are not zero, failing the check. The gradients do go closer to zero though with more iterations. I'm currently looking into how to improve this.

Maybe the algorithm does still need to be tweaked so that convergence happens with more iterations?

jeff-regier commented 7 years ago

You might try implementing the serial version of the algorithm that Cyclades parallelizes: iterate through the light sources in order, repeatedly, calling maximize_f on each with max_iters=50 (or even max_iters=100), terminating after 10 or so passes through all the light sources.
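In rough Julia pseudocode, that serial baseline would be something like the following (the `sources` collection and the `ea_for_source` helper are hypothetical stand-ins for Celeste's real per-source data structures, and the exact `maximize_f` signature is an assumption):

```julia
# Hedged sketch of the serial algorithm that Cyclades parallelizes.
# `sources` and `ea_for_source` are hypothetical stand-ins for the real
# Celeste data structures.

for pass in 1:10                    # ~10 passes over all light sources
    for s in sources                # visit sources in a fixed order
        ea = ea_for_source(s)
        # optimize this source's variational parameters to (near) convergence
        maximize_f(DeterministicVI.elbo, ea; max_iters=50)
    end
end
# afterwards, check that every source's unconstrained gradient norm is ~0
```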

If that doesn't find a stationary point, something is probably off with the serial algorithm. On the other hand, if the serial version does make it to a 0 gradient, then I'd want to explore why the Cyclades implementation deviates from the output of the serial algorithm.

But that's just an idea, for if you're stuck on how to debug the non-zero gradient.

agnusmaximus commented 7 years ago

Ah that's a good idea, I'll try that

rgiordan commented 7 years ago

Sorry to jump in late with an obvious observation, but since I don't see it above I'll say it just in case: Optim uses three different termination conditions: the size of the gradient, the change in the function value, and the change in the parameters themselves. Which condition terminated optimization is recorded in the optimization results, which Celeste returns, and you can control the termination conditions with function arguments. The trace (available with verbose on) will let you inspect all these values and make sure they're sensible.
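As a concrete illustration on a toy problem, here is how the three conditions can be set and inspected. This uses the current Optim.jl option and accessor names, which may differ slightly in the version Celeste pins:

```julia
using Optim

# Rosenbrock function; minimum at (1, 1)
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

res = optimize(rosenbrock, [0.0, 0.0], LBFGS(),
               Optim.Options(x_tol = 1e-10,  # change in the parameters
                             f_tol = 1e-10,  # change in the function value
                             g_tol = 1e-8,   # size of the gradient
                             store_trace = true))

# The result object records which condition ended the optimization:
println(Optim.converged(res))    # any condition met?
println(Optim.g_converged(res))  # gradient small enough?
println(Optim.x_converged(res))  # parameters stopped moving?
println(Optim.f_converged(res))  # objective stopped improving?
```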


agnusmaximus commented 7 years ago

Interesting, I didn't know that! I'll try poking around and see if most are terminating due to gradient being close to 0 and whether that matches what I'm seeing with the unconstrained gradient.

jeff-regier commented 7 years ago

That's worth keeping in mind, though I think Max is seeing the gradient go to 0 with single infer, but not joint infer. If the issue were how Optim is called, I'd expect it to affect both single and joint, though I can imagine exceptions to that.

agnusmaximus commented 7 years ago

Hmm, sorry about the confusion: the gradient of each source does get closer to 0 with joint infer, but some never really "reach" it (by "reach" I mean norm(gradient of source i) < x, where x is some constant; in my tests x = 1).

It does seem that joint infer gets closer to 0 than single infer (as indicated by its better objective value) though.

agnusmaximus commented 7 years ago

Ok, for the stripe82 test set, with no termination criteria, I ran for 1000 iterations (24016.4 seconds, about 6.6 hours, with 8 threads), and the unconstrained gradient does get pretty much to 0 (within a norm of 1). I think with even more iterations it can get arbitrarily close to 0, but it just takes a super long time.

agnusmaximus commented 7 years ago

Should I even add this test to the long-running tests for joint_infer? It'll take > 7 hours with a single thread.

Also, does https://github.com/jeff-regier/Celeste.jl/issues/523 affect anything here?

jeff-regier commented 7 years ago

That's good to hear, that it gets very close to 0 after enough iterations on a realistic test case. I feel confident now that the joint inference code is doing what we want. Thanks for validating the code. We should be able to claim in the paper that we're finding a stationary point, at least to some precision. 7 hours is a long time even for the long-running tests, so probably better not to include it---the existing unit/integration tests should suffice.