probcomp / Venturecxx

Primary implementation of the Venture probabilistic programming system
http://probcomp.csail.mit.edu/venture/
GNU General Public License v3.0

Expose the local posterior function (and its gradient) (and its deterministic version) to the inference programming language as repeatably callable #199

Open · axch opened 9 years ago

axch commented 9 years ago

The inference programming language currently has functions that expose the underlying trace API more or less the way it's implemented:

This is enough to write things like mh with custom proposal distributions in the inference programming language; the tutorial even has an example. The next level, though, is to be able to write and debug gradient methods, or, even better, to use third-party gradient methods. Specific things that would be good to have:

marcoct commented 9 years ago

The first thing that comes to mind is demonstrating that Venture can express common non-Bayesian ML algorithms. For example, regularized logistic regression via IRLS / Newton's method, or a neural network trained with backpropagation, would be nice examples. Some of these require second-order information (the Hessian). If this direction is taken, perhaps we should provide for the function to return more general side information beyond just the gradient (e.g., Hessians).
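
For concreteness, here is a minimal numpy sketch (not Venture code) of the Newton / IRLS update for L2-regularized logistic regression. The point is that the update consumes the Hessian, not just the gradient, which is why second-order information would be needed:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_step(w, X, y, lam):
    # One IRLS / Newton update for L2-regularized logistic regression.
    p = sigmoid(X @ w)                     # predicted probabilities
    grad = X.T @ (p - y) + lam * w         # gradient of the penalized loss
    s = p * (1.0 - p)                      # diagonal IRLS weights
    H = (X * s[:, None]).T @ X + lam * np.eye(len(w))  # Hessian
    return w - np.linalg.solve(H, grad)    # the step needs H, not just grad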

axch commented 9 years ago

The automatic differentiation system we have now cannot do nested AD, so Hessians would be an enormous amount of work. This issue is about packaging functionality we essentially already have in a convenient form rather than adding new functionality, but you're right that Hessians would be useful in principle.
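
To see why Hessians force nested AD: a second derivative is the derivative of a derivative, so the AD transform has to accept its own output as input. A toy forward-mode illustration in Python (this is not Venture's AD, and a real implementation would also have to guard against perturbation confusion):

class Dual:
    # Toy forward-mode AD value: val carries the result, eps the derivative.
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.eps + other.eps)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)
    __rmul__ = __mul__

def deriv(f, x):
    # Nesting works only because f's arithmetic tolerates Duals whose
    # components are themselves Duals.
    return f(Dual(x, 1.0)).eps

f = lambda x: x * x * x
print(deriv(f, 3.0))                      # 27.0 = 3x^2 at x=3
print(deriv(lambda x: deriv(f, x), 3.0))  # 18.0 = 6x at x=3, via nested AD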

axch commented 9 years ago

HS-Venture should be able to compute Hessians.

lenaqr commented 9 years ago

The difficulty with this is that detach and regen both mutate the trace and the subproblem object, in such a way that the subproblem object is not reusable after a detach-regen cycle. The slow solution is probably to just re-select the subproblem every time; that may be an acceptable proof of concept.

Actually, it looks like the GradientOfRegen object in hmc.py already implements the desired pattern in this case (namely regen-select-detach, on an already-detached state). It seems like it wouldn't be too difficult to expose that in the inference language as a proof of concept, separately from resolving the scaffold mess.
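
In sketch form, the pattern is: starting from an already-detached state, regen at the proposed values, re-select (because the mutation staled the old subproblem), and detach again, reading the weight and gradient off the detach. Everything below is an illustrative stand-in for the Python trace API, not the API itself:

def local_posterior_and_gradient(trace, subproblem_spec, values):
    # Precondition: the subproblem is currently detached from the trace.
    xi_weight = regen_with_values(trace, subproblem_spec, values)  # install values
    subproblem = select(trace, subproblem_spec)  # re-select; the old selection is stale
    _, gradient = detach(trace, subproblem)      # report the gradient, back to detached
    return xi_weight, gradient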

axch commented 9 years ago

Yes. There is a choice: do we expose (select-detach-regen) or (select-detach), (regen-select-detach), (regen)? I no longer recall quite why I did it that way for hmc; perhaps it had to do with wanting the rhoDB from the first detach, and with wanting to do the last regen with fresh randomness. I suppose we could even expose both.

lenaqr commented 9 years ago

It wants to be regen-select-detach, because detach returns the gradient, which is used to update the values for the next regen.

axch commented 9 years ago

But select-detach-regen is much simpler to explain, and feels more natural externally. That means we should probably have both.

lenaqr commented 9 years ago

What would be the signature of select-detach-regen? regen-select-detach is pretty clearly (subproblem, values) -> (weight, gradient of weight), leaving the trace in the same state before and after. For select-detach-regen to fulfill the same use case, it would need to be a higher-order function that accepts an update function that produces new values to propose to regen given the current values and gradient. I guess that's not bad, although I would think it's less likely to work as well with external optimization packages that expect to be handed a function that they can call. (The other one would be a select-detach-regen that doesn't accept any values, so regen just proposes from the prior; not sure what there is to "package up" in that case though.)
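
To make the contrast concrete, a signature-level sketch (all names here are invented):

def regen_select_detach(subproblem, values):
    """(subproblem, values) -> (weight, gradient of weight).

    Leaves the trace in the same state before and after, so an external
    optimizer can drive it like any log-density-and-gradient oracle."""

def select_detach_regen(subproblem_spec, update):
    """Higher-order variant: update(current_values, gradient) returns the
    values to propose to regen.  Control is inverted, which fits less well
    with optimization packages that expect to be handed a callable."""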

axch commented 9 years ago

The trouble with regen-select-detach :: (subproblem, values) -> (weight, gradient of weight) is that it (currently) mutates the input subproblem. But the package can fix that, e.g. by returning an updated one, or mutating it back.

The select-detach-regen variant I was thinking of would accept values that do not depend on the weight or the gradient: (subproblem-spec, values) -> (weight_ratio, gradient of weight at values). (Or maybe both weights rather than just the ratio).

It is becoming clearer that we should just make all of these packages, and see which ones lead to convenient uses.

lenaqr commented 9 years ago

But the gradient returned by detach is the gradient at the old values, not the new ones, unless select-detach-regen is actually select-detach-regen-detach-regen.
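
In the same invented notation as above, reporting the gradient at the proposed values really does take the longer cycle:

def select_detach_regen(trace, spec, values):
    subproblem = select(trace, spec)
    rho_weight, _ = detach(trace, subproblem)        # gradient here is at the OLD values
    xi_weight = regen_with_values(trace, subproblem, values)
    subproblem = select(trace, spec)                 # re-select after the mutation
    _, xi_grad = detach(trace, subproblem)           # gradient at the NEW values
    regen_with_values(trace, subproblem, values)     # leave the trace at the new values
    return xi_weight - rho_weight, xi_grad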

axch commented 8 years ago

Interesting artifact I found in Marco's code: if the variables of interest are top-level, one can use force to set them and log_joint_at to evaluate the posterior density. This doesn't give the gradient, though, and doesn't expose the "fixed randomness" trick.
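
A hedged sketch of that trick through the Python ripl, assuming the variable of interest is bound at top level as x; the exact calling conventions of force and log_joint_at here are assumptions:

def log_posterior_at(ripl, v):
    # Clamp the top-level variable, then score the resulting trace.
    # No gradient, and no control over the randomness regen would resample.
    ripl.infer('(force x %s)' % v)
    return ripl.infer('(log_joint_at default all)')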

axch commented 8 years ago

For the record, here are my project notes from the initial implementation of the inference SPs select, detach, regen, etc. They can be viewed as a sort of design document for the status quo.

Regen/Detach as inference SPs

Subgoals:

Initial limitations:

Would be nice if:

Imperative mh looks like this if the default regen is from the prior:

(do (subproblem <- (select foo bar)) ; really, select by availability of log densities
    ((rho_weight, rho_db) <- (detach subproblem))
    (xi_weight <- (regen subproblem))
    (if (< (uniform ...) ...)
        ...
        (do (detach subproblem)
            (restore subproblem rho_db))))

With functional-underneath traces, we can have this:

(do (subproblem <- (select foo bar))
    (original <- (copy_trace))
    (rho_weight <- (detach subproblem))
    (xi_weight <- (regen subproblem))
    (if (< (uniform ...) ...)
        ...
        (set_trace original)))

A candidate for custom proposals:

(do (subproblem <- (select foo bar))
    (current_x <- ...)
    ((rho_weight, rho_db) <- (detach subproblem))
    ; somewhere need credit for the reverse proposal, rather than the prior
    (new_x <- (normal current_x 1))
    (correction <- ....)
    (set x new_x) ; pseudocode: install the proposed value
    (xi_weight <- (regen subproblem))
    (if (< (uniform ...) ...)
        ...
        (do (detach subproblem)
            (regen/restore subproblem rho_db))))