axch opened this issue 9 years ago
The first thing that comes to mind is demonstrating that Venture can express common non-Bayesian ML algorithms. For example, demonstrating a regularized logistic regression using IRLS / Newton's method, or a neural network using backpropagation, would be nice examples. For some of these, second-order information (Hessian) would be required. If this direction is taken, perhaps we should consider provisioning for the function to return more general side-information beyond just the gradient (e.g. Hessians).
The automatic differentiation system we have now cannot do nested AD, so Hessians would be an enormous amount of work. This issue is about packaging functionality we essentially already have in a convenient form rather than adding new functionality, but you're right that Hessians would be useful in principle.
HS-Venture should be able to compute Hessians.
The difficulty with this is that detach and regen both mutate the trace and the subproblem object, in such a way that the subproblem object is not reusable after a detach-regen cycle. The slow solution is probably to just re-select the subproblem every time; that may be an acceptable proof of concept.
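For illustration, a minimal sketch of that slow workaround in the inference language, using only the `select`/`detach`/`regen` actions described later in this thread; the scope and block names `foo`/`bar` are placeholders:

```
; Re-select before every cycle, since the subproblem object is not
; reusable after detach/regen have mutated it.
(do (s1 <- (select foo bar))
    ((w1, db1) <- (detach s1))
    (w2 <- (regen s1))
    ; second cycle: select a fresh subproblem rather than reusing s1
    (s2 <- (select foo bar))
    ((w3, db2) <- (detach s2))
    (w4 <- (regen s2)))
```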
So actually, it looks like the `GradientOfRegen` object in `hmc.py` already implements the desired pattern in this case (actually regen-select-detach, on an already-detached state). Seems like it wouldn't be too difficult to expose that in the inference language as a proof of concept, separately from resolving the scaffold mess.
Yes. There is a choice: do we expose (select-detach-regen) or (select-detach), (regen-select-detach), (regen)? I no longer recall quite why I did it that way for hmc; perhaps it had to do with wanting the rhoDB from the first detach, and with wanting to do the last regen with fresh randomness. I suppose we could even expose both.
It wants to be regen-select-detach, because detach returns the gradient which is used to update the values to use for the next regen.
But select-detach-regen is much simpler to explain, and feels more natural externally. That means we should probably have both.
What would be the signature of select-detach-regen? regen-select-detach is pretty clearly (subproblem, values) -> (weight, gradient of weight), leaving the trace in the same state before and after. For select-detach-regen to fulfill the same use case, it would need to be a higher-order function that accepts an update function that produces new values to propose to regen given the current values and gradient. I guess that's not bad, although I would think it's less likely to work as well with external optimization packages that expect to be handed a function that they can call. (The other one would be a select-detach-regen that doesn't accept any values, so regen just proposes from the prior; not sure what there is to "package up" in that case though.)
The trouble with regen-select-detach :: (subproblem, values) -> (weight, gradient of weight) is that it (currently) mutates the input subproblem. But the package can fix that, e.g. by returning an updated one, or mutating it back.
The select-detach-regen variant I was thinking of would accept values that do not depend on the weight or the gradient: (subproblem-spec, values) -> (weight_ratio, gradient of weight at values). (Or maybe both weights rather than just the ratio).
It is becoming clearer that we should just make all of these packages, and see which ones lead to convenient uses.
But the gradient returned by detach is the gradient at the old values, not the new ones, unless select-detach-regen is actually select-detach-regen-detach-regen.
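To make the two packagings under discussion concrete, here is a rough sketch; neither action exists yet, so the names `regen_select_detach` and `select_detach_regen` and their calling conventions are assumptions:

```
; Variant 1: (subproblem, values) -> (weight, gradient of weight),
; leaving the trace in the same state before and after.
(do (subproblem <- (select foo bar))
    (values <- (get_current_values subproblem))
    ((weight, gradient) <- (regen_select_detach subproblem values)))

; Variant 2: higher-order select_detach_regen; the caller supplies a step
; function from (current values, gradient) to the values to regen with.
(do (subproblem <- (select foo bar))
    (weight <- (select_detach_regen subproblem
                 (lambda (values gradient) ...))))  ; e.g. a small gradient step
```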
Interesting artifact I found in Marco's code: if the variables of interest are top-level, one could use `force` to set them and `log_joint_at` to evaluate the posterior density. Doesn't give the gradient, though, and doesn't expose the "fixed randomness" trick.
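A sketch of that trick, assuming the usual directive syntax and that `log_joint_at` takes a scope and a block (the exact argument conventions here are my guess):

```
[assume mu (normal 0 1)]
[observe (normal mu 1) 3]
; Set the top-level variable directly, then score the trace at that value.
[force mu 2.5]
[infer (log_joint_at default all)]  ; posterior density up to a constant; no gradient
```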
For the record, here are my project notes from the initial implementation of the inference SPs `select`, `detach`, `regen`, etc. This can be viewed as a sort of design document for the status quo.
Regen/Detach as inference SPs
Subgoals:
Initial limitations:
Would be nice if:
Imperative mh looks like this if the default regen is from the prior:

```
(do (subproblem <- (select foo bar)) ; really, select by availability of log densities
    ((rho_weight, rho_db) <- (detach subproblem))
    (xi_weight <- (regen subproblem))
    (if (< (uniform ...) ...)
        ...
        (do (detach subproblem)
            (restore subproblem rho_db))))
```
With functional-underneath traces, we can have this:

```
(do (subproblem <- (select foo bar))
    (original <- (copy_trace))
    (rho_weight <- (detach subproblem))
    (xi_weight <- (regen subproblem))
    (if (< (uniform ...) ...)
        ...
        (set_trace original)))
```
A candidate for custom proposals:

```
(do (subproblem <- (select foo bar))
    (current_x <- ...)
    ((rho_weight, rho_db) <- (detach subproblem))
    ; somewhere need credit for the reverse proposal, rather than the prior
    (new_x <- (normal current_x 1))
    (correction <- ...)
    ; set x to new_x
    (xi_weight <- (regen subproblem))
    (if (< (uniform ...) ...)
        ...
        (do (detach subproblem)
            (regen/restore subproblem rho_db))))
```
The inference programming language currently has functions that expose the underlying trace API more or less the way it's implemented:
- `select :: scope -> block -> Action subproblem`
- `detach :: subproblem -> Action (weight, rhoDB)` (leaves a torus and returns the likelihood of the old state)
- `regen :: subproblem -> Action weight` (fills a torus and returns the likelihood of the new state)
- `restore :: subproblem -> rhoDB -> Action weight`
- `detach_for_proposal :: subproblem -> Action (weight, rhoDB)` (the weight is the full local posterior)
- `regen_with_proposal :: subproblem -> [value] -> Action weight` (inserts given values into the principal nodes and returns the value of the full local posterior of the result)
- `get_current_values :: subproblem -> Action [value]`
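For concreteness, here is roughly what custom-proposal mh looks like when built from just these actions, following the "candidate for custom proposals" pattern from the notes above. The scope/block `foo`/`bar`, the assumption of a single principal node, and the acceptance bookkeeping are placeholders, and any asymmetric-proposal correction is omitted:

```
(do (subproblem <- (select foo bar))
    (old_values <- (get_current_values subproblem))
    ((rho_weight, rho_db) <- (detach_for_proposal subproblem))
    ; symmetric random-walk proposal around the principal node's old value
    (new_x <- (normal (first old_values) 1))
    (xi_weight <- (regen_with_proposal subproblem (list new_x)))
    (if (< (log (uniform_continuous 0 1)) (- xi_weight rho_weight))
        (pass)                              ; accept: keep the proposed state
        (do (detach subproblem)             ; reject: roll back to the old values
            (restore subproblem rho_db))))
```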
This is enough to write things like mh with custom proposal distributions in the inference programming language; the tutorial even has an example. The next level, though, is to be able to write and debug gradient methods, or even better, use third-party gradient methods. Specific things that would be good to have:
- `detach_for_proposal` that returns the gradient of the weight wrt the values of the principal nodes (this should be easy) (name it `detach_for_proposal_with_gradient`?)
- A `FixedRandomness` object to expose the "fixed randomness" trick (but see #138).
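And a sketch of the gradient case the first item is asking for; `detach_for_proposal_with_gradient` does not exist, so its name, its returning the gradient alongside the weight and rhoDB, the single principal node, and the fixed step size are all assumptions:

```
(do (subproblem <- (select foo bar))
    (old_values <- (get_current_values subproblem))
    ; hypothetical: like detach_for_proposal, but also returning the gradient
    ; of the weight with respect to the principal nodes' current values
    ((rho_weight, rho_db, gradient) <- (detach_for_proposal_with_gradient subproblem))
    ; one deterministic ascent step on a single principal node
    (new_x <- (return (+ (first old_values) (* 0.01 (first gradient)))))
    (xi_weight <- (regen_with_proposal subproblem (list new_x))))
```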