probcomp / Venturecxx

Primary implementation of the Venture probabilistic programming system
http://probcomp.csail.mit.edu/venture/
GNU General Public License v3.0

Stochastic procedure calling conventions #653

Open axch opened 7 years ago

axch commented 7 years ago

An application of a stochastic procedure that makes a request can be considered, from the point of view of the trace in which that application is traced, as occurring in three stages:

1. an untraced first stage, which computes the request;
2. the traced evaluation of the requested computation; and
3. an untraced second stage, which computes the application's output from the results of the request.

In Mite, the SP interface and the runtime system are arranged to treat that whole sequence as a single operation -- it gets one node with one address (that of the application) and it is the unit of annotation (namely, with proposal and constraint kernels). This turns out to have some problems (detailed below).

In Lite and Puma, the SP interface and the runtime system are arranged to treat the three stages as three distinct operations -- each gets its own node (potentially many, in the case of the traced request), and the two untraced nodes are loci of annotation (e.g., with canAbsorb and logDensity methods). This has been known for a long time to have other problems (restated below).
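
The three-stage split can be sketched as follows. This is a minimal illustration, not the actual Lite interface: the names `Request`, `TwoStageSP`, `trace_eval`, and `apply_sp` are all invented for the sketch, and the "traced" evaluator is stubbed out with plain thunk application.

```python
# Hypothetical sketch of the Lite-style convention: an untraced request
# stage, a traced evaluation of the request, and an untraced output stage.
from collections import namedtuple

# A request names an expression to be evaluated in some environment.
Request = namedtuple("Request", ["rid", "expression", "environment"])

class TwoStageSP:
    """An SP split into separately annotatable untraced stages."""

    def request(self, args):
        """Stage 1 (untraced): compute what should be evaluated."""
        raise NotImplementedError

    def output(self, args, request_results):
        """Stage 3 (untraced): combine traced results into the answer."""
        raise NotImplementedError

class Compound(TwoStageSP):
    """A compound procedure: request its body, return the body's value."""
    def __init__(self, params, body, env):
        self.params, self.body, self.env = params, body, env

    def request(self, args):
        env = dict(self.env)
        env.update(zip(self.params, args))
        return Request(rid=("body", tuple(args)),
                       expression=self.body, environment=env)

    def output(self, args, request_results):
        # The output stage of a compound merely propagates the body's value.
        return request_results[0]

def trace_eval(expression, environment):
    # Stand-in for the traced evaluator: expressions are thunks here.
    return expression(environment)

def apply_sp(sp, args):
    req = sp.request(args)                                # stage 1
    result = trace_eval(req.expression, req.environment)  # stage 2 (traced)
    return sp.output(args, [result])                      # stage 3

square = Compound(["x"], lambda env: env["x"] * env["x"], {})
print(apply_sp(square, [3]))  # -> 9
```

The point of the split is visible in the sketch: `request` and `output` are distinct methods on distinct (conceptual) nodes, so each can carry its own annotations.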

Once the problems are listed, the question will be: what is a good stochastic procedure calling convention?

What's wrong with the Mite way?

Essentially, the problem is that Mite leaves no way to name and annotate the stages individually. Consider the simplest case of a requesting computation -- applying a compound procedure. What is to be done when inference changes one of the inputs? The reasonable thing to do seems to be to leave the first stage, and the structure of the request, untouched, propagate the change into the body of the request (which happens automatically in graph traces), and then resimulate the second stage if needed. [*] But there is no place to spell this out -- both the proposal kernel and the constraint kernel of a compound SP are supposed to refer to the entire application, including the requested computation, so neither are appropriate. This is the cause of #654.

Another facet of this problem is the question of constraining compound procedures, and the distinction between lambda and proc. If the second stage could be annotated independently, it would be conceptually straightforward to annotate it as an invertible deterministic function, and teach the runtime system that it can backpropagate constraints to the outputs of those to be constraints to their inputs (with the appropriate Jacobian term for continuous ones, of course). [**] This is in effect what Lite does, except the whole implementation is hard-coded for the second stage of compound SPs specifically, instead of being triggered by a generally applicable annotation.
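
The backpropagation of constraints through an invertibility annotation can be sketched concretely. Everything here is hypothetical scaffolding (`InvertibleStage`, `backpropagate_constraint` are invented names); the sketch just shows the arithmetic: a constraint "output == y" on an invertible deterministic stage becomes "input == inverse(y)", with a log-Jacobian correction for continuous values.

```python
# Hypothetical sketch: constraining the output of a stage annotated as an
# invertible deterministic function, with the Jacobian term for continuous
# values.
import math

class InvertibleStage:
    """Example stage: y = exp(x), annotated with its inverse and log |dy/dx|."""
    def forward(self, x):
        return math.exp(x)
    def inverse(self, y):
        return math.log(y)
    def log_jacobian(self, x):
        # log |d forward / dx| evaluated at x; for exp this is x itself.
        return x

def backpropagate_constraint(stage, y_constrained, log_density_of_input):
    """Turn 'output == y' into 'input == inverse(y)', correcting the density
    by the log-Jacobian term (change of variables)."""
    x = stage.inverse(y_constrained)
    return x, log_density_of_input(x) - stage.log_jacobian(x)

# Example: input x ~ Normal(0, 1); constrain exp(x) == 2.
def std_normal_logpdf(x):
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

x, logw = backpropagate_constraint(InvertibleStage(), 2.0, std_normal_logpdf)
# x is log 2; logw is the log density of the constrained output, i.e. the
# lognormal density at 2.
```

If the second stage of a compound carried such an annotation, the runtime could trigger this machinery generically instead of hard-coding it for compound SPs.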

Status quo in Mite is that compound SPs have an extra kernel called a propagating kernel, and the scaffold construction algorithm has special code to check for it, whose job is to arrange to resimulate the second stage of the compound application in the event that inference changes the result of the request. However, if the arguments to the application change, the scaffold construction algorithm also selects the proposal kernel for the whole compound, which takes precedence and eventually causes a crash because of trying to double-extract the body of the request.

What's wrong with the Lite way?

Most fundamentally, the Lite way does not allow naming the application as a whole, in the event that one does wish to annotate all of it.

Also, the Lite way requires the runtime system, rather than the individual SPs, to make the decision about when a given request should be unevaluated. This would be fine if all the desired requesting stages were independent across applications, but we have two examples where they are not: memoized SPs, and the hypothetical Gaussian process with VentureScript covariance kernels.

The difficulty with memoized SPs is that every call (with the same argument) needs to refer to the same request, and that request needs to exist as long as any of those calls exists. In particular, the first application that needs it will make the request, but if inference decides to unevaluate that application, the request must be kept around if other applications need it. Lite accomplishes this with a custom "or" mechanism, namely caching requests by id and reference counting that cache.
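
The "or" mechanism amounts to a reference-counted cache. The following sketch uses invented names (`RequestCache`, `ensure`, `release`) and elides the traced evaluation, but shows the lifetime discipline: a shared request is evaluated on first use and unevaluated only when the last application referring to it goes away.

```python
# Hypothetical sketch of the "or" mechanism for mem: cache requests by id
# and reference-count the cache.
class RequestCache:
    def __init__(self):
        self.results = {}    # request id -> evaluated result
        self.refcounts = {}  # request id -> number of live applications

    def ensure(self, rid, evaluate):
        """Evaluate the request only on first use; bump the refcount."""
        if rid not in self.results:
            self.results[rid] = evaluate()
        self.refcounts[rid] = self.refcounts.get(rid, 0) + 1
        return self.results[rid]

    def release(self, rid):
        """Drop one reference; unevaluate the request when none remain."""
        self.refcounts[rid] -= 1
        if self.refcounts[rid] == 0:
            del self.refcounts[rid]
            del self.results[rid]

calls = []
def expensive():
    calls.append(1)
    return 42

cache = RequestCache()
a = cache.ensure("f(7)", expensive)  # first application: evaluates
b = cache.ensure("f(7)", expensive)  # second application: shares the result
cache.release("f(7)")                # first application unevaluated...
assert "f(7)" in cache.results       # ...but the request survives
cache.release("f(7)")
assert "f(7)" not in cache.results   # last reference gone; unevaluated
```

The request survives any one application's unevaluation, which is exactly the "or" over applications that mem needs.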

The difficulty with VS covariance kernels is that the request for any given off-diagonal entry in the covariance matrix should only exist if both applications to the relevant points exist. Lite does not implement any "and" mechanism, and so we do not have VS covariance kernels.

Finally, Lite proper hard-codes the number of stages to exactly two. This is aesthetically unpleasant, contributes to the complexity in Lite's SP interface that has essentially prevented people from using it, is wasteful in the common case where only one stage is needed, and would be limiting if we encountered a circumstance where more than two stages were called for (which we have not, to date, perhaps because it can be worked around by returning SPs).

What can be done?

Trampoline

Lite's restriction to exactly two stages seems not too hard to remove by applying a trampoline. In this proposal, the SP calling convention changes to

This option introduces some issues of its own, but mostly just brings the previous issues to the fore:
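
One plausible shape for the trampolined convention (sketched with invented names; the real proposal's signature is not spelled out above): an SP, applied to its arguments, returns either a final `Value` or a `Request` carrying a continuation, and the runtime bounces until it gets a value. This removes the hard-coded limit of two stages.

```python
# Hypothetical sketch of a trampolined SP calling convention: any number of
# requesting stages, each expressed as a Request with a continuation.
from collections import namedtuple

Value = namedtuple("Value", ["v"])
Request = namedtuple("Request", ["expression", "environment", "continuation"])

def trace_eval(expression, environment):
    # Stand-in for the traced evaluator: expressions are thunks here.
    return expression(environment)

def apply_sp(sp, args):
    step = sp(args)
    while isinstance(step, Request):      # as many stages as the SP wants
        result = trace_eval(step.expression, step.environment)
        step = step.continuation(result)  # the next stage decides what's next
    return step.v

# A compound with a single requesting stage; an SP needing no request would
# just return a Value immediately.
def square(args):
    env = {"x": args[0]}
    return Request(lambda e: e["x"] * e["x"], env,
                   lambda result: Value(result))

print(apply_sp(square, [3]))  # -> 9
```

In this shape the common one-stage case pays nothing, and a continuation can itself return another `Request`, giving arbitrarily many stages.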

More elaborate request manager

It would be possible to just implement an "and" mechanism for request handling. For example, the API for issuing a request can be made to consist of a 4-tuple: (request id, [application ids], expression, environment). The second field here is to be interpreted as a list of contingent application ids, such that this instance of requesting that request id should be removed from the reference count if any of those application ids are unevaluated. This should suffice for the needs of VS covariance kernels, at least.
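The bookkeeping for the 4-tuple API can be sketched as follows. The class and method names are invented, and evaluation of the (expression, environment) pair is elided; the sketch only shows the contingency logic: a request instance is dropped from the reference count as soon as any one of its contingent applications is unevaluated.

```python
# Hypothetical sketch of an "and" mechanism: requests are issued as
# (request id, [contingent application ids], expression, environment).
class RequestManager:
    def __init__(self):
        self.refcounts = {}   # request id -> count of live request instances
        self.contingent = {}  # application id -> set of request ids

    def issue(self, rid, app_ids, expression, environment):
        """Record one instance of the request, contingent on app_ids."""
        self.refcounts[rid] = self.refcounts.get(rid, 0) + 1
        for app in app_ids:
            self.contingent.setdefault(app, set()).add(rid)
        # (expression, environment) would be evaluated on first issue.

    def unevaluate_application(self, app_id):
        """Drop every request instance contingent on this application."""
        for rid in self.contingent.pop(app_id, set()):
            if rid in self.refcounts:  # may already be gone via another app
                self.refcounts[rid] -= 1
                if self.refcounts[rid] == 0:
                    del self.refcounts[rid]  # unevaluate the request itself

# GP covariance example: the off-diagonal entry k(x1, x2) should exist only
# while BOTH applications (to x1 and to x2) exist.
mgr = RequestManager()
mgr.issue("k(x1,x2)", ["app1", "app2"], None, None)
mgr.unevaluate_application("app1")
assert "k(x1,x2)" not in mgr.refcounts  # gone as soon as either app goes
```

Note that the plain refcounted "or" of mem falls out as the special case where each instance is contingent on exactly one application.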

Is this a good idea? I am of two minds. On the one hand, having three pieces of mechanism (requests, reference counting, contingency ids) for three examples (compounds, mem, GPs) is the classic sign of a weakness or limitation not having been removed from the substrate. On the other hand, one might argue that Boolean logic is complete, so it's really one mechanism and one should expect it to cover future examples.

More custom kernels

The only way I can think of to make progress starting from the Mite calling convention and not changing it is to add more custom kernel types, by analogy to the propagating kernel. To wit, I have, on a branch, implemented a notion of "request constraint kernel", whose meaning is "constrain the result of the first stage" (but say nothing about the request or the second stage). I have also added the special treatment this kernel calls for into the dependency trace scaffold construction algorithm.

Having done it, I like it even less than I did when I thought of it. What are the issues with this approach?

[*] Actually, this answer is an artifact of the implicit ref structure, namely that arguments to compounds are passed by reference through the environment, but return values are passed by value. It would not be impossible to change that to pass return values by reference as well, in which case both stages of a compound application can simply be left alone.

[**] The troublesome item here is backpropagating through a fanout (i.e., variable lookup) because for consistency that calls for forward propagating through all the other lookups. Both Lite and Mite have special facilities for this, which in Mite's case can well be described as a pretty awful hack.

axch commented 7 years ago

The branch https://github.com/probcomp/Venturecxx/tree/mite-request-constraint-kernels contains a (possibly incomplete) implementation of the "More custom kernels" proposal.

axch commented 7 years ago

It may also be feasible to let the SP choose. Suppose an SP could be equipped with a metaprogram (that was, let's say, deterministic and independent of the arguments) that gave the desired convention for calling and interpreting other annotations. Then some SPs could select trampoline and others could select to make callbacks directly.

It might also not be necessary to tag this: if the SP returns a request, that means it wants the trampoline, and if it makes the request but retains control, that means it wants to own it. Tagging has the advantage that the runtime system knows that this decision is deterministic and independent of the arguments. (In the case of the trampoline, that information may be recoverable by annotating the first stage, but a Mite-style SP has no other reasonable way to indicate that it will never want to use the trampoline.)
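
The tagged variant can be sketched in a few lines (all names here are invented for illustration): each SP exposes a deterministic, argument-independent metaprogram naming its convention, which the runtime can consult once, ahead of any particular application.

```python
# Hypothetical sketch: an SP declares its calling convention via a
# deterministic, argument-independent tag.
from enum import Enum

class Convention(Enum):
    TRAMPOLINE = "trampoline"  # SP returns requests; the runtime bounces
    DIRECT = "direct"          # SP makes requests itself and retains control

class TaggedSP:
    def convention(self):
        """Metaprogram: must be deterministic and independent of the args."""
        raise NotImplementedError

class CompoundSP(TaggedSP):
    def convention(self):
        return Convention.TRAMPOLINE

class GaussianProcessSP(TaggedSP):
    def convention(self):
        return Convention.DIRECT

def dispatch(sp, args, trampoline_apply, direct_apply):
    # Because the tag is argument-independent, the runtime can cache this
    # decision per SP rather than inspecting each application.
    if sp.convention() is Convention.TRAMPOLINE:
        return trampoline_apply(sp, args)
    return direct_apply(sp, args)
```

The untagged variant would instead infer the choice from whether the SP returned a request or retained control, at the cost of the runtime no longer knowing the decision is deterministic.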

axch commented 7 years ago

Is it possible that it would suffice for all requesting stages to be deterministic and independent of the values of the inputs? This could be so if they can reliably offload all their interesting computation into the content of the request. Test cases:

axch commented 7 years ago

Is it possible that it would suffice for all post-request stages to be just propagating the answer, by the same logic?

axch commented 7 years ago

Possible response: not if we want incremental migration of code out of the runtime system.