probcomp / Venturecxx

Primary implementation of the Venture probabilistic programming system
http://probcomp.csail.mit.edu/venture/
GNU General Public License v3.0

Stochastic procedure calling conventions #653

Open axch opened 7 years ago

axch commented 7 years ago

An application of a stochastic procedure that makes a request can be considered, from the point of view of the trace in which that application is traced, as occurring in three stages:

1. an untraced first stage, which computes the request;
2. the traced evaluation of the requested computation; and
3. an untraced second stage, which computes the application's output from the results of the request.

In Mite, the SP interface and the runtime system are arranged to treat that whole sequence as a single operation -- it gets one node with one address (that of the application) and it is the unit of annotation (namely, with proposal and constraint kernels). This turns out to have some problems (detailed below).

In Lite and Puma, the SP interface and the runtime system are arranged to treat the three stages as three distinct operations -- each gets its own node (potentially many, in the case of the traced request), and the two untraced nodes are loci of annotation (e.g., with canAbsorb and logDensity methods). This has been known for a long time to have other problems (restated below).
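
The three-stage split can be sketched as follows. This is a minimal illustration, not the actual Lite interface: the names `Request`, `TwoStageSP`, `trace_eval`, and `apply_sp` are all invented for the sketch, and the "traced" evaluator is stubbed out with plain thunk application.

```python
# Hypothetical sketch of the Lite-style convention: an untraced request
# stage, a traced evaluation of the request, and an untraced output stage.
from collections import namedtuple

# A request names an expression to be evaluated in some environment.
Request = namedtuple("Request", ["rid", "expression", "environment"])

class TwoStageSP:
    """An SP split into separately annotatable untraced stages."""

    def request(self, args):
        """Stage 1 (untraced): compute what should be evaluated."""
        raise NotImplementedError

    def output(self, args, request_results):
        """Stage 3 (untraced): combine traced results into the answer."""
        raise NotImplementedError

class Compound(TwoStageSP):
    """A compound procedure: request its body, return the body's value."""
    def __init__(self, params, body, env):
        self.params, self.body, self.env = params, body, env

    def request(self, args):
        env = dict(self.env)
        env.update(zip(self.params, args))
        return Request(rid=("body", tuple(args)),
                       expression=self.body, environment=env)

    def output(self, args, request_results):
        # The output stage of a compound merely propagates the body's value.
        return request_results[0]

def trace_eval(expression, environment):
    # Stand-in for the traced evaluator: expressions are thunks here.
    return expression(environment)

def apply_sp(sp, args):
    req = sp.request(args)                                # stage 1
    result = trace_eval(req.expression, req.environment)  # stage 2 (traced)
    return sp.output(args, [result])                      # stage 3

square = Compound(["x"], lambda env: env["x"] * env["x"], {})
print(apply_sp(square, [3]))  # -> 9
```

The point of the split is visible in the sketch: `request` and `output` are distinct methods on distinct (conceptual) nodes, so each can carry its own annotations.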

Once the problems are listed, the question will be: what is a good stochastic procedure calling convention?

What's wrong with the Mite way?

Essentially, the problem is that Mite leaves no way to name and annotate the stages individually. Consider the simplest case of a requesting computation -- applying a compound procedure. What is to be done when inference changes one of the inputs? The reasonable thing to do seems to be to leave the first stage, and the structure of the request, untouched, propagate the change into the body of the request (which happens automatically in graph traces), and then resimulate the second stage if needed. [*] But there is no place to spell this out -- both the proposal kernel and the constraint kernel of a compound SP are supposed to refer to the entire application, including the requested computation, so neither are appropriate. This is the cause of #654.

Another facet of this problem is the question of constraining compound procedures, and the distinction between lambda and proc. If the second stage could be annotated independently, it would be conceptually straightforward to annotate it as an invertible deterministic function, and teach the runtime system that it can backpropagate constraints to the outputs of those to be constraints to their inputs (with the appropriate Jacobian term for continuous ones, of course). [**] This is in effect what Lite does, except the whole implementation is hard-coded for the second stage of compound SPs specifically, instead of being triggered by a generally applicable annotation.
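
The backpropagation of constraints through an invertibility annotation can be sketched concretely. Everything here is hypothetical scaffolding (`InvertibleStage`, `backpropagate_constraint` are invented names); the sketch just shows the arithmetic: a constraint "output == y" on an invertible deterministic stage becomes "input == inverse(y)", with a log-Jacobian correction for continuous values.

```python
# Hypothetical sketch: constraining the output of a stage annotated as an
# invertible deterministic function, with the Jacobian term for continuous
# values.
import math

class InvertibleStage:
    """Example stage: y = exp(x), annotated with its inverse and log |dy/dx|."""
    def forward(self, x):
        return math.exp(x)
    def inverse(self, y):
        return math.log(y)
    def log_jacobian(self, x):
        # log |d forward / dx| evaluated at x; for exp this is x itself.
        return x

def backpropagate_constraint(stage, y_constrained, log_density_of_input):
    """Turn 'output == y' into 'input == inverse(y)', correcting the density
    by the log-Jacobian term (change of variables)."""
    x = stage.inverse(y_constrained)
    return x, log_density_of_input(x) - stage.log_jacobian(x)

# Example: input x ~ Normal(0, 1); constrain exp(x) == 2.
def std_normal_logpdf(x):
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

x, logw = backpropagate_constraint(InvertibleStage(), 2.0, std_normal_logpdf)
# x is log 2; logw is the log density of the constrained output, i.e. the
# lognormal density at 2.
```

If the second stage of a compound carried such an annotation, the runtime could trigger this machinery generically instead of hard-coding it for compound SPs.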

Status quo in Mite is that compound SPs have an extra kernel called a propagating kernel, and the scaffold construction algorithm has special code to check for it, whose job is to arrange to resimulate the second stage of the compound application in the event that inference changes the result of the request. However, if the arguments to the application change, the scaffold construction algorithm also selects the proposal kernel for the whole compound, which takes precedence and eventually causes a crash because of trying to double-extract the body of the request.

What's wrong with the Lite way?

Most fundamentally, the Lite way does not allow naming the application as a whole, in the event that one does wish to annotate all of it.

Also, the Lite way requires the runtime system, rather than the individual SPs, to make the decision about when a given request should be unevaluated. This would be fine if all the desired requesting stages were independent across applications, but we have two examples where they are not: memoized SPs, and the hypothetical Gaussian process with VentureScript covariance kernels.

The difficulty with memoized SPs is that every call (with the same argument) needs to refer to the same request, and that request needs to exist as long as any of those calls exists. In particular, the first application that needs it will make the request, but if inference decides to unevaluate that application, the request must be kept around if other applications need it. Lite accomplishes this with a custom "or" mechanism, namely caching requests by id and reference counting that cache.
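
The "or" mechanism amounts to a reference-counted cache. The following sketch uses invented names (`RequestCache`, `ensure`, `release`) and elides the traced evaluation, but shows the lifetime discipline: a shared request is evaluated on first use and unevaluated only when the last application referring to it goes away.

```python
# Hypothetical sketch of the "or" mechanism for mem: cache requests by id
# and reference-count the cache.
class RequestCache:
    def __init__(self):
        self.results = {}    # request id -> evaluated result
        self.refcounts = {}  # request id -> number of live applications

    def ensure(self, rid, evaluate):
        """Evaluate the request only on first use; bump the refcount."""
        if rid not in self.results:
            self.results[rid] = evaluate()
        self.refcounts[rid] = self.refcounts.get(rid, 0) + 1
        return self.results[rid]

    def release(self, rid):
        """Drop one reference; unevaluate the request when none remain."""
        self.refcounts[rid] -= 1
        if self.refcounts[rid] == 0:
            del self.refcounts[rid]
            del self.results[rid]

calls = []
def expensive():
    calls.append(1)
    return 42

cache = RequestCache()
a = cache.ensure("f(7)", expensive)  # first application: evaluates
b = cache.ensure("f(7)", expensive)  # second application: shares the result
cache.release("f(7)")                # first application unevaluated...
assert "f(7)" in cache.results       # ...but the request survives
cache.release("f(7)")
assert "f(7)" not in cache.results   # last reference gone; unevaluated
```

The request survives any one application's unevaluation, which is exactly the "or" over applications that mem needs.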

The difficulty with VS covariance kernels is that the request for any given off-diagonal entry in the covariance matrix should only exist if both applications to the relevant points exist. Lite does not implement any "and" mechanism, and so we do not have VS covariance kernels.

Finally, Lite proper hard-codes the number of stages to exactly two. This is aesthetically unpleasant, contributes to the complexity in Lite's SP interface that has essentially prevented people from using it, is wasteful in the common case where only one stage is needed, and would be limiting if we encountered a circumstance where more than two stages were called for (which we have not, to date, perhaps because it can be worked around by returning SPs).

What can be done?

Trampoline

Lite's restriction to exactly two stages seems not too hard to remove by applying a trampoline. In this proposal, the SP calling convention changes to

This option introduces some issues of its own, but mostly just brings the previous issues to the fore:
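
One plausible shape for the trampolined convention (sketched with invented names; the real proposal's signature is not spelled out above): an SP, applied to its arguments, returns either a final `Value` or a `Request` carrying a continuation, and the runtime bounces until it gets a value. This removes the hard-coded limit of two stages.

```python
# Hypothetical sketch of a trampolined SP calling convention: any number of
# requesting stages, each expressed as a Request with a continuation.
from collections import namedtuple

Value = namedtuple("Value", ["v"])
Request = namedtuple("Request", ["expression", "environment", "continuation"])

def trace_eval(expression, environment):
    # Stand-in for the traced evaluator: expressions are thunks here.
    return expression(environment)

def apply_sp(sp, args):
    step = sp(args)
    while isinstance(step, Request):      # as many stages as the SP wants
        result = trace_eval(step.expression, step.environment)
        step = step.continuation(result)  # the next stage decides what's next
    return step.v

# A compound with a single requesting stage; an SP needing no request would
# just return a Value immediately.
def square(args):
    env = {"x": args[0]}
    return Request(lambda e: e["x"] * e["x"], env,
                   lambda result: Value(result))

print(apply_sp(square, [3]))  # -> 9
```

In this shape the common one-stage case pays nothing, and a continuation can itself return another `Request`, giving arbitrarily many stages.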

More elaborate request manager

It would be possible to just implement an "and" mechanism for request handling. For example, the API for issuing a request can be made to consist of a 4-tuple: (request id, [application ids], expression, environment). The second field here is to be interpreted as a list of contingent application ids, such that this instance of requesting that request id should be removed from the reference count if any of those application ids are unevaluated. This should suffice for the needs of VS covariance kernels, at least.
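The bookkeeping for the 4-tuple API can be sketched as follows. The class and method names are invented, and evaluation of the (expression, environment) pair is elided; the sketch only shows the contingency logic: a request instance is dropped from the reference count as soon as any one of its contingent applications is unevaluated.

```python
# Hypothetical sketch of an "and" mechanism: requests are issued as
# (request id, [contingent application ids], expression, environment).
class RequestManager:
    def __init__(self):
        self.refcounts = {}   # request id -> count of live request instances
        self.contingent = {}  # application id -> set of request ids

    def issue(self, rid, app_ids, expression, environment):
        """Record one instance of the request, contingent on app_ids."""
        self.refcounts[rid] = self.refcounts.get(rid, 0) + 1
        for app in app_ids:
            self.contingent.setdefault(app, set()).add(rid)
        # (expression, environment) would be evaluated on first issue.

    def unevaluate_application(self, app_id):
        """Drop every request instance contingent on this application."""
        for rid in self.contingent.pop(app_id, set()):
            if rid in self.refcounts:  # may already be gone via another app
                self.refcounts[rid] -= 1
                if self.refcounts[rid] == 0:
                    del self.refcounts[rid]  # unevaluate the request itself

# GP covariance example: the off-diagonal entry k(x1, x2) should exist only
# while BOTH applications (to x1 and to x2) exist.
mgr = RequestManager()
mgr.issue("k(x1,x2)", ["app1", "app2"], None, None)
mgr.unevaluate_application("app1")
assert "k(x1,x2)" not in mgr.refcounts  # gone as soon as either app goes
```

Note that the plain refcounted "or" of mem falls out as the special case where each instance is contingent on exactly one application.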

Is this a good idea? I am of two minds. On the one hand, having three pieces of mechanism (requests, reference counting, contingency ids) for three examples (compounds, mem, GPs) is the classic sign of a weakness or limitation not having been removed from the substrate. On the other hand, one might argue that Boolean logic is complete, so it's really one mechanism and one should expect it to cover future examples.

More custom kernels

The only way I can think of to make progress starting from the Mite calling convention and not changing it is to add more custom kernel types, by analogy to the propagating kernel. To wit, I have, on a branch, implemented a notion of "request constraint kernel", whose meaning is "constrain the result of the first stage" (but say nothing about the request or the second stage). I have also added the special treatment this kernel calls for into the dependency trace scaffold construction algorithm.

Having done it, I like it even less than I did when I thought of it. What are the issues with this approach?

[*] Actually, this answer is an artifact of the implicit ref structure, namely that arguments to compounds are passed by reference through the environment, but return values are passed by value. It would not be impossible to change that to pass return values by reference as well, in which case both stages of a compound application can simply be left alone.

[**] The troublesome item here is backpropagating through a fanout (i.e., variable lookup) because for consistency that calls for forward propagating through all the other lookups. Both Lite and Mite have special facilities for this, which in Mite's case can well be described as a pretty awful hack.

axch commented 7 years ago

The branch https://github.com/probcomp/Venturecxx/tree/mite-request-constraint-kernels contains a (possibly incomplete) implementation of the "More custom kernels" proposal.

axch commented 7 years ago

It may also be feasible to let the SP choose. Suppose an SP could be equipped with a metaprogram (that was, let's say, deterministic and independent of the arguments) that gave the desired convention for calling and interpreting other annotations. Then some SPs could select trampoline and others could select to make callbacks directly.

It might also not be necessary to tag this: if the SP returns a request, that means it wants the trampoline, and if it makes the request but retains control, that means it wants to own it. Tagging has the advantage that the runtime system knows that this decision is deterministic and independent of the arguments. (In the case of the trampoline, that information may be recoverable by annotating the first stage, but a Mite-style SP has no other reasonable way to indicate that it will never want to use the trampoline.)
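
The tagged variant can be sketched in a few lines (all names here are invented for illustration): each SP exposes a deterministic, argument-independent metaprogram naming its convention, which the runtime can consult once, ahead of any particular application.

```python
# Hypothetical sketch: an SP declares its calling convention via a
# deterministic, argument-independent tag.
from enum import Enum

class Convention(Enum):
    TRAMPOLINE = "trampoline"  # SP returns requests; the runtime bounces
    DIRECT = "direct"          # SP makes requests itself and retains control

class TaggedSP:
    def convention(self):
        """Metaprogram: must be deterministic and independent of the args."""
        raise NotImplementedError

class CompoundSP(TaggedSP):
    def convention(self):
        return Convention.TRAMPOLINE

class GaussianProcessSP(TaggedSP):
    def convention(self):
        return Convention.DIRECT

def dispatch(sp, args, trampoline_apply, direct_apply):
    # Because the tag is argument-independent, the runtime can cache this
    # decision per SP rather than inspecting each application.
    if sp.convention() is Convention.TRAMPOLINE:
        return trampoline_apply(sp, args)
    return direct_apply(sp, args)
```

The untagged variant would instead infer the choice from whether the SP returned a request or retained control, at the cost of the runtime no longer knowing the decision is deterministic.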

axch commented 7 years ago

Is it possible that it would suffice for all requesting stages to be deterministic and independent of the values of the inputs? This could be so if they can reliably offload all their interesting computation into the content of the request. Test cases:

axch commented 7 years ago

Is it possible that it would suffice for all post-request stages to be just propagating the answer, by the same logic?

axch commented 7 years ago

Possible response: not if we want incremental migration of code out of the runtime system.