probcomp / Gen.jl

A general-purpose probabilistic programming system with programmable inference
https://gen.dev
Apache License 2.0

Proposal: Allow a Generative Function to recommend a proposal for conditioning on its return value #277

Open bzinberg opened 4 years ago

bzinberg commented 4 years ago

Motivation

This has come up repeatedly when I try to add slack terms or noise to a model. Even if I'm the only one developing the code, this "adding" is cumbersome because the inference program always needs to change when the model changes. For example, suppose I am co-developing a model and inference algorithm that include the model subroutine widget, defined by

@gen function widget()
  x ~ my_distribution()
  return x
end

and the inference program includes conditioning on :x, i.e. observing (widget_call => :x, val) in the choicemap, for some val.
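
A minimal sketch of that conditioning step (here model stands for a hypothetical caller that traces the call as widget_call ~ widget()):

# Constrain the choice at the hierarchical address :widget_call => :x.
observations = choicemap((:widget_call => :x, val))
(trace, weight) = Gen.generate(model, (), observations)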

Once that version is debugged, suppose I want to add measurement noise to the model:

@gen function noisy_widget()
  true_x ~ my_distribution()
  measured_x ~ normal(true_x, noise_stdev)
  return measured_x
end

Now the inference program has to be changed. If we want to observe the return value of noisy_widget, the first obvious thing to do is observe (:measured_x, val). But if val is far from the modes of true_x's prior, a systematic error occurs: in the original trace, true_x is drawn from some distribution (say, the prior) that doesn't know about val, so its value will usually be near the prior modes. Because true_x is so far off, conditioning directly on :measured_x will not make much useful progress toward the posterior, unless a user who knows about this problem (i.e., is familiar with the source of noisy_widget) supplies a custom proposal.
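
Such a custom proposal might look like the following sketch (assuming noise_stdev is in scope, as in the model above):

# A knowledgeable user's workaround: propose :true_x near the observed
# measurement instead of from the prior.
@gen function true_x_proposal(trace, val)
  true_x ~ normal(val, noise_stdev)
end

# Used with Gen's custom-proposal variant of MH:
# new_trace, accepted = Gen.mh(trace, true_x_proposal, (val,))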

Now imagine we're in a multi-person software ecosystem, and widget and noisy_widget are part of a library. In principle, the author of noisy_widget could have good recommendations about how to condition on its return value, without requiring the user to grok the trace structure (which the library author may later want to change under the hood). In this case the strategy would be something like "set true_x to be pretty close to the prior mode, but offset a little bit towards val, where 'a little bit' is on the order of noise_stdev." A user who needs finer-tuned performance will still have to go deeper into the math and write a homemade algorithm, but by having a good tune-free default, we allow users to be less encumbered when prototyping, or when developing more complex programs that are too big to hand-maintain every component of the inference algorithm.

Also, if the author doesn't supply a hand-made default, then any user-written inference logic that conditions on the return value could break if the author later changes under-the-hood details (trace structure) of the model. This effectively makes libraries not libraries: the user has to read the source code in order to use them.

Proposal sketch

"Wait, is that a sketch of the proposal, or a sketch of the proposal?"

The rough idea is to allow a GenerativeFunction author to supply a default proposal for conditioning on the function's return value:

proposal = Gen.condition_retval(f)
new_trace, accepted = Gen.mh(trace, proposal, (val,))

Note that this feature is modular, in the sense that these condition_retval proposals will themselves be able to call other condition_retval proposals for their subroutine calls.

The default condition_retval proposal, if none is supplied by the author, could be:

  • If the return value is itself a random choice, condition on that.
  • If the return value is the output of a generative function call, condition on it by using the callee's condition_retval.
  • Otherwise, throw an error saying you don't know how to condition on the output of a deterministic function.

(Conceivably, we could make this a library authors' best-practice rather than part of the API: we could decree a naming convention such as "library authors should, for each generative function f, also supply a generative function called f_condition_retval." My current speculative thinking is that having it eventually be part of the API would be neater.)

alex-lew commented 4 years ago

@bzinberg Generative functions already have internal proposals: not for their return values, but for any choicemap. Essentially, an internal proposal is a distribution for proposing some addresses of a generative function given values for other addresses. The default is to use the "intervened prior", which, as you point out, is often not a very good proposal. But generative functions can be customized to use other proposals as their internal proposals.

In that case, the generative function methods generate and regenerate automatically use these internal proposals, and return weights that reflect them. This way, you can just use mh(p, select(:true_x)) or importance_resampling(p, choicemap(:true_x => val)) and benefit from the internal proposal without even realizing it :-) Furthermore, the "default proposals" of all other generative functions delegate to the "default proposals" of the generative functions they call, so callers of p automatically benefit.
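
For reference, those calls look roughly like this (a sketch reusing the noisy_widget example above, with an observed value val):

# Observe the measurement; the internal proposal fills in :true_x and the
# returned importance weights reflect it.
observations = choicemap((:measured_x, val))
(trace, _) = Gen.importance_resampling(noisy_widget, (), observations, 100)

# Re-propose :true_x via the (possibly customized) internal proposal.
(trace, accepted) = Gen.mh(trace, select(:true_x))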

Right now, there is a lot of boilerplate necessary to specify a custom internal proposal, but @marcoct's thesis suggests a combinator for doing so -- p = ReplaceInternalProposal(p, q). This combinator wouldn't be too difficult to implement. (Marco may already have it implemented somewhere?) Another point in the design space is MetaPPL, which allows you to alter a generative function's internal proposal using pattern-matching syntax:

@rules f(x, y) do t
  (:a | :b, :c) => propose_a(t[:b], t[:c], x)
  (:z | :a) => propose_z(t[:a])
end

This says: when the generative function f is called with constraints t and arguments x and y, first check if t has :b and :c observed but not :a; if so, use propose_a to propose :a. Then check if :a is observed but not :z, and if so, use propose_z to propose :z. (This might be overkill for Gen.)

bzinberg commented 4 years ago

But generative functions can be customized to use other proposals as their internal proposals.

:exploding_head: I did not know this! :exploding_head: I thought "internal proposal" was Gen's formal name for the intervened prior. That's great that users can supply a custom proposal distribution -- that greatly increases the extent to which a library written by someone else can be usable (at a decent basic level of performance) without the user having to become well-versed in the internals. (Do we have example implementations of GFs with custom proposal distributions?)

I think the ability for the author to say how to condition on the return value would help seal the deal. The internal proposal distribution gives reasonably performant inference as long as the user has constraints in the format of the GF's choicemap, but the user still has to know how the choicemap is structured, and must update their code whenever that structure changes in new versions of the library. A key piece of Gen's modelling modularity is that subroutine calls only affect the body of the current function by way of their return value -- the implementation of the subroutine can be viewed as a black box for purposes of sampling from the prior. It would be very helpful to also have an analogous inference modularity, where it is possible to condition, reasonably performantly, on the return value of a subroutine while treating the subroutine's implementation as a black box.

Assuming the generative function has a custom internal proposal distribution, the above default implementation,

  • If the return value is itself a random choice, condition on that.
  • If the return value is the output of a generative function call, condition on it by using the callee's condition_retval.
  • Otherwise, throw an error saying you don't know how to condition on the output of a deterministic function.

makes sense except possibly for the third case. It does seem like fundamentally we can't both allow the "last line" of a GF to be an arbitrary deterministic function and guarantee a reasonable default way to condition on the return value, but maybe that's ok.

alex-lew commented 4 years ago

@bzinberg

I think having a condition_retval function would be a bit strange: in general, the retval is not something you can condition on, so it reads as a type mismatch. Many of the generative functions I’ve written while working with Gen do not allow the user to directly condition the return value.

If a particular library is committed to the idea that its generative functions will provide this functionality, then it could use a common trace address for the return values of its generative functions, something like :retval. If the return value is a tuple of multiple choices, such that some amount of “inversion logic” is necessary to translate the return value into constraints on other choices, that logic can live in the generative function’s internal proposal (and :retval can still be in the trace).
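
Applied to the noisy_widget example from earlier in the thread, that convention might look like this sketch:

@gen function noisy_widget()
  true_x ~ my_distribution()
  retval ~ normal(true_x, noise_stdev)
  return retval
end

# Consumers can then constrain the return value at a stable, documented
# address without reading the internals, e.g.:
#   observations = choicemap((:retval, val))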

I also wonder if your particular vision here is too specific to MH: why does condition_retval return a proposal generative function? How do we use this with importance resampling, where we need to provide constraints as a choicemap anyway?

As a broader point, in general, a generative function’s choicemap structure is part of its interface. We may think of some of its variables as “nuisance variables” that we don’t want to think about outside the GF, but not all the variables are nuisance variables, from an inference programming perspective.

The question of how to achieve “modular inference” is an interesting one, but is difficult to solve: internal proposals are a step in the right direction, but very often, we need information to flow across generative function boundaries. It is commonly the case that the return values of intermediate generative functions are not directly observed. For example, imagine an inverse graphics application in which the model has the form

scene ~ generate_scene()
image ~ render_scene(scene)
return image

We may have a neural network that looks at the image, and proposes specific variables within scene based on the image. That inference programmer needs to know the names of those variables inside scene.

If we want to “declutter” the choicemap interface of a particular GF, removing nuisance variables that can be handled “by default” and exposing only those variables that users are expected to need to reference, Gen’s framework also allows for this: the ReplaceInternalProposal combinator could take as an argument a selection of addresses to “hide” from the resulting generative function. Formally, these variables would become untraced randomness in the new GF, handled internally by whatever proposal the user provided.
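
In sketch form (hypothetical API, since neither the combinator nor the hiding argument exists in Gen today; q stands for a user-supplied proposal of :true_x given :measured_x):

# Wrap noisy_widget with custom internal proposal q, hiding :true_x so it
# becomes untraced randomness in the wrapped generative function.
p2 = ReplaceInternalProposal(noisy_widget, q, select(:true_x))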

bzinberg commented 4 years ago

I think having a condition_retval function would be a bit strange: in general, the retval is not something you can condition on, so it reads as a type mismatch. Many of the generative functions I’ve written while working with Gen do not allow the user to directly condition the return value.

AFAICS, conditioning on the return value is always mathematically well-defined, but Gen does not promise to be able to do it. Is that wrong?

If the return value is a tuple of multiple choices, such that some amount of “inversion logic” is necessary to translate the return value into constraints on other choices, that logic can live in the generative function’s internal proposal (and :retval can still be in the trace).

Totally. IIUC, this doesn't cover the general case where the return value is an arbitrary deterministic function of the traced random choices -- and I don't think we should cover that case. Perhaps the method could be unimplemented by default.

How do we use this with importance resampling, where we need to provide constraints as a choicemap anyway?

I might be way off on this, but -- details of the plumbing aside -- could we allow the user to pass return value constraints in addition to choicemap constraints, with the meaning "for each of these subroutine calls, splice in a call to condition_retval when running the larger proposal"? The constraints choicemap for the larger proposal would not be allowed to talk about addresses inside the subroutine call if that subroutine call has a retval constraint.

As a broader point, in general, a generative function’s choicemap structure is part of its interface.

I agree that that is how Gen is currently framed. I'm trying to see if there are use cases in which the model author can allow users to not worry about that structure if they're willing to stick with the untuned-but-decent condition_retval (even if at a later stage they end up doing custom inference).

The question of how to achieve “modular inference” is an interesting one, but is difficult to solve: internal proposals are a step in the right direction, but very often, we need information to flow across generative function boundaries.

Totally agreed. I am not saying that all inference programs will use condition_retval. And in almost every case, the best inference program will not use condition_retval at all. I agree that this lack of decomposability in general is a fundamental aspect of the whole field. But modular approximations to the hard problems (e.g., graphical models) have been useful. My intuition is that having a better-than-nothing way to do inference "some-batteries-included" on a coarse but compositional model that treats some of its generative function calls as atomic (like Distributions) could break through a fundamental scaling limitation: the user can get some automatic help exploring complex models that incorporate other people's generative functions as components.

It is commonly the case that the return values of intermediate generative functions are not directly observed. For example, imagine an inverse graphics application in which the model has the form

scene ~ generate_scene()
image ~ render_scene(scene)
return image

We may have a neural network that looks at the image, and proposes specific variables within scene based on the image. That inference programmer needs to know the names of those variables inside scene.

Yes, absolutely. We should fully expect that users who know how to do good customized inference will ignore condition_retval and do something better; and that even users who don't want to develop custom inference that looks at subroutine choicemaps in more detail will usually still have to do it to some degree (though perhaps they can do useful initial exploration without it).

Thanks for providing the inverse graphics example -- good point that we would typically not want to condition on the return value of generate_scene. I'll chew on that and try to incorporate that class of use cases into my thinking.

To give you a sense of why I think sometimes a generative function author might want to allow users to ignore the choicemap structure, consider a generative function like

@gen function random_orientation()
  yaw ~ ...
  pitch ~ ...
  roll ~ ...
  return yprToQuaternion(yaw, pitch, roll)
end

where the author may later want to change it to something like

@gen function random_orientation()
  w ~ ...
  x ~ ...
  y ~ ...
  z ~ ...
  return quat(w, x, y, z)  # "w + x*i + y*j + z*k"
end
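
For example, a consumer who had conditioned on the first version's internals (assuming a hypothetical caller that traces the call as orientation ~ random_orientation()) would have written something like

# Breaks when the author switches to the quaternion parameterization,
# since :yaw no longer exists in the trace:
observations = choicemap((:orientation => :yaw, 0.5))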

Formally, these variables would become untraced randomness in the new GF, handled internally by whatever proposal the user provided.

I don't want to remove the user's ability to do custom inference on the sub-choicemap -- I just want the user to be allowed to, at least in early stages, use library code that doesn't require custom inference at that level of granularity.

alex-lew commented 4 years ago

AFAICS, conditioning on the return value is always mathematically well-defined, but Gen does not promise to be able to do it. Is that wrong?

Perhaps you could find a way to define it. But it's not obvious. Is conditioning on x - y equaling 0 the same as conditioning on x/y equaling 1? Gen's semantics give us a way to condition on values in choicemaps, but not on return values, and so have little to say about this question.

For example, suppose we write:

@gen function f()
  x ~ normal(0, 1)
  return 2x
end

and someone later refactors it as

@gen function f()
  x ~ normal(0, 2)
  return x
end

Should "conditioning on the return value" of these two functions do the same thing? If so, how would you achieve that effect? (It would require changing the way that generative function traces are scored.)
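
To make the scoring point concrete, the discrepancy can be checked numerically (a sketch using Distributions.jl; both Gen's normal and Distributions' Normal are parameterized by standard deviation):

using Distributions

v = 1.7  # a hypothetical observed return value

# First version: "the return value is v" means x = v/2, scored under normal(0, 1).
score1 = logpdf(Normal(0, 1), v / 2)

# Refactored version: the same observation means x = v, scored under normal(0, 2).
score2 = logpdf(Normal(0, 2), v)

# score2 == score1 - log(2): the scores differ by exactly the log-Jacobian
# of the map x -> 2x, which trace scoring does not apply.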

These questions arise for your quaternion examples too. If you want users to be able to directly condition on a quaternion, one way you can do this is by implementing a distribution on quaternions, rather than a generative function. Such a distribution would need to handle the 'change of variables' you do when moving from pitch/roll/yaw to quaternion, adjusting the density according to the change-of-variables formula; this is the step that would be missing if you simply re-expressed your observation in terms of yaw, pitch, and roll. (When observing continuous values, the units make a difference.)

Perhaps the method could be unimplemented by default.

In this case, perhaps it could just be a method that you find useful to implement on several generative functions that you write? This would not require system-wide support.

I might be way off on this, but -- details of the plumbing aside -- could we allow the user to pass return value constraints in addition to choicemap constraints, with the meaning "for each of these subroutine calls, splice in a call to condition_retval when running the larger proposal"? The constraints choicemap for the larger proposal would not be allowed to talk about addresses inside the subroutine call if that subroutine call has a retval constraint.

It seems to me you are proposing adding a lot of machinery to make the return value behave in the same way that traced choices already do.

So I wonder: instead of adding a new method, condition_retval, to the GFI -- which will only sometimes be implemented, and has no default implementation -- why not instead just trace the return value at a common address like :retval? It is still part of the interface of a generative function whether or not it exposes this special retval address, just like it would be part of the interface, under your proposal, whether or not it implements condition_retval.

(There is a separate question of whether it might make sense to allow users to trace at an "empty address" -- I wonder if @georgematheos's work on "ValueChoiceMaps" has anything to say about that. As it stands, Gen choicemaps cannot store choices and subtraces at the same address. Metaprob's traces can do this. Doing so might be a way of getting at the same thing you're asking for: the "blank" address can be the designated "return value" choice. You still have to know whether a generative function traces at the blank address, but if it does, you can constrain that slot.)

bzinberg commented 4 years ago

(When observing continuous values, the units make a difference.)

Instead of "well-defined," I should have said "as well-defined as it is for choicemaps."

Gen's semantics give us a way to condition on values in choicemaps, but not on return values, and so have little to say about this question.

I did not realize this. As a small example, does Gen give a mathematical definition of what "observing" the choicemap [(:y, 0.1)] means in the model

@gen function f()
  x ~ normal(0, 1)
  y ~ normal(x, 1)
end

in terms of conditioning on an event in a probability space? Or is there a disintegration-based semantics? I haven't seen Gen answer the question of what it means by conditioning on choicemaps of probability zero, so I thought I was not introducing more non-well-definedness.

Also, the name condition_retval was bad -- it doesn't claim to actually be sampling from a conditional distribution. Instead, it is supposed to give you a better-than-prior proposal for when you are trying to do inference (on the choicemap) given the information that the return value of the subroutine was some particular value. It does not have to be a sampler from the true (marginal) posterior on that subroutine's choicemap.

It seems to me you are proposing adding a lot of machinery to make the return value behave in the same way that traced choices already do.

I think we had a misunderstanding on that. I meant the phrase "details of the plumbing aside" pretty broadly -- certainly adding return values to the choicemap and therefore not having to add arguments to any API methods would be an economical way to implement the thing I'm proposing, if we come to the decision that the conceptual capability is desirable. I was brushing aside that question because it sounds like you don't think the conceptual capability would be desirable even if the cost in terms of implementation and API churn was zero. I'm trying to have that discussion first because if the capability is incoherent or not useful, then negotiating the implementation is unnecessary.

There is a separate question of whether it might make sense to allow users to trace at an "empty address" -- I wonder if @georgematheos's work on "ValueChoiceMaps" has anything to say about that. As it stands, Gen choicemaps cannot store choices and subtraces at the same address.

FWIW, as of https://github.com/probcomp/Gen.jl/pull/153, trace[] is syntactic sugar for get_retval(trace). I think the connotation is that "empty address means retval," and I think allowing the user to decide to trace something other than the return value at the empty address would be confusing.

alex-lew commented 4 years ago

Or is there a disintegration-based semantics?

Yes, there is a disintegration-based semantics that relies on the fact that when you condition on a choice, that choice is from a primitive distribution with a known density. A version of this is worked out in our "trace types" paper, and Marco and I also worked out a definition of the 'posterior' given a choicemap in a general Gen setting. Gen's requirements about addresses and supports make this straightforward to define, and justify Gen's scoring behavior (as well as that of Turing, Anglican, etc.).

certainly adding return values to the choicemap and therefore not having to add arguments to any API methods would be an economical way to implement the thing I'm proposing, if we come to the decision that the conceptual capability is desirable. I was brushing aside that question because it sounds like you don't think the conceptual capability would be desirable even if the cost in terms of implementation and API churn was zero. I'm trying to have that discussion first because if the capability is incoherent or not useful, then negotiating the implementation is unnecessary.

I am taking the capability to be

it is supposed to give you a better-than-prior proposal for when you are trying to do inference (on the choicemap) given the information that the return value of the subroutine was some particular value.

As we've discussed, this is not possible in general: generative functions need not implement "constraint propagation" logic that would enable them to supply a proposal distribution that is always consistent with a given return value.

If the author of a generative function intends to support the operation of conditioning on the return value, the author may expose a trace address for this value -- in exactly the same way that they expose a trace address for any other value for which they intend to support observation. It is just as much of a burden for the consumer of the generative function to know "does this GF support condition_retval?" as it is to know "what address does this GF expose for me to condition on?"

Now, it seems like you are possibly suggesting something beyond this: you want a way to "give information" (like the return value) to the default proposal without "conditioning on it" (I take your "given" above not to be a probabilistic "given", because we don't have a clear definition for conditioning in this context). The question of whether default proposals should accept (optional) arguments, just like external proposals do, has arisen a number of times (e.g., #25, #189), and I think @marcoct and I both think the answer should be "yes." Those arguments could be used in some cases to "incorporate info about a return value" without conditioning on the return value. But this would be something decided on a generative-function-by-generative-function basis; other GFs would take arguments corresponding to things like "the amount of computation I want the proposal to do", etc.

So:

  • I am for making it easier to customize the internal proposal of a generative function.
  • I am for library authors thinking carefully about what choicemap interface to expose, so that there are "output addresses" that consumers can condition on and will remain stable when the guts of the GFs change.
  • I am for adding side-channels by which users can pass arguments to internal proposals, to configure the proposals in ad-hoc ways not directly captured by the 'condition on a choice' semantics.
  • I am against limiting any of the above mechanisms to serve the sole purpose of "incorporating information about a return value," which seems ill-defined to me.

bzinberg commented 4 years ago

Thanks @alex-lew, this is perfect. I was looking for exactly this information, that there is interest in allowing internal proposals to be customizable -- that's an ideal place to put the "known return value" hint. In that context, I now view this issue as a +1 for that line of thinking. In my experience as a practitioner, this has become a pretty important ergonomic point, where the "noise/slack" generative code could be nicely refactored if there were a slick syntax for passing custom hints to the internal proposal distribution.

If the author of a generative function intends to support the operation of conditioning on the return value

100% agreed on "if" being the operative word.

It is just as much of a burden for the consumer of the generative function to know "does this GF support condition_retval?" as it is to know "what address does this GF expose for me to condition on?"

Agreed. What I'm not aware of, though, is a way to "expose a trace address" for the return value if the return value is not a random choice. You mean like a Kronecker delta?

  • I am for making it easier to customize the internal proposal of a generative function.

:+1:

  • I am for library authors thinking carefully about what choicemap interface to expose, so that there are "output addresses" that consumers can condition on and will remain stable when the guts of the GFs change.

That's a great way to put it. Speaks to my concern about making it possible to write libraries that don't require the user to know all the implementation details (and I agree that it's context-dependent what counts as "implementation details" -- much more so than in ordinary software).

  • I am for adding side-channels by which users can pass arguments to internal proposals, to configure the proposals in ad-hoc ways not directly captured by the 'condition on a choice' semantics.

:+1: :+1: :+1:

  • I am against limiting any of the above mechanisms to serve the sole purpose of "incorporating information about a return value," which seems ill-defined to me.

Sounds good. I'd like to get more familiar with the disintegration semantics that formalize Gen's notion of conditioning, so that I can be less whimsical about claiming things are well-defined :slightly_smiling_face:

georgematheos commented 4 years ago

As a quick response to the question of being allowed to constrain an "empty address": I think there are times this would be syntactically nice. It is simplest to see how this would work in the case that a generative function's return value is drawn from a distribution: constraining the "empty address" then just constrains the sample from that distribution. I don't currently view this as a super important addition, but it might make some things easier, e.g., switching between using a distribution and a generative function in the model without changing the inference program.