Great comment! One way to do this is to take the definition of the model and put it in a function, say def model_flow(), and then add the observe or sample statements in a separate function to make it more modular. We also have the ability to write models as objects, which should make this more straightforward by having versions like model.scored() or model.sampled(). There are also prototypes of some routines that could make sampling from a model more natural, but we have not used those yet. However, the examples you mention were written by me a while ago (before the summer), and we have not yet tried to streamline aspects such as the one you mentioned.
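As a minimal sketch of that factoring (the names model_flow and model_sample are illustrative rather than an existing Pyro API, and the obs= keyword is how observations are attached in more recent Pyro versions):

import torch
import pyro
import pyro.distributions as dist

# Shared "flow": everything up to, but not including, the likelihood draw.
def model_flow():
    z = pyro.sample("z", dist.Normal(torch.tensor(0.), torch.tensor(1.)))
    return dist.Normal(z, torch.tensor(1.))

# "Scored" version: conditions the likelihood on observed data.
def model(data):
    return pyro.sample("x", model_flow(), obs=data)

# "Sampled" version: draws fresh data from the same flow.
def model_sample():
    return pyro.sample("x", model_flow())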
One way to do this is to take the definition of the model and put it in a function, say def model_flow(), and then add the observe or sample statements in a separate function to make it more modular.
This seems hacky for a modeler, in that I might not know what I'd like to visualize a priori (either prior to training or, more likely, prior to building my huge experiment code base, where each time I want something new to visualize I have to refactor the model). In the worst case, I write a new function for every line in the program, which quickly gets ugly.
We also have the ability to write models as objects, which should make this more straightforward by having versions like model.scored() or model.sampled(). There are also prototypes of some routines that could make sampling from a model more natural, but we have not used those yet.
Sounds interesting! Would love to see that at some point.
Well, they are not intended for that specifically, but we have thought about ideas like sample-as-observe and so on, where a statement can play multiple roles depending on the context it appears in. I think this is certainly a point to discuss with the group.
This is definitely a point of awkwardness in the modelling language. To expose latent variables other than a stochastic function's return value, one can trace the execution of a stochastic function with pyro.poutine.trace. This basically returns a dict where the keys are the names of the Pyro primitive sites and the values are other dicts containing information about each site.
Traces are mostly used by inference algorithms at the moment, but the functionality is very straightforward and there's no reason a user couldn't use it in their experiment code. This way it's also possible to get information about many latent variables in a model from one execution.
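As a minimal sketch of that trace-based idiom (the toy model and site names are illustrative, and attribute names such as .nodes follow a more recent Pyro API, so details may differ by version):

import torch
import pyro
import pyro.distributions as dist
import pyro.poutine as poutine

def model():
    z = pyro.sample("z", dist.Normal(torch.tensor(0.), torch.tensor(1.)))
    return pyro.sample("x", dist.Normal(z, torch.tensor(1.)))

# Run the model once and record every primitive site it hits.
trace = poutine.trace(model).get_trace()

# Each site name maps to a dict with its sampled value, distribution, etc.
z_value = trace.nodes["z"]["value"]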
this pattern does indeed come up often, and should probably be made more pleasant. i'm not sure what the most general pattern / usage is, so i'd suggest a helper and a standard idiom rather than a language feature.
for instance, in models that have a single map_data (indicating iid observations), a natural ask is to examine the posterior predictive for the next data point. so if we have a learning model like:
def model(args):
    g = pyro.sample(...)            # stuff1: global latents
    def observe(d):
        ...                         # stuff2
        pyro.observe("x", foo, d)
    pyro.map_data(data, observe)
we'd like to be able to do something like predictive(model) to get something equivalent to:
def predictive(args):
    g = pyro.sample(...)            # stuff: same global latents
    ...                             # stuff2
    return pyro.sample("x", foo)
(we can use poutines to expose intermediate RVs as return values using the idea of the do and inspect operators (can't find issue atm), so this is a pretty general operation.)
this is restricted to the single map_data and single observe per datum case. is it clear to anyone what the most general version is?
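For what it's worth, here is a rough sketch of that transformation written against a more recent Pyro API (pyro.plate and poutine.uncondition, which turns observe statements back into sample statements; the toy model and site names are only illustrative):

import torch
import pyro
import pyro.distributions as dist
import pyro.poutine as poutine

def model(data):
    g = pyro.sample("g", dist.Normal(torch.tensor(0.), torch.tensor(1.)))
    with pyro.plate("data", len(data)):
        return pyro.sample("obs", dist.Normal(g, torch.tensor(1.)), obs=data)

# uncondition ignores the obs= values, so every site is freshly sampled;
# this plays the role of the predictive() version sketched above.
predictive_model = poutine.uncondition(model)
x_new = predictive_model(torch.zeros(5))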
@eb8680 can we close this? do we need to implement more functionality along the lines of inspect? or do we need more examples of the kind of modeling flow raised in this issue in the tutorials?
@martinjankowiak I'll close it. The solution to this is to use pyro.condition instead of hardcoding observe within models, and to use the site kwarg of pyro.infer.Marginal. The two language intro tutorials contain examples of these idioms, but they could certainly be documented better and featured more prominently.
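As a minimal sketch of the pyro.condition idiom (the toy model and site names are illustrative; the Marginal-based querying mentioned above belonged to the Pyro API of that era and is not shown here):

import torch
import pyro
import pyro.distributions as dist

def model():
    z = pyro.sample("z", dist.Normal(torch.tensor(0.), torch.tensor(1.)))
    return pyro.sample("x", dist.Normal(z, torch.tensor(1.)))

# Sampling from the prior predictive: just call the unmodified model.
x_prior = model()

# Conditioning on data without hardcoding observe statements in the model:
conditioned_model = pyro.condition(model, data={"x": torch.tensor(0.5)})
x_cond = conditioned_model()   # "x" is now fixed at 0.5; "z" is still sampled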
This might be more of a user-related question, but I've noticed that in the current examples, in order to sample from the model or expose some internal execution in the program (such as a VAE's hidden representation before the likelihood draw), you explicitly define a model_sample() function. Rewriting many pieces of the model is undesirable for very large neural networks and probabilistic models. Is it possible to directly sample from the model() program?