pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Create an API to extend an existing model without affecting old one #4529

Closed. ricardoV94 closed this issue 2 years ago.

ricardoV94 commented 3 years ago

This idea emerged in a discussion started by @canyon289 about how to do posterior predictive sampling on new random variables.

The current solution is to add a new variable after sampling as in:

import numpy as np
import pymc3 as pm

data = np.random.randn(100)  # placeholder for the observed data

with pm.Model() as m:
  x = pm.Normal('x')
  y = pm.Normal('y', x, 1, observed=data)
  trace = pm.sample()

with m:
  new_y = pm.Gamma('new_y', mu=x, sigma=2)
  ppc = pm.sample_posterior_predictive(trace, var_names=['new_y'])

The problem with that approach, as mentioned by @michaelosthege, is that you only have one shot at getting it right, since there is no API to delete a variable from a model. So if you decide to try something else, you have to extend the model with yet another variable under a new name.
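As a minimal sketch of the failure mode: a second attempt to register a variable under an already-used name is rejected, because Model enforces unique names (the exact error message may vary by version):

with m:
  new_y = pm.Gamma('new_y', mu=x, sigma=3)  # second attempt, tweaked sigma
# Raises something along the lines of:
# ValueError: Variable name new_y already exists.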

A nested model does not work either, because the new variables are added to the original model anyway (and it is untested territory):

with pm.Model() as m:
  x = pm.Normal('x')
  y = pm.Normal('y', x, 1, observed=data)
  trace = pm.sample()

with m:
  with pm.Model() as extended_m:
    new_y = pm.Gamma('new_y', mu=x, sigma=2)
    ppc = pm.sample_posterior_predictive(trace, var_names=['new_y'])

It would be great to have something that extends an existing model without affecting it:

with pm.Model() as m:
  x = pm.Normal('x')
  y = pm.Normal('y', x, 1, observed=data)
  trace = pm.sample()

with pm.Extend(m) as extended_m:
  new_y = pm.Gamma('new_y', mu=x, sigma=2)
  ppc = pm.sample_posterior_predictive(trace, var_names=['new_y'])

It could also be the nested syntax, as long as we ensured the original model is not affected.
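For illustration only, a hypothetical Extend could be little more than a context manager that enters a deep copy of the model, so every new variable lands on the copy (Extend is not an existing PyMC3 API, and deep-copying a Model is not guaranteed to work in every case):

import copy

class Extend:
    """Hypothetical helper: enter a deep copy of `model` so that
    variables added inside the block never touch the original."""

    def __init__(self, model):
        self._copy = copy.deepcopy(model)

    def __enter__(self):
        return self._copy.__enter__()

    def __exit__(self, *exc_info):
        return self._copy.__exit__(*exc_info)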

brandonwillard commented 3 years ago

This conversation is about v4, no?

ricardoV94 commented 3 years ago

Any new functionality will only come to v4 at this point. So yes.

brandonwillard commented 3 years ago

OK, so this touches upon a few bigger questions/ideas, and underlying most of them is the question "Do we need the Model object and/or its constraints in order to do X, Y, and/or Z?", so I'll start with that.

In this example, "...X, Y, and/or Z" is posterior predictive sampling, which really only requires a RandomVariable-containing graph, since it basically amounts to compiling said graph with aesara.function. In that case, if you focus on just the Aesara graphs, things like the name restrictions imposed by Model are irrelevant: i.e., one could completely duplicate/recreate the underlying RandomVariable-containing graph and sample away.

Also, we need to start asking ourselves if we really need constraints like unique names. From the Aesara perspective, we don't, and we definitely shouldn't use variable names as a means of identifying Aesara variables (the same is/was true for Theano).
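As a minimal sketch of that point, the graph below is sampled directly with aesara.function, with no Model and no unique names involved (getting fresh draws on every call would additionally require configuring RNG updates, omitted here):

import aesara
import aesara.tensor.random.basic as ar

x_rv = ar.normal(0, 1)
x_rv.name = "x"
y_rv = ar.normal(x_rv, 1)
y_rv.name = "x"  # duplicate name; Aesara doesn't care

# Compiling the graph is all it takes to draw a joint ancestral sample
sample_fn = aesara.function([], [x_rv, y_rv])
x_draw, y_draw = sample_fn()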

OriolAbril commented 3 years ago

Also, we need to start asking ourselves if we really need constraints like unique names. From the Aesara perspective, we don't, and we definitely shouldn't use variable names as a means of identifying Aesara variables (the same is/was true for Theano).

As things are structured now, non-unique variable names would break ArviZ conversion (and so would variables and dimensions having the same name). I am not sure what the gain of non-unique variable names would be, but if we wanted to go down that path, we'd have to update the converter to take that into account and edit the names so that they become unique.

brandonwillard commented 3 years ago

non-unique variable names would break ArviZ conversion

I imagine we could "unique-ify" the names before passing anything off to ArviZ.

N.B. Aesara already has an auto-unique-naming feature that we could use/extend:

>>> import aesara.tensor as at
>>> a_1 = at.vector("a")
>>> a_2 = at.vector("a")
>>> a_1.name == "a"
True
>>> a_2.name == "a"
True
>>> a_1.auto_name
'auto_3'
>>> a_2.auto_name
'auto_4'

brandonwillard commented 3 years ago

Another thing worth noting: pymc3.model.Model is already very close to aesara.graph.fg.FunctionGraph, in both concept and functionality. Both are containers that "encapsulate" graphs and carry/provide information about said graphs.

More specifically, FunctionGraph is all about specifying inputs and outputs for graph-based calculations. It provides methods for manipulating the encapsulated graphs in a consistent way (i.e. in a way that updates other dependent quantities as well), and is used by another process to perform computations.

Model is exactly the same: its inputs are the unobserved random variables, and the outputs are the observed ones. Models are likewise used by another process to perform computations: i.e., posterior sampling.

One of the most notable differences is that Model lacks the manipulation features that FunctionGraph provides (e.g., the ability to add and remove inputs, replace arbitrary parts of the graph, etc.).

Depending on how well this analogy/correspondence works (in a functional sense), we might want to consider basing Model off of FunctionGraph.
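For a flavor of those manipulation features, here is a small FunctionGraph example (plain tensors rather than random variables, but the same machinery; a sketch only):

import aesara.tensor as at
from aesara.graph.fg import FunctionGraph

a = at.scalar("a")
b = at.scalar("b")
out = a + 2 * b

# Wrap the graph; inputs/outputs and internal bookkeeping stay consistent
fgraph = FunctionGraph(inputs=[a, b], outputs=[out], clone=False)

# Replace part of the graph in place: use exp(a) wherever b appeared
fgraph.replace(b, at.exp(a))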

ricardoV94 commented 3 years ago

That's quite interesting.

I don't fully grasp the extent of Aesara's graph abilities, but it would be really nice to be able to operate more organically on PyMC3 models/graphs as an "object" in itself and not as a "context".

On the other hand, there is always going to be a divide between before and after sampling that must stay clear. For instance, posterior predictive samples cannot affect posterior sampled values.

This issue was trying to get at that, although probably with a very v3 mindset. How can we facilitate extending models after posterior sampling in a way that is both intuitive and safe (i.e., not creating the illusion that you can do incremental inference)?

brandonwillard commented 3 years ago

it would really be nice to be able to operate more organically on PyMC3 models / graphs as an "object" in itself and not as a "context"

That's exactly what I'm getting at. A distinguishing characteristic of FunctionGraphs (vs. Models) is that they take graphs as inputs. This, I think, is an important part of what's missing from the Model context approach.

It makes more sense for a Model object to work in exactly the same way as FunctionGraph, because the Aesara graph of a model exists independently from a Model object. Also, one can create an Aesara graph from which multiple Models could be derived.

For example, consider the following graphs:

import aesara.tensor as at
import aesara.tensor.random.basic as ar

# Two "root" random variables...
A_rv = ar.normal(0, 1)
B_rv = ar.gamma(0.5, 0.5)

# ...and two random variables downstream of them
Y_rv = ar.normal(A_rv, 1.0 / B_rv)
Z_rv = ar.poisson(at.exp(Y_rv))

Using our hypothetical FunctionGraph-like Model type, we could define the following "sample-able" models:

# Model(unobserved: List[Variable], observed: List[Variable], givens: Dict[Variable, np.ndarray])
m1 = Model([A_rv, B_rv], [Y_rv])
m2 = Model([A_rv, B_rv, Y_rv], [Z_rv])
m3 = Model([A_rv, B_rv], [Y_rv, Z_rv])
m4 = Model([A_rv], [Y_rv, Z_rv], givens={B_rv: 1.2})
# etc.
ricardoV94 commented 3 years ago

Just a quick clarification: when you say observed in that example, do you mean the output of random ancestral sampling or of MCMC following Bayesian conditioning (i.e., the observed argument in PyMC3)? It looks like the first, but I just want to be sure.

More generally, how well can these models broadcast automatically, so that the givens could be the entire posterior trace and everything not in the givens (added before or after) would be obtained by ancestral sampling based on those values?

Edit: I think these two Discourse threads may be useful for this discussion:

https://discourse.pymc.io/t/bug-in-fast-sample-posterior-predictive/6904

https://discourse.pymc.io/t/sample-posterior-predictive-with-a-vengeance/5926

brandonwillard commented 3 years ago

Just a quick clarification: when you say observed in that example, do you mean the output of random ancestral sampling or of MCMC following Bayesian conditioning (i.e., the observed argument in PyMC3)? It looks like the first, but I just want to be sure.

The observed arguments are the variables that would belong in Model.observed_RVs, and the unobserved arguments are, naturally, the ones that would belong in Model.unobserved_RVs.

More generally, how well can these models broadcast automatically, so that the givens could be the entire posterior trace and everything not in the givens (added before or after) would be obtained by ancestral sampling based on those values?

The answer to that question always depends on how well the Ops involved are implemented, but all the RandomVariables I've implemented so far should be completely broadcastable. In other words, yes, one should be able to provide this hypothetical givens argument with an entire trace of data, and everything should work out correctly.
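As a rough sketch of what that broadcasting looks like, with a plain array standing in for posterior draws of a mean parameter:

import numpy as np
import aesara.tensor.random.basic as ar

# 1000 posterior draws of mu, standing in for a trace
mu_draws = np.random.randn(1000)

# The RandomVariable broadcasts over the draws: one y per mu draw
y_rv = ar.normal(mu_draws, 1.0)
y_draws = y_rv.eval()
assert y_draws.shape == (1000,)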

ricardoV94 commented 3 years ago

I think that, while I understand what you mean for the case of ancestral sampling, I am not sure what it would look like for MCMC sampling.

To help ground the discussion, here is the original example that @canyon289 posted on the slack discussion.

[image: @canyon289's original example from the Slack discussion]

@michaelosthege you mentioned you do a lot of this with GPs, do you have a minimal example?

What would be a good API to achieve this goal? Trying to reason from what @brandonwillard said above, would something like this make sense?

idxs = np.repeat(np.arange(10), 2)  # [0, 0, 1, 1, ..., 9, 9]
mu_group = pm.Normal(0, 1)
sigma_group = pm.HalfNormal(1)
mu_individual = pm.Normal(mu_group, sigma_group, size=10)
observation = pm.Normal(mu_individual[idxs], 1)

model = Model([mu_group, sigma_group, mu_individual], [observation])
prior = model.prior_predictive()  # ancestral samples for [observation, mu_group, sigma_group, mu_individual]
trace = model.sample(givens={observation: data})  # MCMC samples for [mu_group, sigma_group, mu_individual]
ppc = model.posterior_predictive(givens=trace)  # ancestral samples for [observation]

new_mu_individual = pm.Normal(mu_group, sigma_group)
extended_model = Model([new_mu_individual], [mu_group, sigma_group])
extended_model.sample_posterior_predictive(givens=trace)
brandonwillard commented 3 years ago

We should never need to recreate (partially or otherwise) a model in order to perform out-of-sample (OOS) sampling. In every case I've seen, the real problem with performing OOS sampling using PyMC3 models is the lack of flexible shape support, but this is exactly what we're addressing in v4 (and a big part of the reason why we're addressing it).

michaelosthege commented 3 years ago

@michaelosthege you mentioned you do a lot of this with GPs, do you have a minimal example?

@ricardoV94 my code reads pretty much like the .conditional example at https://docs.pymc.io/notebooks/GP-Latent.html#Using-.conditional (note that the with model block there reuses the model from above, after sampling it).

brandonwillard commented 3 years ago

@michaelosthege, would it have been possible to make the variable(s) underlying X_new into shared variable(s) so that they could simply be updated and the original model object could be reused?

michaelosthege commented 3 years ago

@michaelosthege, would it have been possible to make the variable(s) underlying X_new into shared variable(s) so that they could simply be updated and the original model object could be reused?

Only if changing the shape works, because you might want to evaluate the GP at some 1,000 coordinates but don't want those in the trace already (memory, performance, matrix inversion errors).
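For context, the shared-variable pattern under discussion looks roughly like this with pm.Data (names and numbers here are made up for illustration); swapping in new values is the easy part, and changing their shape is exactly where v3's shape handling can break down:

import numpy as np
import pymc3 as pm

X = np.random.randn(50)
y = 2 * X + np.random.randn(50)

with pm.Model() as model:
    X_shared = pm.Data("X", X)  # a shared variable under the hood
    beta = pm.Normal("beta", 0, 1)
    obs = pm.Normal("obs", beta * X_shared, 1, observed=y)
    trace = pm.sample()

with model:
    # Swap in new inputs without rebuilding the model; the new shape
    # is the part that may fail in v3
    pm.set_data({"X": np.linspace(-3, 3, 1000)})
    ppc = pm.sample_posterior_predictive(trace, var_names=["obs"])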