This conversation is about v4, no?
Any new functionality will come only to v4 at this point. So yes.
OK, so this touches upon a few bigger questions/ideas, and underlying most of them is the question "Do we need the `Model` object and/or its constraints in order to do X, Y, and/or Z?", so I'll start with that.
In this example, "...X, Y, and/or Z" is posterior predictive sampling, which really only requires a `RandomVariable`-containing graph, since it's basically just `aesara.function`-compiling said graphs. In this case, if you focus on just the Aesara graphs, things like the name restrictions imposed by `Model` are irrelevant: i.e. one could completely duplicate/recreate the underlying `RandomVariable`-containing graph and sample away.
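To make that concrete, here is a minimal sketch (using the same `aesara.tensor.random.basic` module as the examples further down; the exact behavior of implicit RNGs may vary between Aesara versions) of sampling from a `RandomVariable`-containing graph with nothing but `aesara.function`:

```python
import aesara
import aesara.tensor.random.basic as ar

# A small generative graph built directly from RandomVariables
mu_rv = ar.normal(0, 1)
y_rv = ar.normal(mu_rv, 1.0)

# Compiling the graph is all that's needed to draw an ancestral sample;
# no Model object (and no unique variable names) is involved.
sample_fn = aesara.function([], [mu_rv, y_rv])
mu_draw, y_draw = sample_fn()
```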
Also, we need to start asking ourselves if we really need constraints like unique names. From the Aesara perspective, we don't, and we definitely shouldn't use variable names as a means of identifying Aesara variables (the same is/was true for Theano).
As things are structured now, non-unique variable names would break the ArviZ conversion (and so would variables and dimensions having the same name). I am not sure what the gain of non-unique variable names would be, but if we wanted to go down that path we'd have to update the converter to take that into account and edit the names so they become unique again.
> non-unique variable names would break the ArviZ conversion
I imagine we could "unique-ify" the names before passing anything off to ArviZ.
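For example, a hypothetical `uniquify` helper (purely illustrative; not an existing ArviZ or PyMC function) could rename duplicates just before conversion:

```python
from collections import Counter

def uniquify(names):
    """Append a counter suffix to repeated names, e.g. ["a", "a", "b"] -> ["a", "a_1", "b"]."""
    seen = Counter()
    unique_names = []
    for name in names:
        unique_names.append(name if seen[name] == 0 else f"{name}_{seen[name]}")
        seen[name] += 1
    return unique_names

assert uniquify(["a", "a", "b"]) == ["a", "a_1", "b"]
```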
N.B. Aesara already has an auto-unique-naming feature that we could use/extend:

```python
>>> import aesara.tensor as at
>>> a_1 = at.vector("a")
>>> a_2 = at.vector("a")
>>> a_1.name == "a"
True
>>> a_2.name == "a"
True
>>> a_1.auto_name
'auto_3'
>>> a_2.auto_name
'auto_4'
```
Another thing worth noting: `pymc3.model.Model` is already very close to `aesara.graph.fg.FunctionGraph`, in both concept and functionality. Both are containers that "encapsulate" graphs and carry/provide information about said graphs.
More specifically, `FunctionGraph` is all about specifying inputs and outputs for graph-based calculations. It provides methods for manipulating the encapsulated graphs in a consistent way (i.e. in a way that updates other dependent quantities as well), and is used by another process to perform computations.
`Model` is exactly the same: its inputs are the unobserved random variables, and the outputs are the observed ones. `Model`s are likewise used by another process to perform computations: i.e. posterior sampling.
One of the most notable differences is that `Model` lacks the manipulation features that `FunctionGraph` provides (e.g. the ability to add and remove inputs, replace arbitrary parts of the graph, etc.).
Depending on how well this analogy/correspondence works (in a functional sense), we might want to consider basing `Model` on `FunctionGraph`.
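For anyone unfamiliar with `FunctionGraph`, here is a small sketch of the kind of encapsulation and manipulation it provides (the exact constructor arguments may differ slightly between Aesara versions):

```python
import aesara.tensor as at
from aesara.graph.fg import FunctionGraph

x = at.vector("x")
y = at.exp(x) + 1.0

# A FunctionGraph is constructed from explicit inputs and outputs...
fgraph = FunctionGraph([x], [y], clone=False)

# ...and provides consistent in-place manipulation, e.g. swapping the
# exp(x) sub-graph for log(x) while keeping dependent nodes up to date.
exp_term = fgraph.outputs[0].owner.inputs[0]
fgraph.replace(exp_term, at.log(x))
```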
That's quite interesting.
I don't fully grasp the extent of Aesara's graph abilities, but it would really be nice to be able to operate more organically on PyMC3 models/graphs as an "object" in itself and not as a "context".
On the other hand, there is always going to be a divide between "before" and "after" sampling that must stay clear. For instance, posterior predictive samples cannot affect posterior-sampled values.
This issue was trying to get at this, although probably in a very v3 mindset. How can we facilitate extending models after posterior sampling in a way that is both intuitive and safe (i.e. not creating the illusion that you can do incremental inferences)?
> it would really be nice to be able to operate more organically on PyMC3 models/graphs as an "object" in itself and not as a "context"
That's exactly what I'm getting at. A distinguishing characteristic of `FunctionGraph`s (vs. `Model`s) is that they take graphs as inputs. This, I think, is an important part of what's missing from the `Model` context approach.
It makes more sense for a `Model` object to work in exactly the same way as `FunctionGraph`, because the Aesara graph of a model exists independently of a `Model` object. Also, one can create an Aesara graph from which multiple `Model`s could be derived.
For example, consider the following graphs:
```python
import aesara.tensor as at
import aesara.tensor.random.basic as ar

A_rv = ar.normal(0, 1)
B_rv = ar.gamma(0.5, 0.5)
Y_rv = ar.normal(A_rv, 1.0 / B_rv)
Z_rv = ar.poisson(at.exp(Y_rv))
```
Using our hypothetical `FunctionGraph`-like `Model` type, we could define the following "sample-able" models:

```python
# Model(unobserved: List[Variable], observed: List[Variable], givens: Dict[Variable, np.ndarray])
m1 = Model([A_rv, B_rv], [Y_rv])
m2 = Model([A_rv, B_rv, Y_rv], [Z_rv])
m3 = Model([A_rv, B_rv], [Y_rv, Z_rv])
m4 = Model([A_rv], [Y_rv, Z_rv], givens={B_rv: 1.2})
# etc.
```
Just a quick clarification: when you say observed in that example, do you mean the output of random ancestral sampling or of MCMC following Bayesian conditioning (i.e. the `observed` argument in PyMC3)? It looks like the first, but I just want to be sure.
More generally, how well can these models broadcast automatically, so that the givens could be the entire posterior trace and everything not in the givens (added before or after) would be obtained by ancestral sampling based on those values?
Edit: I think these two Discourse threads may be useful for this discussion:
https://discourse.pymc.io/t/bug-in-fast-sample-posterior-predictive/6904
https://discourse.pymc.io/t/sample-posterior-predictive-with-a-vengeance/5926
> Just a quick clarification: when you say observed in that example, do you mean the output of random ancestral sampling or of MCMC following Bayesian conditioning (i.e. the `observed` argument in PyMC3)? It looks like the first, but I just want to be sure.
The `observed` arguments are the variables that would belong in `Model.observed_RVs`, and the `unobserved` arguments are, naturally, the ones that would belong in `Model.unobserved_RVs`.
> More generally, how well can these models broadcast automatically, so that the givens could be the entire posterior trace and everything not in the givens (added before or after) would be obtained by ancestral sampling based on those values?
The answer to that question always depends on how well the `Op`s involved are implemented, but all the `RandomVariable`s I've implemented so far should be completely broadcast-able. In other words, yes, one should be able to provide this hypothetical `givens` argument an entire trace of data, and everything should work out correctly.
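As a quick illustration of that broadcasting (a toy sketch, not the proposed `givens` API): a `RandomVariable` will happily take an array of posterior draws as a parameter and return one draw per entry:

```python
import numpy as np
import aesara.tensor as at
import aesara.tensor.random.basic as ar

# Pretend these are 1000 posterior draws of a scalar mean parameter
mu_draws = np.random.normal(size=1000)

mu = at.vector("mu")
y_rv = ar.normal(mu, 1.0)  # broadcasts elementwise over the vector of means

draws = y_rv.eval({mu: mu_draws})  # one ancestral draw per posterior sample
assert draws.shape == (1000,)
```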
I think that while I understand what you mean for the case of ancestral sampling, I am not sure what it would look like for MCMC sampling.
To help ground the discussion, here is the original example that @canyon289 posted in the Slack discussion.
@michaelosthege you mentioned you do a lot of this with GPs, do you have a minimal example?
What would be a good API to achieve this goal? Trying to reason from what @brandonwillard said above, would something like this make sense?

```python
idxs = np.repeat(np.arange(10), 2)  # [0, 0, 1, 1, ..., 9, 9]

mu_group = pm.Normal(0, 1)
sigma_group = pm.HalfNormal(1)
mu_individual = pm.Normal(mu_group, sigma_group, size=10)
observation = pm.Normal(mu_individual[idxs], 1)

model = Model([mu_group, sigma_group, mu_individual], [observation])
prior = model.prior_predictive()  # returns ancestral samples for [observation, mu_group, sigma_group, mu_individual]
trace = model.sample(givens={observation: data})  # returns MCMC samples for [mu_group, sigma_group, mu_individual]
ppc = model.posterior_predictive(givens=trace)  # returns ancestral samples for [observation]

new_mu_individual = pm.Normal(mu_group, sigma_group)
extended_model = Model([mu_group, sigma_group], [new_mu_individual])
extended_model.posterior_predictive(givens=trace)  # returns ancestral samples for [new_mu_individual]
```
We should never need to recreate (partially or otherwise) a model in order to perform out-of-sample (OOS) sampling. In every case I've seen, the real problem with performing OOS sampling using PyMC3 models is the lack of flexible shape support, but this is exactly what we're addressing in v4 (and a big part of the reason why we're addressing it).
> @michaelosthege you mentioned you do a lot of this with GPs, do you have a minimal example?
@ricardoV94 my code reads pretty much like https://docs.pymc.io/notebooks/GP-Latent.html#Using-.conditional
Note that the `with model` reuses the model from above, after sampling it.
@michaelosthege, would it have been possible to make the variable(s) underlying `X_new` into shared variable(s) so that they could simply be updated and the original model object could be reused?
> @michaelosthege, would it have been possible to make the variable(s) underlying `X_new` into shared variable(s) so that they could simply be updated and the original model object could be reused?
Only if changing the shape works: you might want to evaluate the GP at some 1,000 coordinates, but you don't want them in the trace already (memory, performance, matrix-inversion errors).
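For context, the shared-variable pattern being discussed looks roughly like this in v3 (a sketch using `pm.Data`; as noted above, changing the *shape* of the shared value is exactly where this pattern tends to break down):

```python
import numpy as np
import pymc3 as pm

x_obs = np.random.normal(size=100)
y_obs = 2.0 * x_obs + np.random.normal(size=100)

with pm.Model() as model:
    x = pm.Data("x", x_obs)  # wraps a shared variable
    beta = pm.Normal("beta", 0.0, 10.0)
    y = pm.Normal("y", beta * x, 1.0, observed=y_obs)
    trace = pm.sample()

# Swap in new predictor values (same shape here) and reuse the model object
with model:
    pm.set_data({"x": np.linspace(-3, 3, 100)})
    ppc = pm.sample_posterior_predictive(trace)
```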
This idea emerged in a discussion started by @canyon289 on how to do posterior predictive sampling on new random variables.
The current solution is to add a new variable after sampling, as in:
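(The original snippet is not reproduced here, but the pattern is the familiar one of re-entering the model context after sampling; a v3-style illustration:)

```python
import numpy as np
import pymc3 as pm

data = np.random.normal(size=100)

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 1.0)
    y = pm.Normal("y", mu, 1.0, observed=data)
    trace = pm.sample()

# Re-enter the same context to add a new variable after sampling
with model:
    y_new = pm.Normal("y_new", mu, 1.0)
    ppc = pm.sample_posterior_predictive(trace, var_names=["y_new"])
```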
The problem with that approach, as mentioned by @michaelosthege, is that you only have one shot at getting it right, since there is no API to delete a variable from a model. So if you decide to try something else, you have to extend with yet another variable and a new name.
A nested model does not work because the new variables are added to the original model (and it is untested territory).
It would be great to have something that extends an existing model without affecting it:
It could also be the nested syntax, as long as we ensured the original model is not affected.