Closed: srush closed this issue 6 years ago
We're definitely planning to add a dynamic scoping mechanism of some sort for both param and sample; it just didn't make it into this release. See #73 for some discussion about this.
Sorry, but I don't really get this. The plan seems to be to copy TensorFlow's heavyweight design. PyTorch is so nice because it piggybacks on Python's natural scoping rules.
Clarification: It would seem natural to me that params would behave like any other object in PyTorch (variables, params, etc.). You don't access it by a global string in some global state dictionary, but by actually having access to the variable. That way you don't break the standard coding abstractions in Python. This also allows you to easily introduce anonymous terms and parameters and avoids ugly string manipulation for lookups.
For models that can be written as nn.Modules, where all parameters can be created before function evaluation, pyro.module will add the correct scope to the parameter names automatically.
Unfortunately, parameters in Pyro don't always work this way; consider e.g. per-datapoint variational parameters that are created on the fly each time a new data point appears, or parameters whose existence depends on a random variable.
sample statements are also problematic because names are used to identify corresponding random variables in models and guides, which may have arbitrarily different control structure. Lexical scoping for sample names would break this identification.
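As a toy illustration of the name-based alignment just described (the `sample`/`run` helpers here are invented for this sketch, not Pyro's API): sites are recorded by name, so a model and a guide with different control flow can still be matched up site-by-site.

```python
def run(program, trace):
    """Execute `program`, recording every sample(name, value) call."""
    def sample(name, value):
        trace[name] = value
        return value
    program(sample)

def model(sample):
    sample("z_1", 0.5)      # model draws z_1 first, then z_2
    sample("z_2", 1.5)

def guide(sample):
    for i in [2, 1]:        # different control structure, same names
        sample("z_{}".format(i), float(i))

model_trace, guide_trace = {}, {}
run(model, model_trace)
run(guide, guide_trace)

# alignment is purely by name, independent of execution order
shared = sorted(set(model_trace) & set(guide_trace))
print(shared)  # -> ['z_1', 'z_2']
```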
Not sure I understand.
Why should you be able to sample a random variable that you don't have pointer access to? Needing to do that sounds dangerous.
Why does having non-global scoping prevent per-data point parameters? Presumably whatever object is creating the parameters should own them, not some global store.
How come these issues don't come up in PyTorch? In theory you have to do many of the same things in standard deep learning, and yet there are no globals.
(Sorry, this library is great, and I was thinking of using it for teaching. But I am really worried about this design decision.)
(Sorry, this library is great, and I was thinking of using it for teaching. But I am really worried about this design decision.)
No problem! The parameter store is definitely a weak point, and this is great feedback.
Why does having non-global scoping prevent per-data point parameters? Presumably whatever object is creating the parameters should own them, not some global store.
nn.Modules expose references to all of the parameters of their child modules, making it possible to completely separate the definition of losses that only interact with output and optimizers that only interact with parameters and gradients. Since Pyro models can be regular Python functions that call other regular Python functions, you can't in general automatically track parameters through the call stack in the same way, so it seems to me that you end up with something that looks like a parameter store.
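To make that contrast concrete, here is a minimal pure-Python mimic of nn.Module-style parameter bookkeeping (the `Module` class and `register_param` are invented for this sketch): parameters become reachable through attribute registration on an owning object, which is exactly what arbitrary nested function calls don't give you.

```python
class Module:
    """Toy stand-in for nn.Module's parameter/child tracking."""
    def __init__(self):
        self._params = {}
        self._children = {}

    def __setattr__(self, name, value):
        # auto-register child modules assigned as attributes
        if isinstance(value, Module):
            self.__dict__.setdefault("_children", {})[name] = value
        object.__setattr__(self, name, value)

    def register_param(self, name, value):
        self._params[name] = value

    def parameters(self, prefix=""):
        # recursively expose own and child parameters, scoped by attribute path
        for k, v in self._params.items():
            yield prefix + k, v
        for cname, child in self._children.items():
            yield from child.parameters(prefix + cname + ".")

net = Module()
net.register_param("w", 1.0)
sub = Module()
sub.register_param("b", 2.0)
net.layer = sub  # child registration happens via attribute assignment

print(dict(net.parameters()))  # -> {'w': 1.0, 'layer.b': 2.0}
```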
I guess we could have a per-model parameter store instead, though, sort of like a dynamically generated nn.Module wrapper for a function with param calls. That would probably be much cleaner.
I guess I get that, but now I'm much more worried about this type of name matching though. This feels very scary:
"To be very explicit, if the model contains a random variable z_1

def model(): pyro.sample("z_1", ...)

then the guide needs to have a matching sample statement

def guide(): pyro.sample("z_1", ...)

The distributions used in the two cases can be different, but the names must line up 1-to-1."
If model were a class with a member z_1 = sample, then this could be done correctly instead of asking the user to manage this alignment.
I guess I get that, but now I'm much more worried about this type of name matching though. This feels very scary.
To compute the Monte Carlo ELBO log(P_model(observations, latents)) - log(P_guide(latents)) where latents ~ P_guide, you need to be able to identify sample statements in the model with sample statements in the guide so that you can compute individual terms log(P_model(latent_i | ...)) - log(P_guide(latent_i | ...)).
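A small numerical sketch of one such per-site term, with a hand-rolled Normal log-density standing in for Pyro distributions (the site name and parameter values are made up for illustration):

```python
import math

def normal_logpdf(x, mu, sigma):
    """log N(x | mu, sigma) for a scalar."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# a latent sampled from the guide, keyed by its site name
latents = {"z_1": 0.3}

# per-site densities under model and guide; the names must line up
log_p = normal_logpdf(latents["z_1"], 0.0, 1.0)  # model term: z_1 ~ N(0, 1)
log_q = normal_logpdf(latents["z_1"], 0.1, 1.0)  # guide term: z_1 ~ N(0.1, 1)

elbo_term = log_p - log_q  # normalizing constants cancel
print(elbo_term)  # approximately -0.025
```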
If model were a class with a member z_1 = sample, then this could be done correctly instead of asking the user to manage this alignment.
In Pyro, we made the choice to completely separate the definitions of model and guide and allow both of them to be callables with arbitrarily different structure and control flow. The user pays for this flexibility by having to manage things themselves, but could implement the safer functionality you describe on top of the more general behavior we provide.
Okay, sure. I get it, I just disagree.
I would love to have a PyTorch style mode that looks like this:
class Model:
    def __init__(self):
        self.z_1 = pyro.sample()  # no global string name; can't be accessed without a reference
    def forward(self):
        ...

class Guide:
    def __init__(self):
        self.q_1 = pyro.sample()
    def forward(self):
        ...

m = Model()
g = Guide()
SVI2(variational=[[m.z_1, g.q_1], ...])
This seems clean to me, has no globals, and could even allow you to reuse guide functions with differently named samples.
This seems clean to me, has no globals, and could even allow you to reuse guide functions with differently named samples.
This is actually the design in Edward (e.g., bayesian_linear_regression.py). This is difficult for dynamic graphs, where it's harder to specify the alignment a priori because the sets of random variables can change at runtime.
String-matching is a sensible solution for Pyro. A similar manual alignment could, for example, assume identical class members in Model and Guide. More generally, the alignment could be an arbitrary callable passed into SVI that creates the dictionary/list thing. (Although clunky, I actually think this lattermost design makes sense for puritanical reasons in that the alignment is a feature during inference and not a feature during model/guide specification.)
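A minimal sketch of that "alignment as a callable" idea (the `make_alignment` helper and its positional pairing are invented for illustration):

```python
def make_alignment(model_sites, guide_sites):
    """One possible alignment callable: pair sites positionally."""
    assert len(model_sites) == len(guide_sites)
    return dict(zip(model_sites, guide_sites))

# the model and guide are free to name their latents differently;
# an SVI-like object would consult this mapping instead of matching names
align = make_alignment(["mu", "sigma"], ["q_mu", "q_sigma"])
print(align)  # -> {'mu': 'q_mu', 'sigma': 'q_sigma'}
```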
Could one of you give me an actual example that can't be handled with real variable scoping? Just asserting that this is reasonable is not convincing. It seems to me like PL 101 that a programming language should avoid having the user reinvent scoping with global string lookups.
You say that this is a problem with dynamic functions. But PyTorch is a dynamic graph library that never requires the user to do this kind of black magic.
An example of what we're discussing is air.ipynb. FYI I think it's very possible to solve with non-global scoping. Just not sure what the precise solution is and/or how practical it may be.
Manual alignment of different names in model and guide is a separate issue from scoping. It's actually possible now to do manual alignment in Pyro; that functionality is already implemented in ReplayPoutine, and could easily be exposed to SVI.
I guess I don't really understand what you're proposing. Scoping for samples is about names, not access; sample statement inputs and outputs are lexically scoped just like function applications.
Consider the following rather contrived example:
def chain(x, t):
    if t == 0:
        return pyro.sample("{}".format(t), dist.normal, torch.zeros(1), torch.ones(1))
    else:
        return chain(pyro.sample("{}".format(t), dist.normal, x, torch.ones(1)), t - 1)

def model(x, t):
    mu = chain(x, t)
    return pyro.sample("y", dist.normal, mu, torch.ones(1))

def mean_field_guide(x, t):
    latents = []
    for i in range(t + 1):
        latents.append(pyro.sample("{}".format(i), dist.normal, pyro.param("m_{}".format(i)), torch.ones(1)))
    return latents
How should the sample names be scoped automatically without changing the model code? How should an inference algorithm align the guide samples with the model samples?
@dustinvtran Not sure I understand. I'm arguing that global scoping is not necessary.
@eb8680 I see these issues as very connected. The fact that string alignment is done this way is a warning sign, a symptom of a programming language with a problematic internal scoping model.
I made my proposal above. If you want to expose something to an inference algorithm, you should have to have a reference to it, not a string in a global namespace.
Innumerable programming languages thought it was a good idea to have global string references (early PHP, JavaScript, etc.), but that always turns out to be a bad idea, and we think of them as bad languages to work with. It seems early enough that Pyro can avoid this issue.
@srush I updated my example to make my point clearer. I really do appreciate your feedback; it's very helpful for thinking through these issues.
If you want to expose something to an inference algorithm, you should have to have a reference to it, not a string in a global namespace.
The sample namespace isn't global, it's local to the model. It seems to me that your proposal just moves the awkwardness from naming into alignment, since that can no longer be done automatically when the model and guide have different scope structures.
It seems effectively like a global to me. You access it by constructing its global string name, not by having a reference to it. Compare this to the way PyTorch params respect the natural scoping rules of Python. You cannot magically access them; you need a reference. This is how I expect non-global variables to work.
I think I understand your example. How about this?
class Model:
    def __init__(self):
        self.mus = pyro.LatentList(dist.normal)  # no global string name; can't be accessed without a reference
        self.x = pyro.Latent(dist.normal)
    def forward(self, t):
        mu = self.mus.next_sample(torch.zeros(1), torch.ones(1))
        for i in range(t):
            mu = self.mus.next_sample(mu, torch.ones(1))
        return self.x.sample(mu, torch.ones(1))

class Guide:
    def __init__(self):
        self.q = pyro.LatentList(dist.normal)  # no global string name; can't be accessed without a reference
    def forward(self, t):
        return [self.q.next_sample(pyro.param()) for i in range(t)]

m = Model()
g = Guide()
SVI2(variational=[[m.mus, g.q], ...])
Or alternatively you could have a pyro.Plate for when the samples are exchangeable.
we've talked about doing things of this sort (LatentList etc.) but more so as an additional construct for helping the user deal with names than as a total replacement. my concern would be that some use cases won't be adequately covered using some set of built-in constructs like LatentList and we'd end up with an ever-expanding zoo of such constructs. for example, one of our inference tests uses a conjugate gaussian model with the structure of an N-layered pyramid. see this figure for N=3:
https://gist.github.com/martinjankowiak/af061c7a00e86e2874a217a0853efb2c
on the left is the model and on the right is (one possible factorization of) the guide. the way in which i need to sample things in the guide is quite different from the natural ordering in the model. constructing the guide and aligning it with the model using some generalization of LatentList might be a bit unwieldy and awkward. it might also be tricky to write multiple guides with different factorizations for the same model. of course, having the user provide explicit names can be tedious too, but at least he or she has the flexibility to support all possible model/guide structures/alignments.
we all agree that we need to think through how naming works more carefully, and this is certainly a good time to be doing it. i think it's definitely worth taking a look at various constructs like LatentList. i'd want to see that they can provide all the flexibility we want before making any breaking changes to how names work in pyro. in any case thanks for your suggestion: these are exactly the kinds of conversations we were looking forward to having once we opened up the repo : )
Thanks Martin. That's helpful to know.
Let me reiterate my point once more though, because I feel it is being lost. My argument is not for a particular style of handling locals (I made up this LatentList thing on my phone), but it is against globals as a default option that is the "documented way to do it". If there are corner-case models that really require string-matching-based alignment, then have a LatentDict object that is owned by that part of the model and can use whatever strings it wants. This gives the ability to "support all possible model/guide structures/alignments", since you can then do something similar in the Guide. This has all the functionality you want but does not require what, to me, feels like late-90s PHP query string construction.
I think we all agree it would help to have more concrete examples 😃 To help explore syntax, I hacked together a pyro.anon module on an anonymous-sites branch (never intended to be merged). This has a single example examples/anonymous.py that actually runs in Pyro. @srush does this look any closer to your ideal syntax? If so, would you be able to provide a more complex example? If not and I've missed your point, perhaps we should all meet over video conference to help clarify.
Yeah! This is perfect, exactly what I imagined. Maybe it can be a bit less verbose, but it checks all my boxes. Also it is hierarchical, so you can easily nest groupings of latent variables without string building. (It's kind of a state monad in Python.)
My one quibble is that it still matches the guide variables by name match, but otherwise way better.
@fritzo that's pretty neat. It might make sense to implement a LatentObject layer on top of nn.Module, so that those sorts of models can inherit Module parameter handling directly.
@srush we'll definitely consider including something like that as we iterate on naming and scoping, though I see it existing alongside, rather than replacing, the current syntax, much like torch.nn exists alongside the raw autograd API.
So just imagining a bit how you could make this even more friendly:
1) Maybe this is too much, but you could have a @pyro decorator that wraps this class thing and returns the object with all the contained variables. That way it makes it easy to build up models as little blocks.
2) Since these functions are lazy, I am imagining that the explicit alignment in SVI could be done by passing in a lambda.
3) Since models are stacked, you can pretty-print all the variables based on how they are grouped in the code.
@pyro
def TimeSeries(model, obvs):
    model.sigma = anon.param(Variable(torch.ones(1), requires_grad=True))
    model.steps = []
    for obv in obvs:
        prev = model.steps[-1].y if model.steps else Variable(torch.zeros(1))
        model.steps.append(TimeStep()(obv, prev, model.sigma))

@pyro
def TimeStep(model, x, prev, sigma):
    model.y = anon.sample(dist.normal, prev, sigma)
    model.x = anon.observe(dist.normal, x, model.y, sigma)

@pyro
def Guide(model, obvs):
    model.mus = [anon.param(Variable(torch.zeros(1), requires_grad=True)) for _ in range(len(obvs))]
    model.sigmas = [anon.param(Variable(torch.ones(1), requires_grad=True)) for _ in range(len(obvs))]
    model.ys = [anon.sample(dist.normal, model.mus[i], model.sigmas[i]) for i in range(len(obvs))]
def main(args):
    model = TimeSeries()
    guide = Guide()
    optim = Adam({"lr": 0.01})
    inference = SVI(model, guide, optim, loss="ELBO",
                    align=lambda m, g: [([s.y for s in m.steps], g.ys)])
    data = Variable(torch.Tensor([0, 1, 1, 0, 1, 2]))
    for step in range(args.num_epochs):
        loss = inference.step(data)
        if step % 100 == 0:
            print('{}\t{:0.5g}'.format(step, loss))
    print('Parameters:')
    # Automatically collects all the nested sub-variables
    for name in model.get_all_variable_names():
        print(name)
haha @eb8680 I think you have to make a choice :) torch.nn is not a great analogy, as it is entirely built on top of torch.autograd. Building a nice well-scoped library using anonymous names in a global store with scary functions like clear_param_store() will be dangerous.
Since these functions are lazy, I am imagining that the explicit alignment in SVI could be done by passing in a lambda.
I agree. To reiterate my last comment:
More generally, the alignment could be an arbitrary callable passed into SVI that creates the dictionary/list thing. (Although clunky, I actually think this lattermost design makes sense for puritanical reasons in that the alignment is a feature during inference and not a feature during model/guide specification.)
Oh yes @dustinvtran I agree. I was mostly responding to a change to @fritzo 's sample code. My main mission here is to kill the global variable store. I see these string alignment sub-issues as symptoms of the global variable store.
@srush I like your decorator idea. I've updated examples/anonymous.py to use an @anon.function decorator similar to your @pyro. The resulting modeling language looks good but has lots of sharp edges.
Nice! This is looking great, let me play around a bit and try some of the examples.
Curious, does it chain? Could we call a subfunction and assign its returned latent to a member?
I think I might also be in favor of making the user call LatentList, LatentDict explicitly (kind of like ModuleList in pytorch).
Oh wait, I see @fritzo. So the LatentList thing is only there to ensure that all of the anons have names that match their variables? So it automatically calls them "x[0]", "x[1]", etc.?
What's the argument for why that is necessary? Now that you have actual objects, why not just give them a random address?
pyro.sample(uuid.uuid4(), ...)
Am I missing something?
Could we call a subfunction and assign it's returned latent to a member?
Hmm, assignment from a return value doesn't generally work yet (since the returned latent must be attached to the top-level model *before* assignment). But it does work to let subfunctions mutate parts of the latent:
@anon.function  # Decorator is only used for top-level models.
def model(latent, size):
    latent.params.mu = Variable(torch.zeros(1))
    latent.params.sigma = anon.param(Variable(torch.ones(1)))
    latent.samples = []
    my_subfunction(latent.params, latent.samples, size)

def my_subfunction(params, samples, size):
    for i in range(size):
        samples.append(anon.sample(dist.normal, params.mu, params.sigma))
Am I missing something?
One crucial behavior you might be unaware of is that pyro.sample() statements execute differently depending on their name. In the model, we need to know the name of a sample statement before executing it, so that if it was sampled as part of the guide we can use that value instead.
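A minimal sketch of that replay behavior (the `sample` helper here is invented for illustration, not Pyro's Poutine machinery): the site's name is consulted before execution, and a guide-supplied value is substituted when present.

```python
import random

def sample(name, dist_fn, replay_trace=None):
    """Sample from dist_fn unless `name` already has a value in replay_trace."""
    if replay_trace is not None and name in replay_trace:
        return replay_trace[name]  # the name decides behavior before sampling
    return dist_fn()

guide_trace = {"z": 0.25}

fresh = sample("z", random.random)                  # model run on its own
replayed = sample("z", random.random, guide_trace)  # model replayed against guide
print(replayed)  # -> 0.25
```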
oh! I see, that makes a lot of sense. That also explains a lot of the decisions I didn't understand.
yep, refer to the elbo implementation to see how the 'alignment' occurs under the hood. in _get_traces() we get the trace of both the model and the guide, and then we use the value sampled from the guide (in the guide trace) to calculate the ELBO. this matching happens via unique sample names
Hmm, so given that this swap happens through some CPS/poutine trick that is invisible to the user, and I assume happening at a global level, maybe in the anon version latent should also be the object that controls sampling?

latent.y = latent.sample(dist.normal, prev, sigma)

That way the user can grok that latent may have different semantics depending on its caller, versus anon.sample, which looks harmless.
(Or alternatively through a cxt variable also passed in, but maybe that is too much.)
Here's a similar notation that works: examples/attribute.py. Feels very protobuffy.
I like it.
The object setting is a little too weird for me though :) Maybe go even more protobuffy?

latent.mus = LatentList()
# or
latent.mus.list()
...
latent.mus.add().param(Variable(torch.zeros(1), requires_grad=True))

I think dict could work out of the box as well.

latent.qs = LatentDict()
# or
latent.qs.dict()
...
latent.qs["new"].param()
And actually at this point you could remove the @anon.function and just have an anon.SVI that calls model(Latent(), ...)
@fritzo Okay, I went through and ported all the documentation to use anon. (I mostly did the intro and svi; a lot of the other documentation is not runnable.)
https://github.com/harvardnlp/pyro/tree/anonymous-sites/tutorial_ette/source
Mostly it worked great. Here are the remaining issues I saw:
Instead of using the decorator, I just used latent as the first arg. To get this to work, I made a lift wrapper for passing into SVI etc. I think this is less magic for now, although maybe these functions could just know to do this.
The inference objects (SVI) don't return or carry the current latent state or parameters. The only way I could figure out how to get the parameters out afterwards was with a global call to param. It seems like you should be able to get to the current guide parameters through the SVI object.
alpha_q = torch.exp(pyro.param(".log_alpha_q")).data.numpy()[0]
Similarly, .marginal seems to return a dictionary, but it should instead return a (maybe read-only) Latent object.
Yeah, the [object() for _] notation is tough. I think that's an easy fix.
The biggest issue I saw was with the condition function. That function is cool and useful, but it needs to be given the observations as a dict with strings:

conditioned_scale = pyro.condition(
    scale, data={"measurement": Variable(torch.Tensor([8.5])), "xs_1": Variable(torch.Tensor([1]))})

I think the anon version should have these string names be not globals but localized to the function (scale). Also we should be able to pass in arrays and reference nested modules like "submodel.mu".
Maybe for complicated data, there should be a data_fn=obs, where obs is a function that takes latent and fills in the variables it wants.
There is a similar issue in the sites argument of some of the inference algorithms. Not sure how to get around a string argument there. Maybe something like: sites=lambda latent: [latent.weights]
There seem to be global params in pyro.iarange("data_iarange"), but honestly I found the documentation unreadable and gave up. random_module also seems to create a variable name and could maybe be included in latent. That function seems incredible, though; I gave up trying to understand it.
Okay, fixed most of these issues in my branch:
https://github.com/harvardnlp/pyro/blob/anonymous-sites/pyro/anon.py
https://github.com/harvardnlp/pyro/blob/anonymous-sites/tutorial_ette/source/intro_part_i.ipynb https://github.com/harvardnlp/pyro/blob/anonymous-sites/tutorial_ette/source/intro_part_ii.ipynb https://github.com/harvardnlp/pyro/blob/anonymous-sites/tutorial_ette/source/svi_part_i.ipynb
There is some more to do, but I think this is already way cleaner, maintains scope, makes the semantics cleaner, supports real arrays and nesting, and uses less code.
Trying to find a compromise that we could merge soon: #540
@srush thanks for your ongoing feedback. This issue keeps growing in scope, and it would be really helpful if you could open new issues about things like inadequate documentation for iarange and random_module.
As I said, you've made some great points about the parameter store, whose design was largely borrowed from webPPL, and changes like the ones you've sketched seem like clear wins. We're also more than willing to include an anon-like module (#540), provided it's properly fleshed out.
However, let me try to give some more background on the way sample names work to explain why a similar solution is not an unambiguous improvement there:
The biggest issue I saw was with the condition function. That function is cool, and useful. But it needs to give the observations as a dict with strings...
There's not really any way around this, and condition is not just a cool feature that can be ignored or removed; it or something like it is essential for implementing advanced inference algorithms like EP or particle Gibbs with arbitrary guides.
There is a similar issue in the sites argument of some of the inference algorithms. Not sure how to get around a string argument there
You've also identified the same fundamental problem with your model-guide alignment solution: in general, aligning execution traces or observations with an arbitrary Pyro program can be as hard as executing the program, since you need to refer to variables before they exist. As @fritzo said:
In the model, we need to know the name of a sample statement before executing it, so that if it was sampled as part of the guide we can use that value instead.
A corollary to this fact is that in Pyro's current incarnation, sample names don't really correspond conceptually to Python variable names and scopes, and there's no particular reason that they should unless the model and trace being aligned already share the same structure; think of them instead as tags or database queries attached to stochastic function applications (where the function and its return value could both be anonymous Python variables) that solve the problem of alignment by construction.
This design choice is rooted in both practical considerations and historical context. A particularly common workflow in Pyro is to write a single model, then write multiple guides specific to that model with different structure and one or more condition/do queries, and combine these ingredients with SVI. We don't particularly like string manipulation either, but Pyro's sample names mean that, rather than effectively re-expressing the model's control flow for every guide and condition query in the form of a new alignment, we only need to write it once. Other naming/addressing schemes, like the one in anon, can only move complexity around to different places, and we feel that the current approach works well in the use cases we've explored in depth.
Historically, previous universal PPLs like Church, webPPL, Venture, and Anglican handled alignment by automatically generating name strings based on call stack position (which could be done easily in Pyro with Python's advanced runtime introspection facilities, e.g. a memoized version of traceback.extract_stack()). Guides were handled by threading the guide code through the model code (as in webPPL) or by simply not allowing arbitrary guides. condition is also a new construct for a universal PPL; previously observations were handled by threading data through models.
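For illustration, a rough sketch of that call-stack addressing scheme (the helpers here are invented; this is not how Pyro generates names):

```python
import traceback

def stack_address():
    """Name a sample site by the chain of (function, line) pairs above it."""
    frames = traceback.extract_stack()[:-1]  # drop this helper's own frame
    return ";".join("{}:{}".format(f.name, f.lineno) for f in frames)

def sample_site():
    return stack_address()

def model():
    return sample_site()

addr_a = model()  # two calls from different source lines
addr_b = model()  # get different addresses automatically
print(addr_a != addr_b)  # -> True
```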
Building up names manually in Pyro programs, instead of threading data and guides through them, is one natural generalization of these previous approaches. It cleanly separates the specification of model, data, and inference, and provides a clear point of contact for more advanced techniques (like code transformations, static analysis, alignment specifications like your proposal that completely decouple model and guide names, or the current implementation of anon (!)) that reduce the burden of manual naming when possible.
As @dustinvtran said, Edward's behavior is closer to what you have in mind. This is precisely because it is possible to refer to nodes in a TensorFlow graph before you know their values, which also limits Edward's expressivity. Indeed, the alignment mechanism you added to your anon example, which uses simultaneous references to model and guide variables with no strings, is only possible if anon is placed under similar constraints, such that it would effectively be identical to Edward (albeit with lexical scope rather than TensorFlow's dynamic scope).
Ultimately, we only see Pyro as one hypothesis in the space of possible language designs. It may well turn out to be a bad hypothesis - for example, Edward has clearly demonstrated the utility of static graphs as a probabilistic modelling language, and maybe in practice the benefits outweigh relatively mild restrictions on expressivity. That said, our development process was driven by the interaction of our design principles, properties of previous PPLs, and our tests and anchor examples, and we found that what we have strikes a good balance. Larger changes to the syntax and internals of Pyro represent different hypotheses, and should generally be justified on the basis of similar evidence.
Wait, sorry, I'm confused by this response. I think the last anon solution I posted and wrote examples for captures everything you would like, i.e. keeping around arbitrary condition statements (with strings) and the alignment of guides, while also providing a syntax that doesn't require the user to reason about a global store or global CPS. Which part of that is a failure? In particular, you state "is only possible if anon is placed under similar constraints, such that it would effectively be identical to Edward", but I haven't seen an example yet that really needs what you are referring to.
In particular, I went through the exercise of rewriting this file. I think it is reasonably clean looking, and handles the condition case you mention:
https://github.com/harvardnlp/pyro/blob/anonymous-sites/tutorial_ette/source/intro_part_ii.ipynb
As I have said above, I have no problem with string-lazy access. My problem is with global variables by default as the recommended way you are advising users to use the library. I really like the dynamic design of Pyro, and I agree with your design goals. I just feel that these things do not need to be handled globally but can be built up explicitly in the functions by mirroring the semantics of Python.
Which part of that is a failure?
There's no failure there! I'm just trying to point out that anon is a different point on the Pareto frontier induced by the fact that models, guides, and data can be arbitrarily different.
I meant in the sense of "what flexibility is it giving up?" A different point implies there are negatives, but I don't see how it is more constrained. So far we are able to do all the tutorial examples using it while removing global variables (which I think we can agree are a negative).
Curious why all the params are auto-magically global with a shared namespace. It seems like a very different design choice than the cleanly scoped PyTorch variables.