pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Model specification syntax #189

Closed fonnesbeck closed 11 years ago

fonnesbeck commented 11 years ago

I thought it would be worth discussing how models are specified in PyMC3, and revisiting whether it represents the best user experience we can offer. I'm coming to terms with the fact that PyMC3 will not look like PyMC2, but I think we should offer something analogously clean and clear, since so many of our users will be scientists first and Python programmers second.

One example is the following:

H = model.d2logpc()

s = find_MAP(model)

step = hmc_step(model, model.vars, H(s))

I'm wondering if, for example, all the business with H cannot happen automagically inside of hmc_step, so that we only need to do:

s = find_MAP(model)

step = hmc_step(model, model.vars, s)

The secret to making model specification clean is separating the "what" from the "how" and hiding as much of the "how" as possible. That's not to say, of course, that we should limit the flexibility of what users can do with PyMC, but only expose the complexity when it is needed. An example of this from PyMC2 is that the selection of step methods is done automatically, so it appears nowhere in the user specification of the model, but at any time I can go in and call MCMC.use_step_method() to override the choice.

Let's use this thread to discuss any potential improvements to model specification in PyMC3. I'm hoping others aside from just @jsalvatier and I will participate! What works, what doesn't, what can be improved?

@twiecki

jsalvatier commented 11 years ago

I like this example. We can just test whether the argument is a point or a matrix/vector and infer what to do from that. My only reluctance is that computations built using hessian() seem to be fairly slow to compile. Perhaps the solution to that is to cache the compiled versions.
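A minimal sketch of that dispatch, assuming the scaling argument is either a dict-like point or a precomputed Hessian (hmc_step's internals here are hypothetical, not the actual implementation):

import numpy as np

def hmc_step(model, vars, scaling):
    # Hypothetical sketch: accept either a point (a dict mapping variable
    # names to values) or a precomputed Hessian matrix/vector as `scaling`.
    if isinstance(scaling, dict):
        # Got a point: compute the Hessian there ourselves, ideally caching
        # the compiled function, since compilation is slow.
        scaling = model.d2logpc()(scaling)
    scaling = np.atleast_1d(scaling)
    # ... build and return the HMC step method using `scaling` ...
    return scaling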

jsalvatier commented 11 years ago

Possible usability improvements to hmc_step:

jsalvatier commented 11 years ago

Maybe post this to the mailing list?

jsalvatier commented 11 years ago

We could probably get rid of the sampler states, or at least make them an internal property. My notion was that it would be good to explicitly pass around the sampler state, but I don't think we've seen that pay off, and it adds an extra return parameter to sample and makes CompoundStep more complex.

aflaxman commented 11 years ago

I'd love to be part of this discussion, but I think I'll need to get up to speed to contribute anything useful. First question: what is "d2logpc"?

fonnesbeck commented 11 years ago

It generates the Hessian of the model log-probability for use in the Hamiltonian Monte Carlo step method. It is something I am suggesting could be automated.

aflaxman commented 11 years ago

This is very exciting stuff. As best as I can understand it, @fonnesbeck's suggestion makes sense.

I'm sorry again for my ignorance, but how will deterministics and potentials fit into this framework?

I like the PyMC2 framework, but I hope to come around to PyMC3 as well. Progress is good.

fonnesbeck commented 11 years ago

Because stochastics are based on Theano tensor objects, you essentially get deterministics for free because you can perform arithmetic operations and transformations on them (see the probabilities in the logistic model example as a simple example of this). What we don't have (yet) are traces for these deterministics. Factor potentials are not available yet; we need to add this to the list.
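A runnable sketch of the "deterministics for free" point, with a theano shared variable standing in for the stochastic from the logistic example (the design matrix here is made up for illustration):

import numpy as np
import theano
import theano.tensor as t

predictors = np.random.randn(100, 5)       # assumed design matrix
effects = theano.shared(np.zeros((1, 5)))  # stand-in for the 'effects' stochastic

# No Deterministic wrapper is needed: arithmetic on a Theano tensor
# just produces another symbolic tensor.
p = t.nnet.sigmoid(t.dot(predictors, effects.T))
p_fn = theano.function([], p)
print(p_fn().shape)  # (100, 1)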

mamacneil commented 11 years ago

Gents, As the guinea pig target audience (i.e. a user not a programmer) I am keen to pitch in and will likely push for more automation rather than less. In looking at the logistic model example:

effects = Var('effects', Normal(mu = 0, tau = 2.**-2), (1, npred))

looks less clean to me than

effects = Normal('effects', mu = 0, tau = 2.**-2, value=(1, npred))

which is unambiguous to anyone familiar with Bayes notation. I suppose this would be my wish: to keep as much as possible in notation style and automate the rest, with the option for the flexibility we have come to love. Even the numpy normal used to generate the data is xtrue = normal(scale = 2., size = 1), which makes sense to what I suspect is a major target audience: those fed up with WinBUGS.

One thing that might help the chattering classes is to get a rough idea of the challenges or ways in which the syntax needs to change; people might have better ideas if they know the constraints.

jsalvatier commented 11 years ago

Hey @mamacneil , thanks for helping out.

That's something I've struggled with. I definitely agree it would be good to have the x = Normal(...) syntax. But there are a few issues I've run into.

First is that there is no clear ownership of random variables created in this way. There is no way for a model to say 'I should be tracking these random variables'. This is not a major issue, but does mean you have to manually tell the model which random variables it should care about.

Second, this syntax sort of ties you to one way of creating a random variable. It becomes significantly more complex to write a TransformedVar function (which creates a random variable in terms of one space, but allows you to sample in another). It may be possible to keep this kind of functionality, but it would definitely be more complicated.

Third, the syntax sort of confuses the difference between a distribution and a random variable. Right now, when you call Normal(mu, tau) you create a distribution, while Var (and other functions) create a random variable. The x = Normal('x', ...) syntax doesn't leave a way for you to manipulate distributions themselves, which can be useful (truncating distributions, applying transformations, finding the expectation, etc.).

Overall these points don't seem that strong to me, so I'll think about whether there's a way to overcome them.

twiecki commented 11 years ago

Maybe we don't have to break the existing way, which I kinda like, but just add a way of passing a list of variables to the init method of the Model class.

jsalvatier commented 11 years ago

One way to think of Var and its variants is as the ~ bit in the x ~ Normal(mu, tau) syntax.

jsalvatier commented 11 years ago

Possible alternative syntaxes:

1. x = Var('x', Normal(mu, tau, dtype, shape))
   This is just a variation on the current syntax, where we make dtype and shape a property of distributions.
2. x = Var('x') << Normal(mu, tau, dtype, shape)
3. x = ~Normal('x', mu, tau, dtype, shape)
   This one would require later manual collection; something like model = Model([x, y, observations]).
4. x = model << Normal('x', mu, tau, dtype, shape)
5. x = 'x' << Normal(mu, tau, dtype, shape)

Thoughts?

twiecki commented 11 years ago

Personally, I'm not a fan of any of them (compared to the current syntax).

It's true that ideally we'd have 'x ~ Normal()' as in BUGS. But those packages keep model specification and other code well separated, which we can't do. pymc2 solved this by requiring you to pass all random variables to MCMC(). The problem is that there is no real sense of a model other than a list of random variables. I think pymc3 improves on this by having the whole model in one place.

I guess what I'm saying is that the logic @jsalvatier outlined above is a good motivation to stick with the current interface. It's a deviation from what pymc2 was, but so is everything else.

aflaxman commented 11 years ago

I also am still attached to the x = Normal('x', mu, tau) approach in PyMC2, and I don't mind making a dict of all the nodes in a model.

I don't understand the other two points in comment https://github.com/pymc-devs/pymc/issues/189#issuecomment-15314198 by @jsalvatier, though, so I may be missing something here. Perhaps more details will help me see.

What is a TransformedVar? My first impression is that this sounds like information that should go into a StepMethod, not a Stochastic.

What is the distinction about distributions and random variables you are making here? Is this about the composition of distributions, i.e. a Normal distribution composed with a truncation operator yields a truncated normal distribution?

It seems like you've put a lot of thought into this, so maybe some use cases/user stories would help to bring the rest of us along.

jsalvatier commented 11 years ago

I'll try to supply some more detail tomorrow evening.

Let's say you have a scale variable tau, which has an InvGamma prior distribution: tau ~ InvGamma(.01, .01), which translates to tau = Var('tau', InvGamma(.01, .01)). However, you think that it would be easier to sample from the posterior if tau were in log space. You want the likelihood function to be parameterized by log_tau = log(tau), not tau. This translates to log_tau, tau = TransformedVar('tau', InvGamma(.01, .01), logtransform).
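A sketch of what a transform object like logtransform might provide; the method names here are illustrative, not pymc3's actual interface:

import numpy as np

class LogTransform(object):
    # Illustrative: the sampler works with y = log(x), while the model's
    # density is written in terms of x.
    name = 'log'

    def forward(self, x):
        return np.log(x)   # model space -> sampling space

    def backward(self, y):
        return np.exp(y)   # sampling space -> model space

    def jacobian_det(self, y):
        # log|d exp(y)/dy| = y, added to logp so the transformed
        # density integrates correctly.
        return y

logtransform = LogTransform()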

I like your idea of having that transformation step go into a step method, not the model. That does seem cleaner. We'd have to come up with a way to do that though.

Re: distributions vs. random variables. Yes, truncation operators are the sort of thing I have in mind. Sometimes you want to manipulate the distribution itself. As another example, you might want to find the expectation/median/mode/variance of a distribution, as distinct from finding the expectation of a random variable (conditioning on other things).

jsalvatier commented 11 years ago

@aflaxman The main notable use of different functions for adding random variables has been TransformedVar, so if that could effectively be moved to a post-model-building step, I think that would be a strong argument for doing so, and then using the simpler syntax.

Does the part about distributions not being the same as random variables make sense now? There are a couple of use cases: truncation/bounding, linear transforms, finding properties (mean/mode/variance) of a distribution, and using distributions as building blocks (say, ZeroInflatedPoisson out of Poisson and Bernoulli, or GaussianRandomWalk out of Normal). See the sketch below.
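For instance, truncation could be a pure distribution-to-distribution operation. A minimal sketch, assuming distributions expose a logp(value) method returning a Theano expression (the Bound helper is hypothetical):

import numpy as np
import theano.tensor as t

def Bound(dist, lower=-np.inf, upper=np.inf):
    # Hypothetical: wrap a distribution so its support is restricted.
    # (A full version would also renormalize; this sketch does not.)
    class BoundedDist(object):
        def logp(self, value):
            return t.switch((value >= lower) & (value <= upper),
                            dist.logp(value), -np.inf)
    return BoundedDist()

# e.g. a normal truncated to positive values:
# positive_normal = Bound(Normal(mu=0, tau=1), lower=0)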

jsalvatier commented 11 years ago

@fonnesbeck Do you think it would be advisable to support both syntaxes?

x = Var('x', Normal(mu, tau, dtype, shape)) and x = ~Normal('x', mu, tau, dtype, shape)

I've been thinking about it and I don't think it would be too challenging. However, it seems like it could make the package significantly more confusing.

johnmcdonnell commented 11 years ago

As another ill-informed guinea pig on this project, I like the use of Var because it's clear what's general about the call: the call will create a new random variable, the first parameter for a random variable is its name, then the underlying distribution. If you made a variable with a different distribution, you'd still use Var but your second parameter would change, by syntax not just convention.

However I do consider these decisions to be mostly aesthetic, and I have no idea if my perspective is typical.

jsalvatier commented 11 years ago

Here's my current idea: It turns out that Python with statements can be used to do context management. We can use a with statement to set a model as the 'currently active model' globally and functions that implicitly refer to a model will use this model behind the scenes.

with Model() as model: 
    x = ~Normal('x', 0, 1, shape, dtype)
    # same as 
    #x = Var('x', Normal(0, 1, shape, dtype))
    # which will also be supported
    log_tau, tau = TransformedVar('tau', InvGamma(.01, .01), logtransform)
...

with statements work well with nesting, so it would work fine to say

with Model() as model1: 
     x = ~Normal('x', 0, 1, shape, dtype)
     with Model() as model2: 
          y = ~Normal('y', 0, 1, shape, dtype)
     z = ~Normal('z', 0, 1, shape, dtype)

The result being that model1 will have two variables (x and z) and model2 will have one variable (y).
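A sketch of context machinery that would behave this way, using a class-level stack (pymc3's real implementation differs in detail):

class Model(object):
    # The innermost active model sits at the top of the stack.
    _context_stack = []

    def __init__(self):
        self.vars = []

    def __enter__(self):
        Model._context_stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        Model._context_stack.pop()

    @classmethod
    def get_context(cls):
        if not cls._context_stack:
            raise TypeError("No model on the context stack")
        return cls._context_stack[-1]

Nesting then falls out naturally: each with pushes a model, variable constructors register with the top of the stack, and leaving the inner block pops it, which is why z above lands in model1.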

twiecki commented 11 years ago

I like the with statement -- very pythonic.

The ~Normal syntax strikes me a bit odd though. Do we need it?

johnmcdonnell commented 11 years ago

Yeah, I find it weird to have TransformedVar and also ~Normal, but, like I said, I guess I don't really understand the problem with Var.

mamacneil commented 11 years ago

Interesting. I agree that this is much improved over the Var statements and that we don't need the tilde at all (they're not in there now). If having the with statement is the major change in the syntax, I think that could work. I will say that keeping as near the current syntax as possible should be the goal (it's damn good, in my opinion), but a single line to initiate things isn't much of a burden.


jsalvatier commented 11 years ago

I think that the ~ is important actually. The ~ lets us distinguish between distributions and random variables.

~ acts as a constructor for a random variable, basically acting like Var, except that it's a method on the distribution. What I have in mind is that x = Normal('x', mu, tau, shape, dtype) constructs a normal distribution with a particular shape, data type (and a name, 'x', that will be used if the distribution is used to construct a random variable, but not otherwise). x.__invert__ (which implements the ~ operator) constructs a random variable with x as its conditional distribution and adds it to the model in the current context.
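In code, that might look like the following sketch (reusing the Model.get_context idea from the context-stack sketch above; FreeRV is a hypothetical random-variable class):

class Distribution(object):
    def __init__(self, name=None):
        self.name = name

    def __invert__(self):
        # `~dist` builds a random variable with this distribution as its
        # conditional distribution and registers it with the active model.
        model = Model.get_context()
        var = FreeRV(self.name, self)  # hypothetical RV class
        model.vars.append(var)
        return var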

aflaxman commented 11 years ago

I'm growing skeptical of these syntactic changes, but I'll try to keep an open mind. For example, I don't see why the proposed approach to transformations of

log_tau, tau = TransformedVar('tau', InvGamma(.01, .01), logtransform)

is preferable to the current solution in PyMC2, which seems simpler and cleaner to me:

tau = InverseGamma('tau', .01, .01)
log_tau = log(tau)

I would like to see as clean a separation as possible between model and step method, so if log_tau is not needed for the model, I propose something akin to

mcmc.use_step_method(LogTransform(Metropolis), tau)

Regarding distributions not being the same random variables, maybe there is room for a middle-ground solution here, too. For example, if you could say

tau.logp(parameter_value)

in the InverseGamma stochastic above, would that meet your needs for calculating moments? The approach PyMC2 takes, with InverseGamma matched to inverse_gamma_like, etc., has worked for simple building blocks for me so far. Maybe there is a way to extend this so that it works for you, too.

jsalvatier commented 11 years ago
  1. It's because those solve different problems: TransformedVar puts the actual random variable, through which you do the sampling, into log space. You can't tell a step method to sample the log_tau variable in pymc2 because it's a deterministic. I do agree that your suggestion of moving the TransformedVar functionality into step methods, or some other post-model-building step, would be best. Like you say, it's cleaner to separate model and sampling issues as much as possible. I intend to move TransformedVar in pymc3 to some post-model-building stage at some point; however, I'm not sure how to do it yet (I haven't spent time thinking about it).

The issue I see with transforming step methods directly, as in mcmc.use_step_method(LogTransform(Metropolis), tau), is that in pymc3 step methods commonly do block updating, and you often will want to transform just one variable. I'm sure something along these lines is workable, though.

  2. Re: distributions/random variables: Unfortunately that doesn't satisfy me. For example, it doesn't work cleanly with truncating distributions. Random variables and distributions are different concepts and should be different types. Having the nice x = Normal('x', mu, tau, ...) syntax is valuable, though, so it's worth having some limited magic in order to make it work nicely. Coming up with the right magic is the tricky part. Some options:
    • The ~ operator turns a distribution into a random variable. Seems unpopular.
    • If there is a current model context when Normal() is called with a name (or possibly given the observed keyword), then it will create a random variable; otherwise, a distribution. If there's no current model context, then passing a name or observed keyword will throw an error.

fonnesbeck commented 11 years ago

Two suggestions:

1. Distinguish by case: e.g. Normal creates the random variable, normal the distribution.
2. Distinguish by name: use separate identifiers for the random variable and the distribution.

twiecki commented 11 years ago

Personally, I think the upper/lower case is too similar syntax-wise.

I do like the second option. Could be even more explicit and do NormalVar and NormalDist.

fonnesbeck commented 11 years ago

A third option would be to have something like Normal for the random variable and normal_dist for the distribution, which is analogous to PyMC2's x_like convention for log-likelihoods.

jsalvatier commented 11 years ago

I don't really like the 'different names for variables and dists' approach. It seems like making a random variable should be a function.

I have a version working where Normal('x', mu, tau, shape) produces a random variable (and attaches it to the current model, throwing an error if there is no current model) and Normal(mu, tau, shape) produces a distribution. I'll post it soon.

Another possibility, which I personally like, is x = Normal(mu, tau, shape)('x'). Normal() produces a distribution, and calling a distribution produces a random variable.
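A sketch of how the first, dual-mode behavior could be wired up, building on the Model and Distribution sketches above (the model.Var registration call is hypothetical; the actual mechanism may differ):

class Normal(Distribution):
    def __init__(self, mu=0, tau=1, shape=(), dtype='float64'):
        self.mu, self.tau, self.shape, self.dtype = mu, tau, shape, dtype

    def __new__(cls, *args, **kwargs):
        if args and isinstance(args[0], str):
            # Normal('x', mu, tau, ...): build a bare distribution, then
            # register a random variable with the active model context
            # (Model.get_context() raises if there is none).
            name, rest = args[0], args[1:]
            dist = object.__new__(cls)
            dist.__init__(*rest, **kwargs)
            return Model.get_context().Var(name, dist)  # hypothetical
        # Normal(mu, tau, ...): just a distribution object.
        return object.__new__(cls)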

fonnesbeck commented 11 years ago

I really like your first implementation.

I suppose the second one would make sense when you want to use the same distribution to generate several random variables:

normal_prior = Normal(0, 0.0001)
betas = [normal_prior(i) for i in ('intercept', 'age_effect', 'treatment_effect')]

Though this may be more confusing than convenient.

mamacneil commented 11 years ago

Yes, I like the first implementation also: clear, simple, and close to the current function on the surface.


jsalvatier commented 11 years ago

New implementation is finished: see examples http://nbviewer.ipython.org/urls/raw.github.com/pymc-devs/pymc/pymc3/examples/stochastic_volatility.ipynb and http://nbviewer.ipython.org/urls/raw.github.com/pymc-devs/pymc/pymc3/examples/tutorial.ipynb (the text doesn't reflect the changes yet).

I've also made the functions find_MAP, find_hessian, sample, and so forth take a model parameter, which you do not need to supply if you are within a model context (with statement). I like this change, but I'm not committed to it, so let me know if you like it.

Syntax suggestions still welcome, but I'm feeling pretty good about this.

twiecki commented 11 years ago

That syntax looks great, @jsalvatier!

A few questions:

with model: 
    sigma, log_sigma = model.TransformedVar('sigma', Exponential(1./.02, testval = .1),
                 logtransform)

Is the model.TransformedVar required, as opposed to just TransformedVar as with the other calls?

Similarly:

start = find_MAP(model, vars = [s], fmin = optimize.fmin_l_bfgs_b)

jsalvatier commented 11 years ago

The first one is, but the second one is not (I should remove that one).

Currently those model parameters are basically default parameters that can optionally go in front. The machinery is in "withcontext" here: https://github.com/pymc-devs/pymc/blob/pymc3/pymc/model.py. I wonder if this is worth it; perhaps it would be best to just have them be regular arguments with defaults (with the default being whatever is in the context). This would look a little less pretty, but would mean there's less funny business that can cause problems in the future.

Basically, should the explicit syntax look like

start = find_MAP([s], model = model)

or

start = find_MAP(model, [s])

The first one does exactly what it looks like it's doing; the second has some funny business going on: it checks whether the first arg is a Model and, if so, does nothing; if it's not, it gets the model context and inserts it into the argument list.
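The "regular argument with a context default" variant might look like this sketch (modelcontext mirrors the idea of the withcontext machinery in model.py, but the code here is illustrative):

def modelcontext(model=None):
    # Use the given model, or fall back to the innermost `with` context.
    return model if model is not None else Model.get_context()

def find_MAP(vars=None, model=None):
    model = modelcontext(model)
    if vars is None:
        vars = model.vars
    ...  # optimize the model's log-probability over `vars` and return the point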

twiecki commented 11 years ago

I'd say definitely the first one then. Less funny business = good. ;)

jsalvatier commented 11 years ago

I've implemented the first; like you say, less funny business is good.

With that, I'm pretty happy with the syntax.

twiecki commented 11 years ago

Yeah, I saw the updated syntax and code. I agree that it looks great now. It's a nice combination of pymc2 flexibility and BUGS-style model specification. Also, I imagine the with syntax will be the main form of using pymc3, so I think it's fine to have a model kwarg, since it will mostly be used by people who know what they're doing.