pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Vectors of multivariate variables #535

Closed fonnesbeck closed 5 years ago

fonnesbeck commented 10 years ago

It would be useful if we could model multiple independent multivariate variables in the same statement. For example, if I wanted four multivariate normal vectors with the same prior, I should be able to specify:

f = pm.MvNormal('f', np.zeros(3), np.eye(3), shape=(4,3))

but it currently returns a ValueError complaining of non-aligned matrices.

jsalvatier commented 10 years ago

will it be obvious what dimension is the multivariate dimension?

fonnesbeck commented 10 years ago

It should be intuitive, if not obvious. I think most people would expect a vector of variables, which implies that the first dimension is the number of variable elements and the remaining dimension(s) the size of each variable.

fonnesbeck commented 10 years ago

We at least need to be able to do the analog of this:

m = [pm.MvNormal('m_{}'.format(i), mu, Tau, value=[0]*3) for i in range(len(unique_studies))]

This has been a show-stopper for me trying to use PyMC 3 for new work, so I'm going to try to set aside some time to work on this.

Thinking about it some more, however, I think that shape is not the appropriate way to specify the dimension of a multivariate variable -- that should be reserved for the size of the vector of variables. Might be best to have:

f = pm.MvNormal('f', np.zeros(3), np.eye(3), dim=3)

for a single variable and:

f = pm.MvNormal('f', np.zeros(3), np.eye(3), shape=4, dim=3)

for a vector containing 4 MvNormals of dimension 3. Better yet, we ought to be able to infer the dimension of the MvNormal from its arguments.
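A sketch of that inference with plain NumPy (`infer_dim` is a hypothetical helper, named only for illustration, not part of any API):

```python
import numpy as np

def infer_dim(mu, cov):
    # Hypothetical helper: read the event dimension of an MvNormal off
    # its mean and covariance arguments, checking that they agree.
    mu = np.asarray(mu)
    cov = np.asarray(cov)
    if cov.shape != (mu.shape[-1], mu.shape[-1]):
        raise ValueError("mu and cov shapes do not agree")
    return mu.shape[-1]

print(infer_dim(np.zeros(3), np.eye(3)))  # 3
```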

jsalvatier commented 10 years ago

That makes some sense. The words shape and dim seem very close, though, so having both seems confusing.

fonnesbeck commented 9 years ago

Perhaps using plates here would be clearer, since this is common terminology in graphical models. Then we could generalize the business of generating vectors of variables.

fonnesbeck commented 9 years ago

Just bumping this one. I come up against it frequently in epidemiological analyses.

aflaxman commented 9 years ago

I like the originally proposed notation, shape=(4,3), since that will be the shape of f.value. Am I stuck in a PyMC2 way of thinking?

fonnesbeck commented 9 years ago

I'd be happy with that. We would just have to adopt the convention that the last dimension is always the size of the individual multivariate node, and not the size of the array containing the nodes. The tricky part comes when you have, say, a vector of Wisharts that is itself multidimensional, so the total shape could be (4,4,3,3) for a 4x4 array of 3x3 variables.
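For what it's worth, NumPy's own sampler already follows this trailing-event-dimension convention (a NumPy sketch, not PyMC code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Batch axes lead, event axis trails: four independent 3-vectors.
draws = rng.multivariate_normal(np.zeros(3), np.eye(3), size=4)
print(draws.shape)   # (4, 3)

# A 4x4 array of 3-vectors, i.e. total shape (4, 4, 3):
draws2 = rng.multivariate_normal(np.zeros(3), np.eye(3), size=(4, 4))
print(draws2.shape)  # (4, 4, 3)
```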

twiecki commented 9 years ago

I would imagine it's a rare case, but it can't hurt to consider it and come up with a sane way to handle it. In the end, complex things will be complex in code, but defaulting to the last dimensions is an easy rule to keep in mind.

aflaxman commented 9 years ago

+1 for shape=(4,4,3,3) to get a 4x4 array of 3x3 wisharts. So

C = pm.WishartCov('C', C=np.eye(3), n=5)
C.value.shape == (3,3)

C = pm.WishartCov('C', C=np.eye(3), n=5, shape=(4,3,3))
C.value.shape == (4,3,3)

C = pm.WishartCov('C', C=np.eye(3), n=5, shape=(4,4,3,3))
C.value.shape == (4,4,3,3)

fonnesbeck commented 9 years ago

Multivariate classes could have the appropriate dimension specified in the class to know how to deal with the shape argument. Wisharts will always be 2-dimensional, for example, so any remaining dimensions will always be how many wisharts are in the set. Multinomials will always be a 1-d vector, etc.

jsalvatier commented 9 years ago

Okay, are we agreed that when we do this the multivariate dimensions start at the back?

We could start them at the front, but the way numpy.dot works suggests at the back.
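To illustrate with NumPy's batched matrix multiply (matmul rather than dot, but the same trailing-axes convention):

```python
import numpy as np

A = np.tile(np.eye(3), (4, 1, 1))  # a batch of four 3x3 matrices
x = np.ones((4, 3))                # a batch of four 3-vectors
# matmul treats the last two axes as the matrix axes and broadcasts
# over the leading (batch) axes, so trailing multivariate dimensions
# compose naturally with batched linear algebra.
y = (A @ x[..., None])[..., 0]
print(y.shape)  # (4, 3)
```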

twiecki commented 9 years ago

Agree on defaulting to last dimension.

I wonder: is the shape argument not redundant? It seems like we can always infer it from the inputs; shape would then only add extra dimensions. Personally I would find this less confusing:

C = pm.WishartCov('C', C=np.eye(3), n=5)
C.value.shape == (3,3)

C = pm.WishartCov('C', C=np.eye(3), n=5, shape=4)
C.value.shape == (4,3,3)

C = pm.WishartCov('C', C=np.eye(3), n=5, shape=(4,4))
C.value.shape == (4,4,3,3)

The 3,3 is already encoded in np.eye(3), no?

jsalvatier commented 9 years ago

Shape is not redundant when you want to have the same prior arguments for a bunch of variables.

twiecki commented 9 years ago

right, I'm only talking about the case where the input to the RV (e.g. C above) is multi-dimensional already. Then you can use shape to repeat that input arbitrarily.

pm.Normal('x', mu=[1, 2, 3], shape=2) would give a 2x3 in my proposal.

jsalvatier commented 9 years ago

Yeah, we could do that. I'm slightly worried that it's going to make the implementation more complex, and perhaps be confusing to users. But maybe either way is going to be confusing.

twiecki commented 9 years ago

I recently ran into this confusion when I wanted 2 Dirichlets of length 3. Should I do pm.Dirichlet(np.ones((2, 3))), or pm.Dirichlet(np.ones((2, 3)), shape=(2, 3)), or maybe pm.Dirichlet(np.ones((2, 3)), shape=2), or pm.Dirichlet(np.ones(3), shape=2)? Are some of these equivalent? I actually still don't know.

So with my proposal there's a clear rule and I don't have to remember which dimensions of the shape kwarg match to which dimensions of my input.

Why do you think it would be harder to implement?
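The rule can be sketched as a plain shape computation (`proposed_shape` is hypothetical, named only for illustration):

```python
import numpy as np

def proposed_shape(param, shape=()):
    # Hypothetical rule under discussion: the RV's full shape is the
    # extra `shape` (batch) dimensions prepended to the shape of the
    # parameter array itself.
    batch = tuple(int(s) for s in np.atleast_1d(shape))
    return batch + np.asarray(param).shape

# Two Dirichlets of length 3, written unambiguously:
print(proposed_shape(np.ones(3), shape=2))      # (2, 3)
# A 4x4 array of 3x3 Wisharts from one 3x3 scale matrix:
print(proposed_shape(np.eye(3), shape=(4, 4)))  # (4, 4, 3, 3)
```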

jsalvatier commented 9 years ago

That does seem attractive from an API point of view.

Let me check how that plays with broadcasting rules.

jsalvatier commented 9 years ago

That does seem to play nicely with things.

I see two issues. Maybe we can resolve them.

First, this change will break previously working models.

Second, shape is a common argument for all distributions and this means the shape argument won't match the actual shape of the variable. I think that might not actually break anything right now, but seems like a bug waiting to happen.

Perhaps we should have a different argument, not shape for multivariate distributions, but count or dimensions or something else that is used to compute the shape. That would make it more obvious that the behavior is different.

twiecki commented 9 years ago

Or maybe repeat? pm.Dirichlet(np.ones(3), repeat=2) would give a 2x3.

What I also like about this is that it makes the translation from pymc2 style [pm.Dirichlet(np.ones(3)) for i in range(2)] more direct.

So if we were to change this, do we still need the shape kwarg? Do we deprecate it?

twiecki commented 9 years ago

And maybe we could even use theano.tensor.extra_ops.repeat(x, repeats, axis=None) for this.
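In NumPy terms (theano's extra_ops.repeat mirrors numpy.repeat), the expansion would be:

```python
import numpy as np

# NumPy analogue of that expansion: repeat a length-3 prior parameter
# into a (2, 3) batch before constructing the RV, so the distribution
# itself needs no special batching logic.
alpha = np.ones(3)
batched = np.repeat(alpha[None, :], 2, axis=0)
print(batched.shape)  # (2, 3)
```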

twiecki commented 9 years ago

Theoretically we could even teach users to use repeat directly and not be concerned with all this in the API. E.g.:

from pymc3 import repeat # alias to theano.tensor.extra_ops.repeat
pm.Dirichlet(repeat(my_prior, 2)) # gives 2x3 if my_prior is shape 3

fonnesbeck commented 9 years ago

I don't think we should worry about breaking changes too much in a beta for such an important design decision.

I like the idea of a dim (dimension) argument that represents the shape of the variable, rather than how many of them there are:

Sigma = Wishart('Sigma', 4, np.eye(3), dim=3)
mu = pm.Normal('mu', 0, 0.001, shape=3)
x = pm.MvNormal('x', mu, Sigma, dim=3, shape=5)

which results in an x that consists of 5 multivariate normals, each of dimension 3.

jsalvatier commented 9 years ago

Shape currently means the actual shape of the resulting variable, and I kind of want to keep that unless there's a good reason.

fonnesbeck commented 9 years ago

So, it would be x = pm.MvNormal('x', mu, Sigma, dim=3, shape=(5,3))? I wonder if it can just be inferred, then, from the shapes of mu and Sigma?

Or, why not have the variable compose its shape from dim and shape, just to be explicit?

jsalvatier commented 9 years ago

Dim seems like it should always be inferable, and we should compute shape. Then we could have count or something that gives the non-dim shape of the variable?

twiecki commented 9 years ago

I think that with count (or repeat) we can infer dim and shape. IMO it's really bad user experience if there are 3 ways to specify the same redundant thing.

fonnesbeck commented 8 years ago

This is something we need to solve before a 1.0 release. Brought up again in #824.

twiecki commented 8 years ago

@fonnesbeck What's your take on the proposal to remove shape in favor of count or repeat?

fonnesbeck commented 8 years ago

I like the idea of having a count argument that defaults to 1, and the shape of the variable is inferred. So, for example:

X = Normal('X', 0, 1) results in a scalar normal variable
X = Normal('X', [0,0], [1,2]) results in a vector-valued normal, as inferred from the arguments
X = Normal('X', 0, 1, count=2) results in an array of 2 scalar normals
X = MvNormal('X', mu, Sigma) results in a single multivariate normal whose shape is inferred from mu and Sigma
X = MvNormal('X', mu, Sigma, count=2) results in a vector of multivariate normals, the size of each element inferred from mu and Sigma

Will this work, and does it provide a clear separation between the shape of a variable and the size of a vector of variables?
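One possible reading of those rules as a shape computation (`count_shape` is a hypothetical helper; whether count=2 on a scalar should yield (2,) or (2, 1) is debated further down):

```python
import numpy as np

def count_shape(param_shape, count=1):
    # One possible reading of the proposal: shape is inferred from the
    # arguments, and `count` prepends a batch dimension only when more
    # than one variable is requested.
    if count == 1:
        return tuple(param_shape)
    return (count,) + tuple(param_shape)

print(count_shape(()))             # () -> a scalar Normal
print(count_shape((2,)))           # (2,) -> a vector-valued Normal
print(count_shape((), count=2))    # (2,) -> two scalar Normals
print(count_shape((3,), count=2))  # (2, 3) -> two 3-dim MvNormals
```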

twiecki commented 8 years ago

@fonnesbeck sounds great then. I think we can use theano.tile and deprecate the shape argument (which will be inferred). And should the resulting shape be the shape of the final RV (i.e. including count)?

For example Normal('X', 0, 1, count=2).shape == (1, 2)?

twiecki commented 8 years ago

@jsalvatier I can't find where the shape argument takes effect, could you point me to that?

fonnesbeck commented 8 years ago

@twiecki I guess it would have to be, since we can't easily iterate over things in Theano. In PyMC 2, this would have been a Container object.

BTW, I think the shape there would be (2,1). The first dimension should be the count, and all subsequent dimensions the shape, I would think.

twiecki commented 8 years ago

Right, (2, 1)

jsalvatier commented 8 years ago

@twiecki Do you mean here? https://github.com/pymc-devs/pymc3/blob/master/pymc3/model.py#L438

You're looking to change the name of 'shape'?

twiecki commented 8 years ago

@jsalvatier Yes. I just went through the code, but it seems that inferring the shape from the input arguments (e.g. mu) might be difficult because they don't get forwarded to the superclass (i.e. Distribution, which sets the shape parameter). For example, Normal() gets a float or an array for mu but only stores it; it doesn't let the parent class, where we would want to infer the shape, know about it.

Unfortunately, I don't see an easy way to change that without a medium refactoring. Ideas?

fonnesbeck commented 8 years ago

There will still be a shape attribute, but it will not be an argument for creating variables.

twiecki commented 8 years ago

Of course, we could add specific logic to each Distribution to extract the shape from the input parameters. That doesn't strike me as very elegant though.

fonnesbeck commented 8 years ago

That could be done as a decorator, making it less inelegant.

twiecki commented 8 years ago

Good idea, that would work. Then we infer the shape in the child class, pass it to the base class along with count where the broadcast is done. Easy peasy.

Just not sure about the kwarg name. Don't really like count too much. How about repeat or tile?

fonnesbeck commented 8 years ago

It should be something that clearly denotes a number (i.e., a count) of variables, as separate from the size or dimension of each variable. Not sure tile or repeat makes this clear, but I could be convinced. I suppose tile would be consistent with Theano? Perhaps nvars or similar?

twiecki commented 8 years ago

Well, it could be multidimensional, no? E.g. Normal(0, 1, repeat=(3, 3)) would give a 3x3, same as Normal(np.zeros((3, 3)), np.ones((3, 3))). Not ideal to have two ways to specify the same thing, but probably not unreasonable. Both tile and repeat have connections to theano.

fonnesbeck commented 8 years ago

I would argue that those are two different parameterizations; the first is a single multidimensional normal with shape (3,3) and the second a 3x3 matrix of univariate normals. The difference is trivial in this example, but when you pop a "true" multivariate distribution in place of the normal, the distinction is real.

MvNormal('X', np.zeros(2), np.eye(2), tile=2)

This cannot be expressed without the tile argument.

One possible utility of the difference in parameterizations may be in whether a variable gets block-updated or not, say, in a Metropolis algorithm. In the first parameterization perhaps it would, and the second it would not.

twiecki commented 8 years ago

OK, we're on the same page. I like the idea of Metropolis but that's probably better left for later.

I don't think this would be hard to implement at all with the decorator. The key is that we just pass the correct shape to the base class, where the broadcasting is already implemented. The logic would be something like: shape = np.concatenate([np.atleast_1d(tile), mu.shape])

The only thing I noticed, which I don't think is a big deal, is that numpy.tile works slightly differently, as it requires specification of the full shape.

But it'd be fine for our tile to work differently. It feels pretty intuitive to me (unlike our earlier shape spec).

twiecki commented 8 years ago

Turns out a decorator is difficult because parameters can either be args or kwargs. Instead, here's a function that infers the shape given one parameter and kwargs passed to Distribution:

import warnings

import numpy as np

def infer_shape(param, kwargs):
    shape = kwargs.get('shape', None)
    if shape is not None:
        warnings.warn('Explicitly specifying the shape argument will be deprecated. '
                      'Shape will be inferred from the input variables as well as '
                      'the new tile keyword argument.',
                      PendingDeprecationWarning)
        shape = np.atleast_1d(shape)

    tile = kwargs.get('tile', ())

    if shape is None:
        # Infer the event shape from the parameter itself.
        if isinstance(param, (float, int)):
            shape = ()
        else:
            shape = np.asarray(param).shape

    # The tile (batch) dimensions go in front of the event shape.
    inferred_shape = np.concatenate([np.atleast_1d(tile), shape])
    if len(inferred_shape) == 0:
        inferred_shape = np.array([1])

    return inferred_shape.astype(int)

print(infer_shape(0, {}))
print(infer_shape([0, 0], {}))
print(infer_shape([0, 0], {'tile': 3}))
print(infer_shape(0, {'tile': 3}))
print(infer_shape(np.zeros((3, 3)), {'tile': 2}))
print(infer_shape(np.zeros((3, 3)), {'shape': 2, 'tile': 3}))

Output:

[1]
[2]
[3 2]
[3]
[2 3 3]
[3 2]

Btw, I saw above that you suggested plates; that might also be a good name for tile.

twiecki commented 8 years ago

The remaining issue is related to theano not allowing linear algebra on higher-dimensional variables. I started a PR here to support it https://github.com/Theano/Theano/pull/3836

fonnesbeck commented 8 years ago

We could require that tile be a keyword argument. I don't think that would be a huge inconvenience.

twiecki commented 8 years ago

tile is already a keyword argument. The problem is that mu can be a kwarg or an arg.

twiecki commented 8 years ago

@fonnesbeck supporting multidimensional linear algebra in theano is tricky. However, I was able to do this with a list comprehension:

import pymc3 as pm
import numpy as np

cov = np.asarray([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1]])

# 2 groups of 100 draws from a 3-dimensional normal
rand = np.random.multivariate_normal([0, 1, 2], cov, size=(2, 100))

with pm.Model() as model:
    # 'tile' here is the proposed kwarg under discussion
    mu = pm.Normal('mu', mu=1, sd=2, tile=(2, 3))
    tau = np.stack([np.eye(3), np.eye(3)])
    # note the '% i': without it both likelihoods would get the same name
    like = [pm.MvNormal('like%i' % i, mu=mu[i, :], tau=tau[i, :, :],
                        observed=rand[i, :, :])
            for i in range(2)]

    trace = pm.sample(500)

That should give us what we need, no?