pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Multivariate Gaussian Mixture Model using pm.Mixture not working #1811

Closed: schlichtanders closed this issue 5 years ago

schlichtanders commented 7 years ago

Dear community,

I tried to merge the multivariate Gaussian tutorial with the tutorial on Dirichlet Gaussian mixture models.

However, it seems the sampler does not work at all: the sampled weights w stay constant throughout the sampling procedure. I built a working example (it combines both of the mentioned tutorials) and put it into the following gist: https://gist.github.com/anonymous/02f805be39a194b0733c835e7225c753

[trace plot: the sampled weights w are constant across all iterations]

To run the code, you need to apply a very small fix to the code base, which is already available in the fork https://github.com/schlichtanders/pymc3 or the pull request https://github.com/pymc-devs/pymc3/pull/1810, respectively. Further note that I had to use the Metropolis sampler, as NUTS produces NaNs.
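For context, a minimal sketch of that sampling setup, assuming model is the mixture model from the gist (the names are placeholders, not code from the thread): fall back to Metropolis because NUTS produces NaNs, then check whether the weights actually move.

import numpy as np
import pymc3 as pm

with model:
    step = pm.Metropolis()
    trace = pm.sample(5000, step=step)

# If the chain is stuck, the per-dimension range of w is (near) zero:
print(np.ptp(trace['w'], axis=0))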

My environment:

twiecki commented 7 years ago

@schlichtanders Can this be closed?

schlichtanders commented 7 years ago

Please don't close it. I really thought the code linked in the gist would work and sample from the posterior distribution.

But that cannot be true, as the posterior distribution is surely not constant.

twiecki commented 7 years ago

I just assumed https://github.com/pymc-devs/pymc3/pull/1810 was the fix, but apparently that's not the case.

twiecki commented 7 years ago

I see, maybe @AustinRochford is interested in taking a look.

AustinRochford commented 7 years ago

I get the following error when trying to define that model:

AttributeError                            Traceback (most recent call last)
/home/jovyan/pymc3/pymc3/distributions/mixture.py in _comp_logp(self, value)
     82 
---> 83             return comp_dists.logp(value_)
     84         except AttributeError:

AttributeError: 'list' object has no attribute 'logp'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-8-eadc4ec0b7ee> in <module>()
      4     beta = pm.Beta('beta', 1, alpha, shape=K)
      5     w = pm.Deterministic('w', stick_breaking(beta))
----> 6     obs = pm.Mixture('obs', w, [multivariatenormal(init_mean, init_sigma, init_corr, k, dist=True) for k in range(K)], observed=data)

/home/jovyan/pymc3/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     32             data = kwargs.pop('observed', None)
     33             total_size = kwargs.pop('total_size', None)
---> 34             dist = cls.dist(*args, **kwargs)
     35             return model.Var(name, dist, data, total_size)
     36         else:

/home/jovyan/pymc3/pymc3/distributions/distribution.py in dist(cls, *args, **kwargs)
     43     def dist(cls, *args, **kwargs):
     44         dist = object.__new__(cls)
---> 45         dist.__init__(*args, **kwargs)
     46         return dist
     47 

/home/jovyan/pymc3/pymc3/distributions/mixture.py in __init__(self, w, comp_dists, *args, **kwargs)
     64             # for logp the different modes are like different observations, i.e. rows, hence use comp_modes.T
     65             # logPs is a vector, hence self.logp(..).T == self.logp(..)
---> 66             comp_mode_logps = self.logp(comp_modes.T)
     67             self.mode = comp_modes[tt.argmax(w * comp_mode_logps, axis=-1)]
     68 

/home/jovyan/pymc3/pymc3/distributions/mixture.py in logp(self, value)
    112         w = self.w
    113 
--> 114         return bound(logsumexp(tt.log(w) + self._comp_logp(value), axis=-1).sum(),
    115                      w >= 0, w <= 1, tt.allclose(w.sum(axis=-1), 1))
    116 

/home/jovyan/pymc3/pymc3/distributions/mixture.py in _comp_logp(self, value)
     83             return comp_dists.logp(value_)
     84         except AttributeError:
---> 85             return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],
     86                             axis=1)
     87 

/home/jovyan/pymc3/pymc3/distributions/mixture.py in <listcomp>(.0)
     83             return comp_dists.logp(value_)
     84         except AttributeError:
---> 85             return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],
     86                             axis=1)
     87 

/home/jovyan/pymc3/pymc3/distributions/multivariate.py in logp(self, value)
    130         else:
    131             result -= logdet(tau)
--> 132         result += (tt.dot(tau, delta) * delta).sum(axis=delta.ndim - 1)
    133         return -1 / 2. * result
    134 

/opt/conda/lib/python3.5/site-packages/theano/tensor/basic.py in dot(a, b)
   5968         return tensordot(a, b, [[a.ndim - 1], [numpy.maximum(0, b.ndim - 2)]])
   5969     else:
-> 5970         return _dot(a, b)
   5971 
   5972 

/opt/conda/lib/python3.5/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    661                 thunk.outputs = [storage_map[v] for v in node.outputs]
    662 
--> 663                 required = thunk()
    664                 assert not required  # We provided all inputs
    665 

/opt/conda/lib/python3.5/site-packages/theano/gof/op.py in rval(p, i, o, n)
    859             # default arguments are stored in the closure of `rval`
    860             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 861                 r = p(n, [x[0] for x in i], o)
    862                 for o in node.outputs:
    863                     compute_map[o][0] = True

/opt/conda/lib/python3.5/site-packages/theano/tensor/basic.py in perform(self, node, inp, out)
   5786         # gives a numpy float object but we need to return a 0d
   5787         # ndarray
-> 5788         z[0] = numpy.asarray(numpy.dot(x, y))
   5789 
   5790     def grad(self, inp, grads):

ValueError: shapes (2,2) and (4,2) not aligned: 2 (dim 1) != 4 (dim 0)

I'll try to look into this in the next few days.
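The final ValueError can be reproduced in plain numpy (an illustration of my reading of the traceback, not code from the thread): tau is the 2x2 precision matrix of a single 2-D component, while the stacked value the mixture passes in has K = 4 rows, so the dot product cannot align.

import numpy as np

tau = np.eye(2)           # (2, 2): precision matrix of one 2-D component
delta = np.zeros((4, 2))  # (K, D) = (4, 2): stacked component modes minus mu
np.dot(tau, delta)        # ValueError: shapes (2,2) and (4,2) not aligned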

schlichtanders commented 7 years ago

This is kind of expected: you have to use my branch, or simply pull the recent master, as it was just merged.

AustinRochford commented 7 years ago

I did install the most recent master (I think), but I'll double-check this evening.

schlichtanders commented 7 years ago

thanks for looking into this

AustinRochford commented 7 years ago

Confirmed on the most recent master (7e4cc661be2f14d3ced399e777bc6cb134b5c2bf) that I get the same error I posted above.

schlichtanders commented 7 years ago

Sorry, I could have looked at the stack trace myself; in fact, my little change was already in there. The error you got was impressively similar to the one I had already fixed. I think the error didn't occur for me because I had the CUDA BLAS library activated; after deactivating it, I got the same error.

I fixed it in #1821; however, the sampled distribution is still constant for me.

schlichtanders commented 7 years ago

The new little pull request was just merged into master. @AustinRochford, can you pull and try again?

AustinRochford commented 7 years ago

I'll give it a shot tonight

AustinRochford commented 7 years ago

With #1821, I can now reproduce the issue.

AustinRochford commented 7 years ago

By simplifying this from a Dirichlet process mixture (which is unlikely to converge with K = 4; generally we need to greatly overestimate the number of components to ensure convergence) to a plain mixture model with K = 3 components, I can at least get the chains to mix a bit. See https://gist.github.com/AustinRochford/e45b7e5f7d18d77f391d3373801e4c9a

We should probably work to improve the convergence speed for this simpler example, then step up to the more complex Dirichlet process mixture.

AustinRochford commented 7 years ago

This also seems to work with a Metropolis step. https://gist.github.com/AustinRochford/41109579c8be23a10e2cd167209dbe25

shaneh2 commented 6 years ago

> This also seems to work with a Metropolis step. https://gist.github.com/AustinRochford/41109579c8be23a10e2cd167209dbe25

When I try the code from @AustinRochford's last comment on my computer, I get this error:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in _comp_logp(self, value)
    109 
--> 110             return comp_dists.logp(value_)
    111         except AttributeError:

AttributeError: 'list' object has no attribute 'logp'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-8-374c618ef5c0> in <module>()
      5     #w = pm.Deterministic('w', stick_breaking(beta))
      6     w = pm.Dirichlet('w', np.ones(K))
----> 7     obs = pm.Mixture('obs', w, [multivariatenormal(init_mean, init_sigma, init_corr, k, dist=True) for k in range(K)], observed=data)

/usr/local/lib/python3.6/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     34                 raise TypeError("observed needs to be data but got: {}".format(type(data)))
     35             total_size = kwargs.pop('total_size', None)
---> 36             dist = cls.dist(*args, **kwargs)
     37             return model.Var(name, dist, data, total_size)
     38         else:

/usr/local/lib/python3.6/site-packages/pymc3/distributions/distribution.py in dist(cls, *args, **kwargs)
     45     def dist(cls, *args, **kwargs):
     46         dist = object.__new__(cls)
---> 47         dist.__init__(*args, **kwargs)
     48         return dist
     49 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in __init__(self, w, comp_dists, *args, **kwargs)
     91         try:
     92             comp_modes = self._comp_modes()
---> 93             comp_mode_logps = self.logp(comp_modes)
     94             self.mode = comp_modes[tt.argmax(w * comp_mode_logps, axis=-1)]
     95 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in logp(self, value)
    139         w = self.w
    140 
--> 141         return bound(logsumexp(tt.log(w) + self._comp_logp(value), axis=-1).sum(),
    142                      w >= 0, w <= 1, tt.allclose(w.sum(axis=-1), 1),
    143                      broadcast_conditions=False)

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in _comp_logp(self, value)
    110             return comp_dists.logp(value_)
    111         except AttributeError:
--> 112             return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],
    113                             axis=1)
    114 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in <listcomp>(.0)
    110             return comp_dists.logp(value_)
    111         except AttributeError:
--> 112             return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],
    113                             axis=1)
    114 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/multivariate.py in logp(self, value)
    268 
    269     def logp(self, value):
--> 270         quaddist, logdet, ok = self._quaddist(value)
    271         k = value.shape[-1].astype(theano.config.floatX)
    272         norm = - 0.5 * k * pm.floatX(np.log(2 * np.pi))

/usr/local/lib/python3.6/site-packages/pymc3/distributions/multivariate.py in _quaddist(self, value)
     87             onedim = False
     88 
---> 89         delta = value - mu
     90 
     91         if self._cov_type == 'cov':

/usr/local/lib/python3.6/site-packages/theano/tensor/var.py in __sub__(self, other)
    145         # and the return value in that case
    146         try:
--> 147             return theano.tensor.basic.sub(self, other)
    148         except (NotImplementedError, AsTensorError):
    149             return NotImplemented

/usr/local/lib/python3.6/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    672                 thunk.outputs = [storage_map[v] for v in node.outputs]
    673 
--> 674                 required = thunk()
    675                 assert not required  # We provided all inputs
    676 

/usr/local/lib/python3.6/site-packages/theano/gof/op.py in rval()
    860 
    861         def rval():
--> 862             thunk()
    863             for o in node.outputs:
    864                 compute_map[o][0] = True

/usr/local/lib/python3.6/site-packages/theano/gof/cc.py in __call__(self)
   1733                 print(self.error_storage, file=sys.stderr)
   1734                 raise
-> 1735             reraise(exc_type, exc_value, exc_trace)
   1736 
   1737 

/usr/local/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

ValueError: Input dimension mis-match. (input[0].shape[1] = 3, input[1].shape[1] = 2)

It seems the way dimensions are handled has changed in the past year: the mixture now appears to expect the input data to have the same dimension as the number of mixture components (!).
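The broadcast failure can be illustrated in plain numpy (my reading of the traceback, not code from the thread): the mixture evaluates each component's logp at a value whose last axis is K = 3, while the MvNormal's mu has last axis D = 2, so value - mu cannot broadcast.

import numpy as np

value = np.zeros((100, 3))  # what Mixture passes in: last axis = K components
mu = np.zeros((100, 2))     # what MvNormal expects: last axis = D dimensions
value - mu                  # ValueError: operands could not be broadcast together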

I am trying to create a multivariate GMM too so any help would be greatly appreciated.

Environment details:
OS: macOS High Sierra 10.13.2 (17C205)
Python: 3.6.3
theano: 1.0.1
pymc3: 3.3

junpenglao commented 6 years ago

You can try specifying the shape: pm.MvNormal.dist(mu, cov) --> pm.MvNormal.dist(mu, cov, shape=component_shape).
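A minimal sketch of this suggestion (my own toy example with made-up data, K = 3 components, and D = 2 dimensions; component_shape is read here as the shape of one D-dimensional draw):

import numpy as np
import pymc3 as pm

D, K = 2, 3
data = np.random.randn(500, D)
mus = np.array([[-2., -2.], [0., 0.], [2., 2.]])

with pm.Model() as model:
    w = pm.Dirichlet('w', np.ones(K))
    # Give each component distribution an explicit shape:
    comps = [pm.MvNormal.dist(mu=mus[k], cov=np.eye(D), shape=(D,))
             for k in range(K)]
    obs = pm.Mixture('obs', w, comps, observed=data)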

shaneh2 commented 6 years ago

@junpenglao Thank you for your help.

I have made the changes to the code you suggested, and I still get the same error. Here is the snippet I have changed from @AustinRochford's notebook https://gist.github.com/AustinRochford/41109579c8be23a10e2cd167209dbe25. I have tried adding shape=data.shape both where the individual mixture components and where the overall mixture are defined.

def multivariatenormal(init_mean, init_sigma, init_corr, suffix="", dist=False):
    if not isinstance(suffix, str):
        suffix = str(suffix)
    D = len(init_sigma)

    sigma = pm.Lognormal('sigma' + suffix, np.zeros(D, dtype=np.float), np.ones(D), shape=D, testval=init_sigma)
    nu = pm.Uniform('nu' + suffix, 0, 5)
    C_triu = pm.LKJCorr('C_triu' + suffix, nu, D, testval=init_corr)
    cov = pm.Deterministic('cov' + suffix, make_cov_matrix(sigma, C_triu, module=tt))

    mu = pm.MvNormal('mu' + suffix, 0, cov, shape=2, testval=init_mean)

    return pm.MvNormal.dist(mu, cov) if dist else pm.MvNormal('mvn' + suffix, mu, cov, shape=data.shape)

def stick_breaking(beta):
    portion_remaining = tt.concatenate([[1], tt.extra_ops.cumprod(1 - beta)[:-1]])
    return beta * portion_remaining

K = 3
with pm.Model() as model:
    #alpha = pm.Gamma('alpha', 1., 1.)
    #beta = pm.Beta('beta', 1, alpha, shape=K)
    #w = pm.Deterministic('w', stick_breaking(beta))
    w = pm.Dirichlet('w', np.ones(K))
    obs = pm.Mixture('obs', w, [multivariatenormal(init_mean, init_sigma, init_corr, k, dist=True) for k in range(K)], observed=data, shape=data.shape)

And here I basically get the same error:

AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in _comp_logp(self, value)
    109 
--> 110             return comp_dists.logp(value_)
    111         except AttributeError:

AttributeError: 'list' object has no attribute 'logp'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-24-9772e61f2786> in <module>()
      5     #w = pm.Deterministic('w', stick_breaking(beta))
      6     w = pm.Dirichlet('w', np.ones(K))
----> 7     obs = pm.Mixture('obs', w, [multivariatenormal(init_mean, init_sigma, init_corr, k, dist=True) for k in range(K)], observed=data, shape=data.shape)

/usr/local/lib/python3.6/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     34                 raise TypeError("observed needs to be data but got: {}".format(type(data)))
     35             total_size = kwargs.pop('total_size', None)
---> 36             dist = cls.dist(*args, **kwargs)
     37             return model.Var(name, dist, data, total_size)
     38         else:

/usr/local/lib/python3.6/site-packages/pymc3/distributions/distribution.py in dist(cls, *args, **kwargs)
     45     def dist(cls, *args, **kwargs):
     46         dist = object.__new__(cls)
---> 47         dist.__init__(*args, **kwargs)
     48         return dist
     49 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in __init__(self, w, comp_dists, *args, **kwargs)
     91         try:
     92             comp_modes = self._comp_modes()
---> 93             comp_mode_logps = self.logp(comp_modes)
     94             self.mode = comp_modes[tt.argmax(w * comp_mode_logps, axis=-1)]
     95 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in logp(self, value)
    139         w = self.w
    140 
--> 141         return bound(logsumexp(tt.log(w) + self._comp_logp(value), axis=-1).sum(),
    142                      w >= 0, w <= 1, tt.allclose(w.sum(axis=-1), 1),
    143                      broadcast_conditions=False)

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in _comp_logp(self, value)
    110             return comp_dists.logp(value_)
    111         except AttributeError:
--> 112             return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],
    113                             axis=1)
    114 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/mixture.py in <listcomp>(.0)
    110             return comp_dists.logp(value_)
    111         except AttributeError:
--> 112             return tt.stack([comp_dist.logp(value) for comp_dist in comp_dists],
    113                             axis=1)
    114 

/usr/local/lib/python3.6/site-packages/pymc3/distributions/multivariate.py in logp(self, value)
    268 
    269     def logp(self, value):
--> 270         quaddist, logdet, ok = self._quaddist(value)
    271         k = value.shape[-1].astype(theano.config.floatX)
    272         norm = - 0.5 * k * pm.floatX(np.log(2 * np.pi))

/usr/local/lib/python3.6/site-packages/pymc3/distributions/multivariate.py in _quaddist(self, value)
     87             onedim = False
     88 
---> 89         delta = value - mu
     90 
     91         if self._cov_type == 'cov':

/usr/local/lib/python3.6/site-packages/theano/tensor/var.py in __sub__(self, other)
    145         # and the return value in that case
    146         try:
--> 147             return theano.tensor.basic.sub(self, other)
    148         except (NotImplementedError, AsTensorError):
    149             return NotImplemented

/usr/local/lib/python3.6/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    672                 thunk.outputs = [storage_map[v] for v in node.outputs]
    673 
--> 674                 required = thunk()
    675                 assert not required  # We provided all inputs
    676 

/usr/local/lib/python3.6/site-packages/theano/gof/op.py in rval()
    860 
    861         def rval():
--> 862             thunk()
    863             for o in node.outputs:
    864                 compute_map[o][0] = True

/usr/local/lib/python3.6/site-packages/theano/gof/cc.py in __call__(self)
   1733                 print(self.error_storage, file=sys.stderr)
   1734                 raise
-> 1735             reraise(exc_type, exc_value, exc_trace)
   1736 
   1737 

/usr/local/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

ValueError: Input dimension mis-match. (input[0].shape[1] = 3, input[1].shape[1] = 2)

Any help would be greatly appreciated.
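As an aside, the snippet above calls a make_cov_matrix helper that is defined in the linked gist but not shown in the thread. Here is a hedged sketch of what such a helper typically looked like in the old PyMC3 LKJCorr examples (an assumption about the gist, not its actual code; D is given a default of 2 to match the snippet's call signature, since pm.LKJCorr returns only the packed strict upper triangle):

import numpy as np
import theano.tensor as tt

def make_cov_matrix(sigma, C_triu, module=tt, D=2):
    # Scatter the packed upper triangle into a symmetric index matrix...
    n_elem = D * (D - 1) // 2
    tri_index = np.zeros((D, D), dtype=int)
    tri_index[np.triu_indices(D, k=1)] = np.arange(n_elem)
    tri_index[np.triu_indices(D, k=1)[::-1]] = np.arange(n_elem)
    # ...expand it to a full correlation matrix with ones on the diagonal...
    corr = module.fill_diagonal(C_triu[tri_index], 1.)
    # ...and scale: cov = diag(sigma) @ corr @ diag(sigma).
    sigma_diag = module.nlinalg.diag(sigma)
    return sigma_diag.dot(corr).dot(sigma_diag)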

junpenglao commented 6 years ago

Could you please open a post on https://discourse.pymc.io?

shaneh2 commented 6 years ago

@junpenglao Thank you for the suggestion; I have just done so here: https://discourse.pymc.io/t/problem-with-fitting-multivariate-mixture-of-gaussians/846