pymc-devs / pymc2

THIS IS THE **OLD** PYMC PROJECT (VERSION 2). PLEASE USE PYMC INSTEAD:
http://pymc-devs.github.com/pymc/

Memory issue #5

Open · fonnesbeck opened this issue 9 years ago

fonnesbeck commented 9 years ago

Moved from pymc-devs/pymc3#543

Connecting a single Stochastic variable to a large number of other Stochastic variables consumes a surprising amount of memory. For example:

import pymc

def create_model(i, a):
    b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
    return locals()

a = pymc.Uniform('a', lower=0., upper=100., value=.1)
l = [create_model(i, a) for i in range(10000)]
model = pymc.Model(l)

while creating twice as many variables that are not connected is fine:

import pymc

def create_model(i):
    a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
    b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
    return locals()

l = [create_model(i) for i in range(10000)]
model = pymc.Model(l)
fonnesbeck commented 9 years ago

Doing a little digging using the memory profiler, first for the "connected" model:
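For anyone reproducing this: memory_profiler reports per-line usage for functions decorated with @profile. Judging from the output below, connected.py wraps the snippet from above in a main() and uses range(1000) rather than range(10000); a reconstruction along those lines:

# reconstruction of connected.py, inferred from the profiler output below
import pymc
from memory_profiler import profile  # also injected automatically by -m memory_profiler

@profile
def main():
    def create_model(i, a):
        b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
        return locals()

    a = pymc.Uniform('a', lower=0., upper=100., value=.1)
    l = [create_model(i, a) for i in range(1000)]
    model = pymc.Model(l)

if __name__ == '__main__':
    main()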

$ python -m memory_profiler connected.py
Filename: connected.py

Line #    Mem usage    Increment   Line Contents
================================================
    10  179.062 MiB    0.000 MiB       l = [create_model(i, a) for i in range(1000)]

Filename: connected.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   82.871 MiB    0.000 MiB   @profile
     4                             def main():
     5   82.871 MiB    0.000 MiB       def create_model(i, a):
     6                                     b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     7                                     return locals()
     8                             
     9   82.949 MiB    0.078 MiB       a = pymc.Uniform('a', lower=0., upper=100., value=.1)
    10  179.062 MiB   96.113 MiB       l = [create_model(i, a) for i in range(1000)]
    11  247.961 MiB   68.898 MiB       model = pymc.Model(l)

Filename: connected.py

Line #    Mem usage    Increment   Line Contents
================================================
     5  178.930 MiB    0.000 MiB       def create_model(i, a):
     6  179.062 MiB    0.133 MiB           b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     7  179.062 MiB    0.000 MiB           return locals()

and for the "unconnected" model:

$ python -m memory_profiler unconnected.py
Filename: unconnected.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   82.832 MiB    0.000 MiB   @profile
     4                             def main():
     5   82.832 MiB    0.000 MiB       def create_model(i):
     6                                     a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
     7                                     b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     8                                     return locals()
     9                             
    10  108.156 MiB   25.324 MiB       l = [create_model(i) for i in range(1000)]
    11  115.336 MiB    7.180 MiB       model = pymc.Model(l)

Filename: unconnected.py

Line #    Mem usage    Increment   Line Contents
================================================
    10  108.156 MiB    0.000 MiB       l = [create_model(i) for i in range(1000)]

Filename: unconnected.py

Line #    Mem usage    Increment   Line Contents
================================================
     5  108.129 MiB    0.000 MiB       def create_model(i):
     6  108.141 MiB    0.012 MiB           a = pymc.Uniform('a_%i' % i, lower=0., upper=100., value=.1)
     7  108.156 MiB    0.016 MiB           b = pymc.Normal('b_%i' % i, mu=a, tau=1.)
     8  108.156 MiB    0.000 MiB           return locals()

I have also confirmed that the connected model is not somehow creating extra PyMC objects (at least as far as I can tell), and that the individual variables in each model are identical in size, as reported by sys.getsizeof(model.variables.pop()).
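One caveat on that check: sys.getsizeof only reports an object's shallow size, so two variables can report identical sizes while one of them drags along a much larger reference graph. A recursive variant using only the standard library would be more telling (the helper name total_size is mine):

import sys
from types import ModuleType, FunctionType
from gc import get_referents

# classes, modules and functions would drag in the whole interpreter; skip them
BLACKLIST = (type, ModuleType, FunctionType)

def total_size(obj):
    """sys.getsizeof summed over obj and everything it transitively references."""
    seen = set()
    size = 0
    stack = [obj]
    while stack:
        o = stack.pop()
        if isinstance(o, BLACKLIST) or id(o) in seen:
            continue
        seen.add(id(o))
        size += sys.getsizeof(o)
        stack.extend(get_referents(o))
    return size

print(total_size(model.variables.pop()))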

So, this is still a mystery. Need to do deeper profiling, I suppose.
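One way to dig deeper, sketched with nothing but the standard library: snapshot the garbage collector's live-object population before and after building the model and diff the counts by type, which should point at whatever is accumulating in the connected case (the helper name live_object_counts is mine; l is the list from the snippet above):

import gc
from collections import Counter

def live_object_counts():
    # count every object the garbage collector is tracking, keyed by type name
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())

before = live_object_counts()
model = pymc.Model(l)
after = live_object_counts()

# the types whose populations grew the most while building the model
for name, growth in (after - before).most_common(15):
    print('%8d  %s' % (growth, name))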

abojchevski commented 8 years ago

Any information on the memory issue?

The following relatively simple (stochastic block) model uses a tremendous amount of memory even at a small problem size (e.g. n=100).

Or am I doing something wrong in the model definition?

import numpy as np
import pymc as pm

# generate random block matrix
n = 50
A11 = np.random.rand(n, n) > 0.3
A12 = np.random.rand(n, n) > 0.9
A21 = np.random.rand(n, n) > 0.9
A22 = np.random.rand(n, n) > 0.3

A_obs = np.bmat([[A11, A12], [A21, A22]])

N = A_obs.shape[0]
K = 2

# define model
pi = pm.Dirichlet('pi', theta=0.5 * np.ones(K))
eta = pm.Container([[pm.Beta('b_{}{}'.format(i, j), alpha=1, beta=1) for i in range(K)] for j in range(K)])

q = pm.Container([pm.Categorical('q_{}'.format(i), p=pi) for i in range(N)])

# observed edges; each edge probability is a deterministic lookup
# eta[q_i][q_j], wrapped in a Lambda node
A = pm.Container([[pm.Bernoulli('A_{}_{}'.format(i, j),
                                p=pm.Lambda('A_lambda_{}_{}'.format(i, j),
                                            lambda qi=q[i], qj=q[j], eta=eta: eta[qi][qj]),
                                value=A_obs[i, j], observed=True) for i in range(N)] for j in range(N)])

# sample; in PyMC2, MCMC.sample() stores the draws on the sampler rather than
# returning them -- retrieve them afterwards with e.g. mcmc.trace('q_0')[:]
mcmc = pm.MCMC([A, q, pi, eta])
mcmc.sample(200)

print(np.array(q.value))
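Not a fix for the underlying issue, but a possible way to sidestep it in this particular model: the nested comprehensions above create on the order of 20,000 Python-level PyMC nodes (a Bernoulli plus a Lambda per node pair, plus a Categorical per node), each with its own parent/child bookkeeping. If array-valued stochastics behave as I expect, the whole model collapses to a handful of nodes. This is an untested sketch: the size keyword and the fancy-indexing deterministic are assumptions, and it mirrors the Categorical(p=pi) usage above.

import numpy as np
import pymc as pm

N, K = A_obs.shape[0], 2  # same observed block matrix as above

pi = pm.Dirichlet('pi', theta=0.5 * np.ones(K))
eta = pm.Beta('eta', alpha=1., beta=1., size=(K, K))  # one (K, K) node instead of K*K
q = pm.Categorical('q', p=pi, size=N)                 # one length-N node instead of N

@pm.deterministic
def p(q=q, eta=eta):
    # eta[q_i, q_j] for every pair (i, j) at once via fancy indexing
    q = np.asarray(q, dtype=int)
    return eta[np.ix_(q, q)]

A = pm.Bernoulli('A', p=p, value=np.asarray(A_obs, dtype=int), observed=True)

mcmc = pm.MCMC([A, q, pi, eta])
mcmc.sample(200)
print(np.asarray(q.value))

With a few array-valued nodes instead of thousands of scalar ones, most of the per-node graph bookkeeping disappears, whatever the root cause of the memory growth turns out to be.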