pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Function over array of r.v. generating memory leak and performance problems #2044

Closed marciomacielbastos closed 7 years ago

marciomacielbastos commented 7 years ago

Hi all, I am new to pymc3 and I am having an issue with memory and processing time with the following code:

import pymc3 as pm
import theano.tensor as tt

def set_I(beta, gamma, I):
    N = pm.Uniform("N", lower=5e6, upper=5.2e6)
    dI = []
    for i in range(len(observed_array)):  # observed_array holds the observed counts
        dI.append(beta*I[i]*(N-I[i])/N - gamma*I[i])
    return tt.stack(dI)

with pm.Model() as model:
    # Priors for unknown model parameters
    beta = pm.Uniform("beta", lower=0.5, upper=1.5)
    gamma = pm.Poisson("gamma", mu=0.5)
    I = pm.Uniform("I", lower=0, upper=5e6, shape=len(observed_array))

    # Deterministics (parameters which are function of previous parameters)
    mu = pm.Deterministic("mu", set_I(beta, gamma, I))

    # Likelihood (sampling distribution) of observations
    dI = pm.Poisson("dI", mu=mu, observed=observed_array)

with model:
    start = pm.find_MAP() # Find starting value by optimization

with model:
    step1 = pm.NUTS(scaling=start) # Instantiate MCMC sampling algorithm
    step2 = pm.Metropolis()
    trace = pm.sample(1000, step=[step1, step2], tune=500, start=start, njobs=3)

The previous code consumes all available memory before find_MAP finishes, or before sampling starts in the case where I omit the start statement. Thank you all!

junpenglao commented 7 years ago

There are a few tips you might find helpful:

1. Starting with pm.find_MAP() is usually a bad idea in high-dimensional cases. You should first try the default, which initializes using ADVI (something like trace = pm.sample(3e3, njobs=2, tune=1000)).

2. Building a list and then converting it to a theano tensor can create unnecessary nodes in your graph, which in turn slows down compiling and sampling. Use vectorized operations instead. For example, instead of

def set_I(beta, gamma, I):
    N = pm.Uniform("N", lower=5e6, upper=5.2e6)
    dI = []
    for i in range(len(observed_array)):
        dI.append(beta*I[i]*(N-I[i])/(N)-gamma*I[i])
    return tt.stack(dI)

try just

N = pm.Uniform("N", lower=5e6, upper=5.2e6)
mu = pm.Deterministic("mu", beta*I*(N-I)/N-gamma*I) # careful of the shape broadcasting here

3. Is there a particular reason N and I are at the 1e6 scale? You should consider reparameterizing your model so that the parameters are on a similar scale. A sketch combining these points follows below.
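Putting the three tips together, here is a minimal sketch. The observed_array values, the POP_SCALE rescaling factor, and the continuous Uniform prior on gamma (so NUTS can be used throughout) are all illustrative placeholders, not part of the original model:

import numpy as np
import pymc3 as pm

observed_array = np.array([120, 310, 800, 1900, 4200])  # hypothetical placeholder for the user's data
POP_SCALE = 5e6  # illustrative scale so the free parameters are near unit scale

with pm.Model() as model:
    beta = pm.Uniform("beta", lower=0.5, upper=1.5)
    gamma = pm.Uniform("gamma", lower=0.0, upper=1.0)  # continuous here so NUTS applies everywhere
    N_s = pm.Uniform("N_s", lower=1.0, upper=1.04)     # N / POP_SCALE, i.e. 5e6..5.2e6
    I_s = pm.Uniform("I_s", lower=0.0, upper=1.0, shape=len(observed_array))  # I / POP_SCALE

    N = N_s * POP_SCALE
    I = I_s * POP_SCALE

    # single vectorized expression instead of a Python loop plus tt.stack
    mu = pm.Deterministic("mu", beta * I * (N - I) / N - gamma * I)
    dI = pm.Poisson("dI", mu=mu, observed=observed_array)

    # default initialization (ADVI at the time) instead of find_MAP plus manual step methods
    trace = pm.sample(3000, tune=1000, njobs=2)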

marciomacielbastos commented 7 years ago

Thank you @junpenglao! I will try to follow your instructions and get back to you with feedback as soon as possible!

marciomacielbastos commented 7 years ago

Perfect @junpenglao! It worked fast and used little memory! Now, do you mind explaining to me why storing theano tensors in a list would add nodes to my graph?

junpenglao commented 7 years ago

Happy to help! Glad it works ;-)

Now, do you mind explaining to me why storing theano tensors in a list would add nodes to my graph?

Maybe @twiecki can give a better explanation than me. From my understanding, if you store your theano tensors or pymc3 variables in a list and then convert the list into a new variable, each element is treated as a separate node.
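A small illustration of this, assuming plain Theano (the backend pymc3 used at the time): building the result element by element and stacking produces roughly one apply node per element, while the vectorized form compiles to only a handful of nodes.

import theano
import theano.tensor as tt

a = tt.dscalar("a")
x = tt.dvector("x")
n = 100

# element-by-element: one subtensor + multiplication per element, then a join
stacked = tt.stack([a * x[i] for i in range(n)])

# vectorized: a single elementwise multiplication over the whole vector
vectorized = a * x

f_stacked = theano.function([a, x], stacked)
f_vectorized = theano.function([a, x], vectorized)

print(len(f_stacked.maker.fgraph.apply_nodes))     # grows with n
print(len(f_vectorized.maker.fgraph.apply_nodes))  # stays small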

fonnesbeck commented 7 years ago

That looks like an infectious disease dynamics model. Is it?

marciomacielbastos commented 7 years ago

@fonnesbeck, yes it is!

fonnesbeck commented 7 years ago

@marciomacielbastos if gamma is the recovery rate, it should not be Poisson (which typically models counts). It would be better modeled as something continuous, like a gamma or lognormal. Moreover, this would allow you to use NUTS for the whole model.
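For example, a sketch of the swap (the hyperparameters here are placeholders, not a recommendation):

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # replace gamma = pm.Poisson("gamma", mu=0.5) with a continuous prior, e.g.
    gamma = pm.Gamma("gamma", alpha=2.0, beta=4.0)               # mean alpha/beta = 0.5
    # or: gamma = pm.Lognormal("gamma", mu=np.log(0.5), sd=0.5)  # median 0.5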

marciomacielbastos commented 7 years ago

@fonnesbeck, thank you very much! I will consider changing it! Do you work in epidemiology?

fonnesbeck commented 7 years ago

@marciomacielbastos yes, I do.

I'm going to go ahead and close this, as it looks like @junpenglao's suggestion was effective.