pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Kernel restarting running Ipython + PyMC #631

Closed ghost closed 9 years ago

ghost commented 9 years ago

I posted this question on Stack Overflow a few days ago. @fonnesbeck ran my code but it worked without issues. Since then I tried to reproduce the problem in a fresh Ubuntu installation and I encountered several error messages and updated my question accordingly.

In short, this is the code:

from sklearn.datasets import load_boston
import numpy as np
import pymc as pm
import pandas as pd

boston = load_boston()
features = ['INDUS', 'NOX', 'RM', 'TAX', 'PTRATIO', 'LSTAT']
df = pd.DataFrame(boston.data, columns=boston.feature_names)
X = np.array(df.ix[:, features])
y = boston.target

gamma = pm.Binomial('gamma', 1, 0.5, size=len(features))
var = pm.Lambda('var', lambda gamma=gamma: (1-gamma)*0.001 + gamma*10)
prec = pm.Lambda('prec', lambda var=var: 1.0/var)
b = pm.Normal('b', 0, prec)
int_ = pm.Normal('int_', 0, 0.01)
taue = pm.Gamma('taue', 0.1, 0.1)
mu = int_ + X[:,0]*b[0] + X[:,1]*b[1] + X[:,2]*b[2] + X[:,3]*b[3] + X[:,4]*b[4] + X[:,5]*b[5]
observed = pm.Normal('obs', mu, taue, observed=True, value=y)
M = pm.MCMC([observed, mu, int_, b, prec, var, gamma])
M.sample(10000, 500, 5)

pm.Matplot.plot(M)

and I get *** Error in "/home/ubuntu/anaconda/envs/env3/bin/python": double free or corruption (out): 0x00000000023dc940 *** Aborted (core dumped) when plotting, or Segmentation fault (core dumped) when sampling. This happens most of the time, although occasionally the code finishes correctly.

Other models run without problems. I'm using IPython 2.3.0, pymc 2.3.3, and Python 2.7.8 in an Anaconda environment. It also happened in IPython 3.0.0-dev.

Does anyone know what is happening?

fonnesbeck commented 9 years ago

I can't replicate this on OS X. Perhaps a Linux user can give it a go.

ghost commented 9 years ago

Thanks again for trying to replicate this.

I tried to run the code again today (this time on my own computer) and I haven't been able to run it successfully in the IPython Notebook. The IPython interpreter didn't throw errors until I used %run model (which failed repeatedly) instead of pasting the code line by line. After that, I got lots of [1] 6655 segmentation fault (core dumped) ipython messages after plotting, and now even python (without IPython) is throwing errors.

Maybe there is something wrong with my installation, although I'm not sure why similar issues arose in the EC2 machine. Furthermore, I haven't seen these issues with other models.

twiecki commented 9 years ago

I can replicate this with pymc 2.3.3 and 2.3.4:

gamma = pm.Binomial('gamma', 1, 0.5, size=len(features))
*** Error in `/home/wiecki/envs/hddm/bin/python': free(): corrupted unsorted chunks: 0x000000000415cff0 ***

fonnesbeck commented 9 years ago

OK, I can replicate this on OS X when I run it in an IPython notebook and create inline plots (%matplotlib inline). Here is the crash report. MPL issue? I guess it's worth looking at the values that it's trying to plot.

twiecki commented 9 years ago

I even get this before any matplotlib as you can see above. That would suggest an issue with pm.Binomial?

fonnesbeck commented 9 years ago

I notice that you do not pass taue to the sampler. It's always best to use locals() or vars() to ensure all variables are passed, or better, encapsulate the model in a function.
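
The suggestion above can be sketched in plain Python. This is only an illustration under stated assumptions: `Node` is a hypothetical stand-in class (not part of PyMC) so that the snippet runs without PyMC installed, and `make_model` is an illustrative name. It shows how returning `locals()` from a model-building function picks up every variable, including one that a hand-written list leaves out.

```python
class Node(object):
    """Hypothetical stand-in for a PyMC node; NOT a PyMC class."""
    def __init__(self, name, *parents):
        self.name = name
        self.parents = parents


def make_model():
    # Every node is created as a local variable of this function.
    int_ = Node('int_')
    taue = Node('taue')
    mu = Node('mu', int_)
    observed = Node('obs', mu, taue)
    # locals() returns all of them, so none can be forgotten by accident.
    return locals()


model = make_model()

# A hand-written list in the style of the original post, which omits taue.
hand_written = ['observed', 'mu', 'int_']
missing = sorted(set(model) - set(hand_written))
print(missing)  # ['taue']
```

With real PyMC 2 nodes, the returned dictionary would presumably be handed to pm.MCMC in place of the hand-written list; that step is not shown here because it depends on the installed PyMC version.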

fonnesbeck commented 9 years ago

It happens stochastically for me, and when it does, it's always during plotting. Here it is just at the point of crashing when plotting gamma. I've successfully printed out the summary statistics of the node just above, so it's not in the calculation of posterior stats.

fonnesbeck commented 9 years ago

I can run the model repeatedly without any crashes if I don't do anything with the samples. For me it's always manipulating the samples that causes the crash. I have tried other backends (txt, sqlite), but the behavior persists.

ghost commented 9 years ago

@fonnesbeck Yes, I forgot to add taue. I will try your suggestions the next time.

Usually, the error occurs when plotting. However, it also occurs in gamma = pm.Binomial('gamma', 1, 0.5, size=len(features)) as twiecki describes. Infrequently, it crashes in mu = int_ + X[:,0]*b[0] + X[:,1]*b[1] + X[:,2]*b[2] + X[:,3]*b[3] + X[:,4]*b[4] + X[:,5]*b[5] and even when sampling with M.sample(10000, 500, 5), as I posted on Stack Overflow.

Would it be possible that forcing a Bernoulli distribution by using pm.Binomial('gamma', 1, 0.5, size=len(features)) is exposing some bug?
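
For context, a Binomial distribution with n=1 is mathematically the same as a Bernoulli distribution, so any difference in behavior would have to come from the implementation rather than from the model. A minimal check of that identity in plain Python follows; the helper names are illustrative and are not PyMC functions, and math.comb requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p):
    # P(K = k) for K ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)


def bernoulli_pmf(k, p):
    # P(K = k) for K ~ Bernoulli(p), with k in {0, 1}
    return p if k == 1 else 1 - p


# Compare the two mass functions on both outcomes for a few values of p.
max_diff = max(
    abs(binomial_pmf(k, 1, p) - bernoulli_pmf(k, p))
    for p in (0.1, 0.5, 0.9)
    for k in (0, 1)
)
print(max_diff < 1e-12)  # True
```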

fonnesbeck commented 9 years ago

Can you replicate the bug using pm.Bernoulli? It's not clear how a bug in Binomial would manifest itself at the plotting stage, but who knows?

ghost commented 9 years ago

Sure. This is the simplest model that causes some issues:

import pymc as pm
import numpy as np

np.random.seed(42)
nrows = 1000
X = np.random.random((nrows,4))
y = np.random.random(nrows) * 10

gamma = pm.Binomial('gamma', 1, 0.5, size=X.shape[1])
mu = gamma[0]*X[:,0] + gamma[1]*X[:,1] + gamma[2]*X[:,2] + gamma[3]*X[:,3]
observed = pm.Normal('obs',mu, 1, observed=True, value=y)
M = pm.MCMC([observed, mu, gamma])
M.sample(10000, 500, 5)

pm.Matplot.plot(M)

As I increased the number of variables in mu, it became more likely to fail in the IPython Notebook. I wasn't able to make it fail using vanilla IPython, but if I paste all the lines again, I get this error:

In [24]: pm.Matplot.plot(M)
Plotting gamma_0
Plotting gamma_1
Plotting gamma_2
[1]    5500 segmentation fault (core dumped)  ipython

By the way, there are some warnings that didn't appear previously:

In [12]: pm.Matplot.plot(M)
Plotting gamma_0
/home/user/anaconda/envs/scientific/lib/python2.7/site-packages/matplotlib/axes/_base.py:2791: UserWarning: Attempting to set identical bottom==top results
in singular transformations; automatically expanding.
bottom=1, top=1
  'bottom=%s, top=%s') % (bottom, top))
/home/user/anaconda/envs/scientific/lib/python2.7/site-packages/matplotlib/axes/_base.py:2544: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=1, right=1
  'left=%s, right=%s') % (left, right))
Plotting gamma_1
Plotting gamma_2
Plotting gamma_3

fonnesbeck commented 9 years ago

I pushed a fix to the 2.3 branch last week that should fix this. Can you try building from source and seeing if the bug persists?

ghost commented 9 years ago

I already tested it. It works flawlessly, including the previous example using the boston dataset.

Thanks!