Closed fonnesbeck closed 8 years ago
Probably the problems occur because of the forking of the process, as reported in the joblib docs under "Bad interaction of multiprocessing and third-party libraries": https://pythonhosted.org/joblib/parallel.html. One solution could be to use spawning instead on Python 3.4 and above. However, I am using 2.7, so there we would need another solution. One is suggested here for forking GPU processes: https://github.com/Theano/Theano/wiki/Using-Multiple-GPUs. Probably this could be used for CPUs as well?
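On Python 3.4+ the worker start method can be chosen explicitly; here is a quick, minimal check of what a platform offers (forkserver is Unix-only; joblib's start method can reportedly also be selected via a JOBLIB_START_METHOD environment variable):

```python
import multiprocessing as mp

# "fork" copies the parent's Theano/BLAS state into the worker, which
# is the bad interaction described in the joblib docs; "spawn" and
# "forkserver" start workers from a clean interpreter instead.
print(mp.get_all_start_methods())   # e.g. ['fork', 'spawn', 'forkserver'] on Linux
```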
Thanks for the info. I've tried setting JOBLIB_START_METHOD='forkserver'
which works in the sense of preventing a crash, but I start to see other errors:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/fonnescj/anaconda3/lib/python3.5/multiprocessing/process.py", line 254, in _bootstrap
self.run()
File "/Users/fonnescj/anaconda3/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/Users/fonnescj/anaconda3/lib/python3.5/multiprocessing/pool.py", line 108, in worker
task = get()
File "/Users/fonnescj/anaconda3/lib/python3.5/site-packages/joblib/pool.py", line 359, in get
return recv()
File "/Users/fonnescj/anaconda3/lib/python3.5/multiprocessing/connection.py", line 251, in recv
return ForkingPickler.loads(buf.getbuffer())
File "/Users/fonnescj/Repositories/pymc3/pymc3/distributions/distribution.py", line 18, in __new__
raise TypeError("No model on context stack, which is needed to "
TypeError: No model on context stack, which is needed to use the Normal('x', 0,1) syntax. Add a 'with model:' block
Process ForkServerPoolWorker-3:
Traceback (most recent call last):
File "/Users/fonnescj/Repositories/pymc3/pymc3/model.py", line 113, in get_context
return cls.get_contexts()[-1]
IndexError: list index out of range
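For context, both errors come from pymc3's model context stack: distributions can only be created inside a `with model:` block, and the forkserver worker that unpickles them starts with an empty stack. A minimal sketch of the pattern (a hypothetical, simplified Model, not pymc3's actual class in pymc3/model.py):

```python
class Model:
    # class-level stack shared by all instances
    contexts = []

    def __enter__(self):
        Model.contexts.append(self)
        return self

    def __exit__(self, *exc):
        Model.contexts.pop()

    @classmethod
    def get_context(cls):
        if not cls.contexts:   # empty in a freshly spawned worker
            raise TypeError("No model on context stack")
        return cls.contexts[-1]

with Model() as m:
    print(Model.get_context() is m)   # True: inside the block the model is found

try:
    Model.get_context()               # outside the block: the error above
except TypeError as e:
    print(e)                          # No model on context stack
```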
I will read up on Theano's strategy as well. We really ought to get GPU multiprocessing going, however. Seems like low-hanging fruit.
One other thing that I noticed, at least with the Text backend, which is a problem. backend = Text(name, model) initialises the backend object with the target folder in backend.name and the respective .csv file path in backend.filename (after backend.setup), and .df then contains the sampled trace values after running sample, like: trace = sample(draws, step, ...)

Now the BIG BIG BUT: the returned trace still refers to the backend instance, with .df and .filename. If you do backend.df = None, your trace.df will be None as well. That's OK if you just run one chain. But if you run several chains, especially serially, every MultiTrace object is tied to the same backend, because backend.setup(draws, chainnumber) only opens the csv file on disk and does not copy the backend's base trace object. So before each sample you need to re-initialise the backend, instead of only calling backend.setup repeatedly, which is what _sample does. Somehow the whole object would need to be copied in backend.setup, but I have no idea how to do that.
I only noticed because I wanted to run sample several times in a loop and collect all the traces in a list. It turned out all the traces had the same values, but on disk they were written properly. When loading the traces back from disk, they are correctly loaded into separate objects.
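The aliasing can be shown without pymc3 at all; a toy sketch (a hypothetical Backend class standing in for the Text backend, not the real API):

```python
class Backend:
    """Hypothetical stand-in for pymc3's Text backend."""
    def __init__(self, name):
        self.name = name
        self.df = None    # filled with trace values by sample()

def sample(backend, values):
    backend.df = values   # setup() mutates the same object on each run...
    return backend        # ...and the returned "trace" aliases it

# Reusing one backend: every collected trace points at the last run
shared = Backend("chain")
reused = [sample(shared, [run]) for run in range(3)]
print([t.df for t in reused])    # [[2], [2], [2]]

# Fresh backend per run: each trace keeps its own values
fresh = [sample(Backend("chain"), [run]) for run in range(3)]
print([t.df for t in fresh])     # [[0], [1], [2]]
```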
I have no idea about the other backends, sqlite and ndarray. Apparently that's the issue with sqlite as well: https://github.com/pymc-devs/pymc3/issues/1008
I just updated to the latest Theano 0.8 and pymc3 and this problem has disappeared for me. Strange thing though: while I installed pymc3 manually with setup.py install, it still complained that it wanted Theano 0.7. The install seemed to go OK though.
Yes, for me it also wants to install Theano 0.7 although I have the dev version, which is somewhat annoying. I simply disabled it in the setup script, although there must be a nicer way.
It's trying to pull 0.7 when you run pymc3's setup.py?
Yes it does.
Yes, it seemed to install fine and use Theano 0.8, but it was rather confusing.
I have to abort it, because when I let it install, my import uses the 0.7 version instead of the dev version. They made so many improvements in the current dev version that it is really significant to use it.
Ah great thx!
Fixed it. thanks!
Is it time to close this?
I haven't done extensive testing, but on some high dimensional problems that originally threw the recursion error, the problem has disappeared. So perhaps for now it is solved. :)
That sounds amazing. I'll close it but feel free to reopen if the problem persists with master pymc3 and theano.
Thanks for the recent bugfixes, guys. Also, the updates to the build dependencies mean I'm now running theano 0.8.0rc1, and either or both changes seem to have raised the threshold at which I was finding recursion errors.
EDIT: Okay, well, that does seem to have fixed it. I think I have a different bug though: with njobs > 1, the processes start (I'm viewing them in htop) and then they die without throwing an error. I assume the difference in 2 is that the model is already cached. It's tricky to replicate though, a bit of a Heisenbug!
I also still get my segmentation faults, even when creating all the Text backends in advance...
Oh! Really. Even with the latest pymc3 version, I am getting the same error with njobs=2.
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<MultiTrace: 1 chains, 10 iterations, 2106 variables>]'. Reason: 'RuntimeError('maximum recursion depth exceeded',)'
trace = pm.sample(n_samples, step=step_func, start=start, njobs=n_chains, progressbar=False)
File "/home/user/.local/lib/python2.7/site-packages/pymc3/sampling.py", line 150, in sample
return sample_func(**sample_args)
File "/home/user/.local/lib/python2.7/site-packages/pymc3/sampling.py", line 282, in _mp_sample
**kwargs) for i in range(njobs))
File "/home/user/.local/lib/python2.7/site-packages/joblib/parallel.py", line 810, in __call__
self.retrieve()
File "/home/user/.local/lib/python2.7/site-packages/joblib/parallel.py", line 727, in retrieve
self._output.extend(job.get())
File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<MultiTrace: 1 chains, 10 iterations, 2106 variables>]'. Reason: 'RuntimeError('maximum recursion depth exceeded',)'
I have pymc3-3.0, numpy-1.11.0, Theano-0.8.1, scipy-0.17.0 installed. Anyone else facing the same issue in the latest version of pymc3?
By "latest pymc3 version" do you mean that you installed it from GitHub master? That is,
pip install -U git+https://github.com/pymc-devs/pymc3.git
I installed using
pip install --process-dependency-links git+https://github.com/pymc-devs/pymc3
Make sure you use the -U
flag or it may not update. I have not had this error since we closed this issue, so my first guess is that your update did not stick.
Oh.. Thank you so much for your quick response. I'll update using -U
flag and will get back. Thanks again!
Sorry @fonnesbeck, installing pymc3 with -U also leads to the same error. I even removed all the packages (pymc3, numpy, scipy, theano) from my machine and tried a fresh installation of pymc3 using pip install -U git+https://github.com/pymc-devs/pymc3.git. It also ended up in RuntimeError('maximum recursion depth exceeded',).
I have Python 2.7.6, pymc3-3.0, matplotlib-1.5.1, joblib-0.9.4, numpy-1.11.0, pandas-0.18.0, patsy-0.4.1, pydot_ng-1.0.0, pyparsing-2.1.1, scipy-0.17.0 and Theano-0.8.1 installed on my machine.
nvidia-smi gives the following details: NVIDIA-SMI 346.96, Driver Version: 346.96, 4 GPUs (0,1,2,3).
My .theanorc config is,
[global]
device = gpu
floatX = float32
assert_no_cpu_op = warn
[cuda]
root = /usr/local/cuda
[nvcc]
fastmath = True
[pycuda]
init = True
Is there anything else to be done?
Perhaps the GPU utilization is at fault? Have you tried with CPU?
Thanks @twiecki I will try with CPU and post my updates.
Setting device=cpu in .theanorc also raises RuntimeError('maximum recursion depth exceeded',).
Below is the snippet I am trying to execute:
import pymc3 as pm
import theano.tensor as T
import pandas

def tinvlogit(x):
    return T.exp(x) / (1 + T.exp(x))

pandas_df = pandas.read_csv("data.csv")
x_col1 = pandas_df['col1']
x_col2 = pandas_df['col2']
x_col3 = pandas_df['col3']
n_col3 = len(pandas_df['col3'].unique())

with pm.Model() as model:
    b_0 = pm.Normal('b_0', mu=0, sd=100)
    b_col1 = pm.Normal('b_col1', mu=0, sd=100)
    b_col2 = pm.Normal('b_col2', mu=0, sd=100)
    sigma_col3 = pm.HalfNormal('sigma_col3', sd=100)
    b_col3 = pm.Normal('b_col3', mu=0, sd=sigma_col3, shape=n_col3)
    for i in range(0, len(pandas_df)):
        p = pm.Deterministic('p', T.maximum(0, T.minimum(1, tinvlogit(
            b_0 + b_col1 * x_col1.at[i] + b_col2 * x_col2.at[i] + b_col3[x_col3.at[i]]))))
    y = pm.Bernoulli('y', p, observed=pandas_df.y)
    start = pm.find_MAP()
    step_func = pm.NUTS()
    trace = pm.sample(5000, step=step_func, start=start, njobs=2, progressbar=True)
pm.sample fails with RuntimeError('maximum recursion depth exceeded'). pandas_df is a pandas dataframe with columns col1 (decimal), col2 (decimal), col3 (integer between 1-10) and y (0 or 1), and it has 50000 rows.
You get the recursion error because your graph will be very long, as your loop runs 50k times, each time adding all the nodes. Although I don't really get the purpose of your model, I have the feeling you could vectorize it and get rid of the loop. The RVs have a shape parameter, so you can simply create vectors of the length of your data frame. The way you do it now, p is always overwritten and only the last sample of your dataframe goes into the cost. Or am I missing something?
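To illustrate the vectorization point with plain numpy (hypothetical stand-in columns; in the pymc3 model this means writing p once from whole columns instead of looping over rows):

```python
import numpy as np

# Hypothetical stand-ins for the dataframe columns in the snippet above
rng = np.random.default_rng(0)
n = 1000
x_col1 = rng.normal(size=n)
x_col2 = rng.normal(size=n)
x_col3 = rng.integers(0, 10, size=n)        # group index 0-9
b_0, b_col1, b_col2 = 0.1, 0.5, -0.3
b_col3 = rng.normal(size=10)

# Loop version: builds one expression per row (in Theano this is one
# chunk of graph per row, which is what blows the recursion limit)
loop = np.array([b_0 + b_col1 * x_col1[i] + b_col2 * x_col2[i]
                 + b_col3[x_col3[i]] for i in range(n)])

# Vectorized version: a single array expression over whole columns
vec = b_0 + b_col1 * x_col1 + b_col2 * x_col2 + b_col3[x_col3]

print(np.allclose(loop, vec))   # True
```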
Setting the njobs parameter to run multiple chains results in an error: