Closed — fonnesbeck closed this issue 8 years ago.
Ran into the same issue.
It seems to work for simpler models, but the stochastic volatility model I can only run with njobs=2; it breaks with njobs=4. So odd.
Can you try if https://github.com/pymc-devs/pymc3/commit/e873d6d728370b139b24d95e2c2a60d016589e0d fixes it?
Well, I get a different error, so that's progress.
MaybeEncodingError: Error sending result: '[<MultiTrace: 1 chains, 40000 iterations, 9 variables>]'. Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'
And what a specific error it is. MaybeEncodingErrorMaybeNot
Yeah, that seemed odd -- creating an Exception subclass for an error that you're not totally sure about.
Anyway, it looks like we're passing maybe an object where an int is expected?
You can somewhat hack around this with sys.setrecursionlimit(2000), but that only works up to a certain number of parameters. With my latest model, around 450 parameters, it doesn't help. I really need the parallel implementation to work, otherwise my model has to run for months, so I would like to look into this. Can you point me to some lines of code where I could start looking, as I am not so familiar with the code base yet? Thank you!
With the increased recursion limit and twiecki's latest commit above (e873d6d), I get the error below. It keeps running but does nothing. Does anybody have any advice on where I could start investigating?
Exception in thread Thread-14:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
SystemError: NULL result without error in PyObject_Call
All of the multiprocessing business for PyMC3 is in the sampling module. It's a pretty basic mapping of processes to the elements of a multiprocessing Pool. We might want to explore using ipyparallel for parallel processing.
I have also considered switching. The issue is that currently you can't launch processes internally (see https://github.com/ipython/ipyparallel/issues/22 for a plan to change that).
That should not be a deal-breaker. Forcing the user to spin up ipcluster is not particularly onerous, particularly if you are working in Jupyter, where it is just a tab in the interface. I think it's a small price to pay for more robust parallelism, and if it gets automated in the future, all the better.
What about Dask?
Would Dask be effective here? I could see it if we were applying the same algorithm to subsets of a dataset, but a set of parallel chains executes over the entire dataset for each chain. So it's not clear how Dask's collections would be beneficial. That said, it may be useful if we ever implement expectation propagation, which does subdivide the data.
Dask imperative plus the multiprocessing scheduler can schedule the chains without needing a specific collection to chunk the data.
But this is out of my depth.
Maybe @mrocklin can chime in.
I don't think Dask, although awesome, can be leveraged here.
If someone can briefly describe the problem I'd be happy to chime in if there is potential overlap. The dask schedulers are useful well outside the common use case of big chunked arrays. If you're considering technologies like multiprocessing or ipyparallel it's possible that one of the dask schedulers could be relevant.
@mrocklin Matt, this is Monte Carlo sampling for Bayesian statistical modeling. It's an embarrassingly parallel task that just simulates Markov chains using the same model on the same dataset, then uses the sampled chains (the output of the algorithm) for inference. We are currently using the multiprocessing module for this, but are contemplating a move to something more robust.
Something non-trivial must be going on to cause multiprocessing to hang.
Looking at the traceback, it seems like you might be trying to send something that pickle doesn't like? Historically I've gotten around this by pre-serializing everything with dill or cloudpickle before I hand things off to multiprocessing. This is what dask.multiprocessing.get does.
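The pickle limitation Matt alludes to is easy to demonstrate with the standard library alone; dill and cloudpickle exist precisely to serialize objects that plain pickle refuses. A minimal illustration (not PyMC3 code):

```python
import pickle

# Plain pickle serializes functions by reference (module + qualified name),
# so a lambda with no importable name cannot be pickled.
square = lambda x: x * x

try:
    pickle.dumps(square)
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False

print(picklable)  # dill.dumps(square) would succeed where this fails
```

Closures, locally defined functions, and some Theano objects hit the same wall, which is why pre-serializing with dill before handing work to the pool can help.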
If this is what is going on, then the pathos library would probably be a decent drop-in replacement for you all. It's a multiprocessing clone that uses dill.
But really, I'm just guessing at the problem that you're trying to solve and so am probably out of my depth here. Happy to help if I can. Best of luck.
Thanks, Matt. Unfortunately pathos appears not to support Python 3 yet, so I will look at explicitly passing everything through dill.
I write a function like the following:
import dill

def apply(serialized_func, serialized_args, serialized_kwargs):
    func = dill.loads(serialized_func)
    args = dill.loads(serialized_args)
    kwargs = dill.loads(serialized_kwargs)
    return func(*args, **kwargs)
And then I dill.dumps my func, args, and kwargs ahead of time and call them with the apply function remotely. Something like the following:
pool.starmap(apply, [(dill.dumps(func), dill.dumps(args), dill.dumps({})) for args in sequence])
<self serving> Or, you can always just use dask.multiprocessing.get, where this work is already done. </self serving>
I might have found a solution using Joblib, but will give this a shot if that doesn't work. Thanks again.
Oh great. That's much simpler.
I don't think this solves the problem, unfortunately... On the joblib branch, with njobs=4 and a pretty big model, I still get a max recursion exceeded exception (see below). On inspection, it looks like Joblib uses multiprocessing as its default backend, so I guess that makes sense. I tried switching to the threading backend, but that failed with a different set of errors.
Traceback (most recent call last):
File "run_wm.py", line 53, in <module>
run_model(40)
File "run_wm.py", line 40, in run_model
trace = model.run(samples=250, verbose=True, find_map=False, njobs=4)
File "/Users/tal/Dropbox/Projects/RandomStimuli/code/pymcwrap/pymcwrap/model.py", line 383, in run
samples, start=start, step=step, progressbar=verbose, njobs=njobs)
File "/usr/local/lib/python3.5/site-packages/pymc3-3.0-py3.5.egg/pymc3/sampling.py", line 146, in sample
return sample_func(**sample_args)
File "/usr/local/lib/python3.5/site-packages/pymc3-3.0-py3.5.egg/pymc3/sampling.py", line 272, in _mp_sample
**kwargs) for i in range(njobs))
File "/usr/local/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 810, in __call__
self.retrieve()
File "/usr/local/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 727, in retrieve
self._output.extend(job.get())
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/local/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/pool.py", line 368, in send
CustomizablePickler(buffer, self._reducers).dump(obj)
RecursionError: maximum recursion depth exceeded
It was worth a shot. I will try flavoring it with a little dill.
Actually, joblib serializes the arguments for us, so that's not the solution. Increasing the recursion limit helps (as @eigenblutwurst notes), which I have done inside sample. This problem may resurface with bigger models. Works well on the rugby analytics example (which I have modified) using 4 cores.
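The failure mode is easy to reproduce in isolation: pickling recurses once per level of the object graph, so a graph deeper than sys.getrecursionlimit() raises RecursionError, and raising the limit lets the same dump succeed. A minimal sketch, with plain nested lists standing in for a deep Theano graph:

```python
import pickle
import sys

# Build a deeply nested structure, one level per node, like a deep graph.
depth = 3000
obj = []
for _ in range(depth):
    obj = [obj]

sys.setrecursionlimit(1000)  # the CPython default
try:
    pickle.dumps(obj)
    failed = False
except RecursionError:
    failed = True

sys.setrecursionlimit(50000)  # the kind of bump done inside sample()
data = pickle.dumps(obj)      # now succeeds
```

This is a band-aid rather than a fix: the limit needed scales with graph depth, which is why very large models keep hitting it.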
@twiecki nice find! I will try using the joblib from master and see if that does the trick without boosting recursion limits.
Unfortunately, dill does not save the day. It still hits the recursion limit. My last commit to the branch boosts it to 10000, which may be sufficient for models of a certain size.
My model has ~1,000 parameters, and it still fails with a recursion limit of 10,000. Not the end of the world in this case, as it runs tolerably with one process and converges quickly. But it would be nice to have this work at some point.
Do you know off-hand where the recursion is happening? I'm guessing it's in the Model or MultiTrace. It would be a bit of work, but a reasonable way forward might be to implement a __reduce__ method on the problem class that replaces object references with hashed IDs, and then reconstructs the references on deserialization. If you can give me some pointers as to where the recursion is happening, I can take a crack at this next time I have some free time.
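A toy version of that idea, with a module-level registry standing in for wherever the shared object would actually live (all names here are hypothetical, not PyMC3 API; a real cross-process version would need to rebuild the registry on the other side):

```python
import pickle

# Registry of heavy shared objects, keyed by ID; stands in for a real store.
_REGISTRY = {}

def _rebuild(node_id, payload):
    """Reconstruct a Node, re-resolving the shared reference by its ID."""
    return Node(_REGISTRY[node_id], payload)

class Node:
    def __init__(self, shared, payload):
        self.shared = shared    # a reference we do NOT want to pickle
        self.payload = payload  # small per-object state

    def __reduce__(self):
        # Replace the shared reference with a hashable ID before pickling.
        node_id = id(self.shared)
        _REGISTRY[node_id] = self.shared
        return (_rebuild, (node_id, self.payload))

big = object()  # pretend this is a huge or unpicklable graph
n = Node(big, payload=42)
n2 = pickle.loads(pickle.dumps(n))
assert n2.shared is big and n2.payload == 42
```

The pickle stream then carries only the ID and the light state, so the serializer never recurses into the heavy reference.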
I assumed it had something to do with the Theano model building its graph, so somewhere in the Model. That would explain why it happens with high-dimensional models, but I am just guessing. Maybe @jsalvatier has ideas.
Oh, that makes sense. And it looks like it has come up several times before on the Theano repo:
https://github.com/Theano/Theano/issues/1795 https://github.com/Theano/Theano/issues/3341 https://github.com/Theano/Theano/issues/3607
Guess I'll stick with raising the recursion limit even further and hoping for the best. :)
I don't think it's the Theano graph. Otherwise it would crash already during compilation, wouldn't it? After increasing the recursion limit I get the error message above, which I cannot interpret at all. Anyway, it's nice that so many people have started looking into it :) .
I am just guessing at plausible explanations. I've tried changing up the trace, both by drastically reducing the number of samples, and by using text or sqlite backends, but these both result in the same error.
The failure is triggered during serialization dumps, both under Joblib and plain multiprocessing.
I can hit the recursion error without using multiprocessing:
https://gist.github.com/f2da55c0e2f6d8f35a25
so I'm pretty convinced this is related to the depth of the computational graph.
The example above is pretty worrying because it is a small model (albeit with a large variance-covariance matrix). It will be important to see if we are just being inefficient in PyMC about specifying the model graph, or if this is a limitation of Theano.
But you have two nested for loops, which are very inefficient in Theano. Did you check with profiling how many apply nodes this creates? If it reaches a couple of thousand, it will crash. By using scan you can significantly reduce the number of nodes. With the recent bleeding-edge Theano version, scan became really fast. In total I have two nested scans in two places in my quite complex model, and I do not get the recursion error anymore. The error I get is the one above, something in threading...
Also, you are applying a NumPy operation, np.exp, to a Theano tensor in your HalfCauchy distributions. Or is this not a tensor?
Good catch on the exponential. I was using list comprehensions because I was looping over data. Should I be using shared variables with scan instead?
Or I could just do this, and avoid the issue altogether!
@fonnesbeck can you elaborate?
In the squared exponential, I can calculate the covariance matrix without looping, simply by multiplying the distance matrix by scalars. This sidesteps the problem of doing Python loops over Theano variables. The loops were creating a huge computational graph that caused the recursion depth issue.
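A sketch of that vectorization, shown here in NumPy rather than Theano (eta2 and rho2 are hypothetical amplitude and length-scale hyperparameters): the squared-exponential covariance is just an elementwise function of the pairwise distance matrix, so no Python loops over graph variables are needed:

```python
import numpy as np

# Hypothetical 1-D inputs and kernel hyperparameters.
x = np.linspace(0, 1, 5)
eta2, rho2 = 1.0, 10.0

# Pairwise squared distances via broadcasting: D2[i, j] = (x[i] - x[j])**2.
D2 = (x[:, None] - x[None, :]) ** 2

# Squared-exponential covariance, computed elementwise on the whole matrix.
K = eta2 * np.exp(-rho2 * D2)

print(K.shape)  # (5, 5)
```

Built this way, the expression adds a fixed handful of apply nodes to the graph regardless of the number of data points, instead of one set of nodes per loop iteration.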
Cool! A GP example would be a great addition :).
Good job simplifying it! That's nice. But if you can't, then shared variables and scan are exactly the way to go for any long loop. If a loop creates a few hundred apply nodes it is still fine to use for, but as soon as it goes into the thousands you will face problems, either with horribly long optimization times or recursion errors ;) .
The only problem is -- the model gives completely incorrect answers! I've updated the notebook to include the same model run under Stan, and you can see that PyMC3 goes off the rails.
EDIT: it would help if I parameterized it correctly! I forgot that we use the inverse covariance in MvNormal. The results are very close to Stan's implementation now, though ours does not seem to mix as well.
@fonnesbeck This is awesome; the code is also so much more elegant and readable than the Stan version. Would be fun to try with the recently fixed ADVI.
That works quite well indeed
Those are with ADVI?
This issue has gone down a rabbit hole, but it should be easy to automate a lot of the GP building.
A very productive rabbit hole it seems.
I've also run into this quite often. Thought I'd create a minimal reproducible example in case anyone wants to use it for testing etc.; it's very basic linear regression: https://gist.github.com/jonsedar/b68136b53ee43465ecc9
Setting the njobs parameter to run multiple chains results in an error: