Closed — fonnesbeck closed this issue 8 years ago.
Ran into the same issue.
It seems to work for simpler models, but the stochastic volatility model I can only run with njobs=2; it breaks with njobs=4. So odd.
Can you try if https://github.com/pymc-devs/pymc3/commit/e873d6d728370b139b24d95e2c2a60d016589e0d fixes it?
Well, I get a different error, so that's progress.
MaybeEncodingError: Error sending result: '[<MultiTrace: 1 chains, 40000 iterations, 9 variables>]'. Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'
And what a specific error it is. MaybeEncodingErrorMaybeNot
Yeah, that seemed odd -- creating an Exception subclass for an error that you're not totally sure about.
Anyway, it looks like we're passing maybe an object where an int is expected?
You can somewhat hack around this with sys.setrecursionlimit(2000), but that only works up to a certain number of parameters. With my latest model, around 450 parameters, it doesn't help. I really need the parallel implementation to work, otherwise my model has to run for months, so I would like to look into this. Can you point me to some lines of code where I could start looking, as I am not so familiar with the code base yet? Thank you!
With the increased recursion limit and twiecki's latest commit above (e873d6d), I get the error below. It keeps running but does nothing. Does anybody have any advice on where I could start investigating?
Exception in thread Thread-14:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
SystemError: NULL result without error in PyObject_Call
All of the multiprocessing business for PyMC3 is in the sampling module. It's a pretty basic mapping of processes to the elements of a multiprocessing Pool. We might want to explore using ipyparallel for parallel processing.
I have also considered switching. The issue is that currently you can't launch processes internally (see https://github.com/ipython/ipyparallel/issues/22 for a plan to change that).
That should not be a deal-breaker. Forcing the user to spin up ipcluster is not particularly onerous, particularly if you are working in Jupyter, where it is just a tab in the interface. I think it's a small price to pay for more robust parallelism, and if it gets automated in the future, all the better.
What about Dask?
Would Dask be effective here? I could see it if we were applying the same algorithm to subsets of a dataset, but a set of parallel chains executes over the entire dataset for each chain. So it's not clear how Dask's collections would be beneficial. That said, it may be useful if we ever implement expectation propagation, which does subdivide the data.
Dask imperative plus the multiprocessing scheduler can schedule the chains without needing a specific collection to chunk the data.
But this is out of my depth.
Maybe @mrocklin can chime in.
I don't think Dask, although awesome, can be leveraged here.
If someone can briefly describe the problem I'd be happy to chime in if there is potential overlap. The dask schedulers are useful well outside the common use case of big chunked arrays. If you're considering technologies like multiprocessing or ipyparallel it's possible that one of the dask schedulers could be relevant.
@mrocklin Matt, this is Monte Carlo sampling for Bayesian statistical modeling. It's an embarrassingly parallel task that just simulates Markov chains using the same model on the same dataset, then uses the sampled chains (the output of the algorithm) for inference. We are currently using the multiprocessing module for this, but are contemplating a move to something more robust.
Something non-trivial must be going on to cause multiprocessing to hang.
Looking at the traceback, it seems like you might be trying to send something that pickle doesn't like? Historically I've gotten around this by pre-serializing everything with dill or cloudpickle before I hand things off to multiprocessing. This is what dask.multiprocessing.get does.
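The pickle limitation Matt alludes to is easy to demonstrate with the standard library alone; dill and cloudpickle exist precisely to serialize objects that plain pickle refuses. A minimal illustration (not PyMC3 code):

```python
import pickle

# Plain pickle serializes functions by reference (module + qualified name),
# so a lambda with no importable name cannot be pickled.
square = lambda x: x * x

try:
    pickle.dumps(square)
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False

print(picklable)  # dill.dumps(square) would succeed where this fails
```

Closures, locally defined functions, and some Theano objects hit the same wall, which is why pre-serializing with dill before handing work to the pool can help.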
If this is what is going on, then the pathos library would probably be a decent drop-in replacement for you all. It's a multiprocessing clone that uses dill.
But really, I'm just guessing at the problem that you're trying to solve and so am probably out of my depth here. Happy to help if I can. Best of luck.
Thanks, Matt. Unfortunately pathos appears not to support Python 3 yet, so I will look at explicitly passing everything through dill.
I write a function like the following:
import dill

def apply(serialized_func, serialized_args, serialized_kwargs):
    func = dill.loads(serialized_func)
    args = dill.loads(serialized_args)
    kwargs = dill.loads(serialized_kwargs)
    return func(*args, **kwargs)
And then I dill.dumps my func, args, and kwargs ahead of time and call them with the apply function remotely. Something like the following:
pool.starmap(apply, [(dill.dumps(func), dill.dumps(args), dill.dumps({})) for args in sequence])
<self serving> Or, you can always just use dask.multiprocessing.get, where this work is already done. </self serving>
I might have found a solution using Joblib, but will give this a shot if that doesn't work. Thanks again.
Oh great. That's much simpler.
I don't think this solves the problem, unfortunately... On the joblib branch, with njobs=4 and a pretty big model, I still get a max recursion exceeded exception (see below). On inspection, it looks like Joblib uses multiprocessing as its default backend, so I guess that makes sense. I tried switching to the threading backend, but that failed with a different set of errors.
Traceback (most recent call last):
File "run_wm.py", line 53, in <module>
run_model(40)
File "run_wm.py", line 40, in run_model
trace = model.run(samples=250, verbose=True, find_map=False, njobs=4)
File "/Users/tal/Dropbox/Projects/RandomStimuli/code/pymcwrap/pymcwrap/model.py", line 383, in run
samples, start=start, step=step, progressbar=verbose, njobs=njobs)
File "/usr/local/lib/python3.5/site-packages/pymc3-3.0-py3.5.egg/pymc3/sampling.py", line 146, in sample
return sample_func(**sample_args)
File "/usr/local/lib/python3.5/site-packages/pymc3-3.0-py3.5.egg/pymc3/sampling.py", line 272, in _mp_sample
**kwargs) for i in range(njobs))
File "/usr/local/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 810, in __call__
self.retrieve()
File "/usr/local/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 727, in retrieve
self._output.extend(job.get())
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/local/Cellar/python3/3.5.0/Frameworks/Python.framework/Versions/3.5/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/local/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/pool.py", line 368, in send
CustomizablePickler(buffer, self._reducers).dump(obj)
RecursionError: maximum recursion depth exceeded
It was worth a shot. I will try flavoring it with a little dill.
Actually, joblib serializes the arguments for us, so that's not the solution. Increasing the recursion limit helps (as @eigenblutwurst notes), which I have done inside sample. This problem may resurface with bigger models. Works well on the rugby analytics example (which I have modified) using 4 cores.
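The failure mode is easy to reproduce in isolation: pickling recurses once per level of the object graph, so a graph deeper than sys.getrecursionlimit() raises RecursionError, and raising the limit lets the same dump succeed. A minimal sketch, with plain nested lists standing in for a deep Theano graph:

```python
import pickle
import sys

# Build a deeply nested structure, one level per node, like a deep graph.
depth = 3000
obj = []
for _ in range(depth):
    obj = [obj]

sys.setrecursionlimit(1000)  # the CPython default
try:
    pickle.dumps(obj)
    failed = False
except RecursionError:
    failed = True

sys.setrecursionlimit(50000)  # the kind of bump done inside sample()
data = pickle.dumps(obj)      # now succeeds
```

This is a band-aid rather than a fix: the limit needed scales with graph depth, which is why very large models keep hitting it.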
@twiecki nice find! I will try using the joblib from master and see if that does the trick without boosting recursion limits.
Unfortunately, dill does not save the day. It still hits the recursion limit. My last commit to the branch boosts it to 10000, which may be sufficient for models of a certain size.
My model has ~1,000 parameters, and it still fails with a recursion limit of 10,000. Not the end of the world in this case, as it runs tolerably with one process and converges quickly. But it would be nice to have this work at some point.
Do you know off-hand where the recursion is happening? I'm guessing it's in the Model or MultiTrace. It would be a bit of work, but a reasonable way forward might be to implement a __reduce__ method on the problem class that replaces object references with hashed IDs, and then reconstructs the references on deserialization. If you can give me some pointers as to where the recursion is happening, I can take a crack at this next time I have some free time.
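A toy version of that idea, with a module-level registry standing in for wherever the shared object would actually live (all names here are hypothetical, not PyMC3 API; a real cross-process version would need to rebuild the registry on the other side):

```python
import pickle

# Registry of heavy shared objects, keyed by ID; stands in for a real store.
_REGISTRY = {}

def _rebuild(node_id, payload):
    """Reconstruct a Node, re-resolving the shared reference by its ID."""
    return Node(_REGISTRY[node_id], payload)

class Node:
    def __init__(self, shared, payload):
        self.shared = shared    # a reference we do NOT want to pickle
        self.payload = payload  # small per-object state

    def __reduce__(self):
        # Replace the shared reference with a hashable ID before pickling.
        node_id = id(self.shared)
        _REGISTRY[node_id] = self.shared
        return (_rebuild, (node_id, self.payload))

big = object()  # pretend this is a huge or unpicklable graph
n = Node(big, payload=42)
n2 = pickle.loads(pickle.dumps(n))
assert n2.shared is big and n2.payload == 42
```

The pickle stream then carries only the ID and the light state, so the serializer never recurses into the heavy reference.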
I assumed it had something to do with the Theano model building its graph, so somewhere in the Model. That would explain why it happens with high-dimensional models, but I am just guessing. Maybe @jsalvatier has ideas.
Oh, that makes sense. And it looks like it has come up several times before on the Theano repo:
https://github.com/Theano/Theano/issues/1795 https://github.com/Theano/Theano/issues/3341 https://github.com/Theano/Theano/issues/3607
Guess I'll stick with raising the recursion limit even further and hoping for the best. :)
I don't think it's the Theano graph. Otherwise it would crash already during compilation, wouldn't it? After increasing the recursion limit I get the error message above, which I cannot interpret at all. Anyway, it's nice that so many people have started looking into it :) .
I am just guessing at plausible explanations. I've tried changing up the trace, both by drastically reducing the number of samples, and by using text or sqlite backends, but these both result in the same error.
The failure is triggered during serialization dumps, both under Joblib and plain multiprocessing.
I can hit the recursion error without using multiprocessing:
https://gist.github.com/f2da55c0e2f6d8f35a25
so I'm pretty convinced this is related to the depth of the computational graph.
The example above is pretty worrying because it is a small model (albeit with a large variance-covariance matrix). It will be important to see if we are just being inefficient in PyMC about specifying the model graph, or if this is a limitation of Theano.
But you have two nested for loops, which are very inefficient in Theano. Did you check with profiling how many apply nodes this creates? If it reaches a couple of thousand, it will crash. By using scan you can significantly reduce the number of nodes. With the recent bleeding-edge Theano version, scan became really fast. In total I have two nested scans in two places in my quite complex model, and I do not get the recursion error anymore. The error I get is the one above, something in threading...
Also, you are applying a NumPy operation, np.exp, to a Theano tensor in your HalfCauchy distributions. Or is this not a tensor?
Good catch on the exponential. I was using list comprehensions because I was looping over data. Should I be using shared variables with scan instead?
Or I could just do this, and avoid the issue altogether!
@fonnesbeck can you elaborate?
In the squared exponential, I can calculate the covariance matrix without looping, simply by multiplying the distance matrix by scalars. This sidesteps the problem of doing Python loops over Theano variables. The loops were creating a huge computational graph that caused the recursion depth issue.
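A sketch of that vectorization, shown here in NumPy rather than Theano (eta2 and rho2 are hypothetical amplitude and length-scale hyperparameters): the squared-exponential covariance is just an elementwise function of the pairwise distance matrix, so no Python loops over graph variables are needed:

```python
import numpy as np

# Hypothetical 1-D inputs and kernel hyperparameters.
x = np.linspace(0, 1, 5)
eta2, rho2 = 1.0, 10.0

# Pairwise squared distances via broadcasting: D2[i, j] = (x[i] - x[j])**2.
D2 = (x[:, None] - x[None, :]) ** 2

# Squared-exponential covariance, computed elementwise on the whole matrix.
K = eta2 * np.exp(-rho2 * D2)

print(K.shape)  # (5, 5)
```

Built this way, the expression adds a fixed handful of apply nodes to the graph regardless of the number of data points, instead of one set of nodes per loop iteration.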
Cool! A GP example would be a great addition :).
Good job simplifying it! That's nice. But if you can't, then shared variables and scan are exactly the way to go for any long loop. If a loop creates a few hundred apply nodes it is still fine to use for, but as soon as it goes into the thousands you will face problems, either with horribly long optimization times or recursion errors ;) .
The only problem is -- the model gives completely incorrect answers! I've updated the notebook to include the same model run under Stan, and you can see that PyMC3 goes off the rails.
EDIT: it would help if I parameterized it correctly! I forgot that we use the inverse covariance in MvNormal. The results are very close to Stan's implementation now, though ours does not seem to mix as well.
@fonnesbeck This is awesome; the code is also so much more elegant and readable than the Stan version. Would be fun to try with the recently fixed ADVI.
That works quite well indeed
Those are with ADVI?
This issue has gone down a rabbit hole, but it should be easy to automate a lot of the GP building.
A very productive rabbit hole it seems.
I've also run into this quite often. Thought I'd create a minimal reproducible example in case anyone wants to use it for testing etc.; it's very basic linear regression: https://gist.github.com/jonsedar/b68136b53ee43465ecc9
Setting the njobs parameter to run multiple chains results in an error: