pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

pickle error in pm.samples for cores > 1 with black box likelihood #5609

Open TeaWolf opened 2 years ago

TeaWolf commented 2 years ago

I have a model which depends on a black box likelihood function, which calls into AMICI. This model samples fine when using cores=1, but when switching to anything higher I get the following error:


Traceback (most recent call last):
  File "/miniconda3/envs/rjmcmc/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/miniconda3/envs/rjmcmc/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/.vscode-oss/extensions/ms-python.python-2022.2.1924087327/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/.vscode-oss/extensions/ms-python.python-2022.2.1924087327/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/.vscode-oss/extensions/ms-python.python-2022.2.1924087327/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 331, in run_module
    run_module_as_main(target_as_str, alter_argv=True)
  File "/miniconda3/envs/rjmcmc/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/miniconda3/envs/rjmcmc/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/sketches/blasi_model_selection.py", line 264, in <module>
    trace = pm.sample(ndraws, tune = nburn, step = step, discard_tuned_samples = True, cores=2, chains=nchains)
  File "/miniconda3/envs/rjmcmc/lib/python3.9/site-packages/pymc/sampling.py", line 569, in sample
    mtrace = _mp_sample(**sample_args, **parallel_args)
  File "/miniconda3/envs/rjmcmc/lib/python3.9/site-packages/pymc/sampling.py", line 1493, in _mp_sample
    sampler = ps.ParallelSampler(
  File "/miniconda3/envs/rjmcmc/lib/python3.9/site-packages/pymc/parallel_sampling.py", line 411, in __init__
    step_method_pickled = cloudpickle.dumps(step_method, protocol=-1)
  File "/miniconda3/envs/rjmcmc/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/miniconda3/envs/rjmcmc/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'SwigPyObject' object

This error only started appearing after switching from pymc3 to the new 4.0.0bx, but I need some of the new features so I can't go back.

Versions and main components

michaelosthege commented 2 years ago

This type of error is often related to how/when/where classes are defined. It's hard to give more precise information without seeing the code.

Two more things: are you aware of sunode? With sunode you can do ODE things with SUNDIALS solvers, and it is not a black box but differentiable. Depending on the structure of the model, our murefi package can help with setting it up in PyMC.

michaelosthege commented 2 years ago

Oh and you should update to 4.0.0b4. There was a memory leak with b3.

TeaWolf commented 2 years ago

Hey Michael, I wasn't aware of sunode; it looks pretty cool. Maybe I can add it in later. Here's an example that doesn't use AMICI but produces a similar error, the culprit this time being a generator expression. Could it be coming from my use of closures? Edit: The error still occurs after upgrading to 4.0.0b4.

michaelosthege commented 2 years ago

Yes, closures and locally defined classes or functions are often correlated with pickling errors.
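
To make the generator case concrete, here is a minimal sketch using only the standard library `pickle` (PyMC itself goes through cloudpickle, which as far as I know serializes closures together with the values they capture, so a captured generator should fail in the same way). The class names `LazyLikelihood` and `EagerLikelihood` are hypothetical and only illustrate the difference between holding a generator and holding a list:

```python
import pickle


class LazyLikelihood:
    """Hypothetical example: keeps a generator expression as state."""

    def __init__(self, data):
        # Generator objects cannot be pickled
        self.squares = (x * x for x in data)


class EagerLikelihood:
    """Same data, but materialized into a list before it is stored."""

    def __init__(self, data):
        self.squares = [x * x for x in data]


def pickles_ok(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


lazy_ok = pickles_ok(LazyLikelihood([1, 2, 3]))
eager_ok = pickles_ok(EagerLikelihood([1, 2, 3]))
print(lazy_ok, eager_ok)
```

Materializing the generator into a list (or tuple) before it ends up in anything the sampler has to pickle is usually the smallest change that avoids this particular failure.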

TeaWolf commented 2 years ago

Hmm. That might be a lot for me to undo at the moment. There must be a way around this, though, because it worked just fine in pymc3.

Huzaifg commented 1 year ago

Has there been any update on this error? I am facing the exact same problem when increasing the cores beyond 1. In my black-box likelihood, I call model functions that I originally wrote in C++ and have now wrapped with SWIG and exposed to Python.

This is my error

Sampling 4 chains in 4 jobs
Traceback (most recent call last):
  File "/home/unjhawala/projectlets/misc/2022/DataDrivenModSim/BayesianCalibration/dART/dART_acc_wrapped.py", line 264, in <module>
    main()
  File "/home/unjhawala/projectlets/misc/2022/DataDrivenModSim/BayesianCalibration/dART/dART_acc_wrapped.py", line 192, in main
    idata = pm.sample_smc(draws = ndraws,parallel=True,cores=4,return_inferencedata=True,progressbar = True)
  File "/home/unjhawala/anaconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/smc/sample_smc.py", line 220, in sample_smc
    results = run_chains_parallel(
  File "/home/unjhawala/anaconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/smc/sample_smc.py", line 399, in run_chains_parallel
    params = tuple(cloudpickle.dumps(p) for p in params)
  File "/home/unjhawala/anaconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/smc/sample_smc.py", line 399, in <genexpr>
    params = tuple(cloudpickle.dumps(p) for p in params)
  File "/home/unjhawala/anaconda3/envs/pymc_env/lib/python3.10/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/unjhawala/anaconda3/envs/pymc_env/lib/python3.10/site-packages/cloudpickle/cloudpickle_fast.py", line 633, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'SwigPyObject' object

This is my sampling line of code

idata = pm.sample_smc(draws = ndraws,parallel=True,cores=4,return_inferencedata=True,progressbar = True)

Here is my environment setup: pymc 4.1.7, cloudpickle 2.1.0

Let me know if you need any more details

michaelosthege commented 1 year ago

@Huzaifg the underlying problem is that objects with a reference to C objects can't be pickled.

The way a PyMC model is parallelized is roughly this:

  1. Pickle the step method which has references to the model graph, thereby also to any custom black-box Op instances.
  2. Spawn/fork subprocesses
  3. Unpickle the step method

So your blackbox Op most likely has a reference (e.g. an instance attribute) to things that can't be pickled. In this case probably your SWIG objects.
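
The reference chain can be sketched with plain Python objects. This is only an illustration under assumptions: `StepMethod` and `BlackBoxOp` are hypothetical stand-ins (not PyMC classes), standard `pickle` replaces cloudpickle, and a `threading.Lock` plays the role of the SWIG object because it is also backed by C state and refuses to pickle:

```python
import pickle
import threading


class BlackBoxOp:
    """Hypothetical stand-in for a custom Op."""

    def __init__(self, handle=None):
        # An instance attribute is pickled together with the Op
        self.handle = handle


class StepMethod:
    """Hypothetical stand-in for a step method that references the model graph."""

    def __init__(self, op):
        self.op = op


def survives_round_trip(obj):
    """Mimic step 1 (pickle) and step 3 (unpickle) within one process."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# No C-backed attribute: the whole chain serializes
clean = survives_round_trip(StepMethod(BlackBoxOp()))

# A C-backed attribute anywhere in the chain makes step 1 fail
broken = survives_round_trip(StepMethod(BlackBoxOp(handle=threading.Lock())))

print(clean, broken)
```

The failure happens in step 1, before any subprocess exists, which matches the tracebacks above: both end inside `cloudpickle.dumps`.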

You must find a different way to keep these references: one that doesn't get pickled, but is also compatible with fork/spawn multiprocessing.

One way to do this is with a module-level (indent 0) dictionary which can be accessed by the (unpickled) Op instance, but is not owned by it. In this case the Op instance just needs to remember the key under which it put the non-pickleable thing into the dict:

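from random import random

from aesara.graph.op import Op  # PyMC 4.x; in PyMC 5 this is pytensor.graph.op

# Module-level registry: it lives outside the Op, so it is never pickled with it
_treasurebag = {}

class MyOp(Op):
    def __init__(self, c_function):
        # Don't store the reference on the instance
        # self.c_function = c_function

        # Keep it under a token outside the instance
        self.token = random()
        _treasurebag[self.token] = c_function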
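
A self-contained sketch of the same pattern, runnable without PyMC. Everything here is an assumption made for illustration: `LikelihoodHolder` is a plain class standing in for the Op subclass, `FakeCFunction` stands in for a SWIG-wrapped callable (a `threading.Lock` attribute makes it unpicklable), and the token comes from `uuid` instead of `random()`:

```python
import pickle
import threading
import uuid

# Module-level registry, owned by the module rather than by any instance
_treasurebag = {}


class FakeCFunction:
    """Hypothetical stand-in for a SWIG-wrapped function."""

    def __init__(self):
        self._handle = threading.Lock()  # makes instances unpicklable

    def __call__(self, x):
        return 2 * x


class LikelihoodHolder:
    """Hypothetical stand-in for the Op: stores only a picklable token."""

    def __init__(self, c_function):
        self.token = uuid.uuid4().hex
        _treasurebag[self.token] = c_function

    def evaluate(self, x):
        # Look the C-backed object up at call time
        return _treasurebag[self.token](x)


holder = LikelihoodHolder(FakeCFunction())

# The holder pickles because the token is its only state
clone = pickle.loads(pickle.dumps(holder))
print(clone.evaluate(3))
```

The round trip here happens in a single process, where the dictionary is still populated. With the fork start method the child inherits the dictionary; with spawn the module is imported afresh and the dictionary starts out empty, so the entry has to be re-created in the child.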

By including f"{os.getpid()}-{threading.get_ident()}" in the dict keys one can even deal with objects that need to be re-initialized on the new threads/processes.
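
One possible shape for this, as a sketch only: the helper names `_local_key` and `get_resource` and the `factory` argument are hypothetical, and the factory is assumed to be something that can be called again in the worker (for example, built from picklable constructor arguments).

```python
import os
import threading

_treasurebag = {}


def _local_key(token):
    """Combine the token with the current process and thread identity."""
    return f"{token}-{os.getpid()}-{threading.get_ident()}"


def get_resource(token, factory):
    """Return the resource for this process/thread, creating it on first use."""
    key = _local_key(token)
    if key not in _treasurebag:
        # First access from this process/thread: (re-)initialize
        _treasurebag[key] = factory()
    return _treasurebag[key]


# Repeated access from the same thread reuses the same object
main_resource = get_resource("solver", object)
same_again = get_resource("solver", object)

# Another thread gets its own, freshly initialized object
from_thread = []
worker = threading.Thread(
    target=lambda: from_thread.append(get_resource("solver", object))
)
worker.start()
worker.join()

print(main_resource is same_again, from_thread[0] is main_resource)
```

Because the key changes in every new process or thread, a missing entry doubles as the signal that the object still has to be initialized there.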

Huzaifg commented 1 year ago

@michaelosthege Thank you for your suggestion. Although I can imagine your method working well, I took a simpler route once I understood that I just did not need references to the C objects in the likelihood.

I converted my likelihood function to a pure Python function. Within this likelihood function, I just call another function (say, model) that creates the instances of the C structs I need to simulate my model and returns the model output (just a NumPy array). Once model returns, all my C struct instances are deleted (Python garbage collection, hopefully no memory leaks), and my likelihood is then free of any C structs. This seems to be working well and I am able to run multiple chains in parallel. It's a bit ugly, but I think I am happy with it for now.
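
Roughly, the structure looks like the sketch below. It is not my actual code: `FakeCModel` is a made-up stand-in for the SWIG-wrapped structs (a `threading.Lock` attribute makes it unpicklable), `LikelihoodWrapper` is a hypothetical holder for the observed data, and plain lists replace the NumPy array to keep the example free of dependencies.

```python
import pickle
import threading


class FakeCModel:
    """Hypothetical stand-in for a SWIG-wrapped C++ struct."""

    def __init__(self, rate):
        self.rate = rate
        self._handle = threading.Lock()  # makes instances unpicklable

    def simulate(self, n_steps):
        return [self.rate * t for t in range(n_steps)]


def model(rate, n_steps):
    # The C-backed object is created here and goes out of scope on return
    sim = FakeCModel(rate)
    return sim.simulate(n_steps)


def log_likelihood(rate, observed):
    # Only plain Python values cross this function's boundary
    predicted = model(rate, len(observed))
    return -0.5 * sum((p - o) ** 2 for p, o in zip(predicted, observed))


class LikelihoodWrapper:
    """Holds only picklable data; C objects exist only during a call."""

    def __init__(self, observed):
        self.observed = list(observed)

    def __call__(self, rate):
        return log_likelihood(rate, self.observed)


wrapper = LikelihoodWrapper([0.0, 2.0, 4.0])
restored = pickle.loads(pickle.dumps(wrapper))
print(restored(1.0))
```

The cost is that the C objects are rebuilt on every likelihood evaluation, which is fine as long as construction is cheap compared to the simulation itself.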

Thanks a ton!