pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

BUG: Continual multiprocessing failures during sampling on Linux #7516

Closed fonnesbeck closed 7 hours ago

fonnesbeck commented 21 hours ago

Describe the issue:

I get failures near the beginning of sampling when running models on Linux. The errors originate in Python's multiprocessing library. I can usually work around them by restarting the sampler or using a different seed, but they have been happening with increasing frequency in v5.16.
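The workaround described above (restart the sampler, possibly with a different seed) could be automated roughly as follows. This is only a sketch: `retry_with_seeds` is a hypothetical helper, not part of PyMC, and it assumes the failure surfaces as a `ConnectionResetError`, as in the traceback in this report.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def retry_with_seeds(sample_fn: Callable[[int], T], seeds: Iterable[int]) -> T:
    """Call sample_fn(seed) for each seed in turn until one call succeeds.

    Hypothetical helper: it retries only on ConnectionResetError and
    re-raises the last such error if every seed fails.
    """
    last_error = None
    for seed in seeds:
        try:
            return sample_fn(seed)
        except ConnectionResetError as err:
            # A worker connection was reset; try again with the next seed.
            last_error = err
    if last_error is None:
        raise ValueError("no seeds were provided")
    raise last_error
```

Inside a `with model:` block, `sample_fn` could be something like `lambda seed: pm.sample(random_seed=seed, step=pm.Metropolis(), tune=5000)`. Retrying hides the underlying problem rather than fixing it, so this is a stopgap at best.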

Reproducible code example:

The model in the following notebook fails every time on my Linux laptop:

https://gist.github.com/fonnesbeck/d4b8da1f74a1a790892d774b7484ecfa

Error message:

---------------------------------------------------------------------------
ConnectionResetError                      Traceback (most recent call last)
Cell In[13], line 2
      1 with model:
----> 2     trace = pm.sample(random_seed=SEED, step=pm.Metropolis(), tune=5000)

File ~/repos/pymc-examples/.pixi/envs/default/lib/python3.12/site-packages/pymc/sampling/mcmc.py:846, in sample(draws, tune, chains, cores, random_seed, progressbar, progressbar_theme, step, var_names, nuts_sampler, initvals, init, jitter_max_retries, n_init, trace, discard_tuned_samples, compute_convergence_checks, keep_warning_stat, return_inferencedata, idata_kwargs, nuts_sampler_kwargs, callback, mp_ctx, blas_cores, model, **kwargs)
    844 _print_step_hierarchy(step)
    845 try:
--> 846     _mp_sample(**sample_args, **parallel_args)
    847 except pickle.PickleError:
    848     _log.warning("Could not pickle model, sampling singlethreaded.")

File ~/repos/pymc-examples/.pixi/envs/default/lib/python3.12/site-packages/pymc/sampling/mcmc.py:1259, in _mp_sample(draws, tune, step, chains, cores, random_seed, start, progressbar, progressbar_theme, traces, model, callback, blas_cores, mp_ctx, **kwargs)
   1257 try:
   1258     with sampler:
-> 1259         for draw in sampler:
   1260             strace = traces[draw.chain]
   1261             strace.record(draw.point, draw.stats)

File ~/repos/pymc-examples/.pixi/envs/default/lib/python3.12/site-packages/pymc/sampling/parallel.py:471, in ParallelSampler.__iter__(self)
    464 task = progress.add_task(
    465     self._desc.format(self),
    466     completed=self._completed_draws,
    467     total=self._total_draws,
    468 )
    470 while self._active:
--> 471     draw = ProcessAdapter.recv_draw(self._active)
    472     proc, is_last, draw, tuning, stats = draw
    473     self._completed_draws += 1

File ~/repos/pymc-examples/.pixi/envs/default/lib/python3.12/site-packages/pymc/sampling/parallel.py:328, in ProcessAdapter.recv_draw(processes, timeout)
    326 idxs = {id(proc._msg_pipe): proc for proc in processes}
    327 proc = idxs[id(ready[0])]
--> 328 msg = ready[0].recv()
    330 if msg[0] == "error":
    331     old_error = msg[1]

File ~/repos/pymc-examples/.pixi/envs/default/lib/python3.12/multiprocessing/connection.py:250, in _ConnectionBase.recv(self)
    248 self._check_closed()
    249 self._check_readable()
--> 250 buf = self._recv_bytes()
    251 return _ForkingPickler.loads(buf.getbuffer())

File ~/repos/pymc-examples/.pixi/envs/default/lib/python3.12/multiprocessing/connection.py:430, in Connection._recv_bytes(self, maxsize)
    429 def _recv_bytes(self, maxsize=None):
--> 430     buf = self._recv(4)
    431     size, = struct.unpack("!i", buf.getvalue())
    432     if size == -1:

File ~/repos/pymc-examples/.pixi/envs/default/lib/python3.12/multiprocessing/connection.py:395, in Connection._recv(self, size, read)
    393 remaining = size
    394 while remaining > 0:
--> 395     chunk = read(handle, remaining)
    396     n = len(chunk)
    397     if n == 0:

ConnectionResetError: [Errno 104] Connection reset by peer
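
For readers following the last frames: `_recv_bytes` first reads a 4-byte header and unpacks it with `struct.unpack("!i", ...)` to learn the message size, so the reset happened while the parent was still waiting for the start of a worker's next message. The sketch below illustrates that kind of length-prefixed framing in a simplified form. `frame` and `read_frame` are illustrative names, not functions from `multiprocessing`, and the handling of the `-1` header for very large messages is omitted.

```python
import io
import struct


def frame(payload: bytes) -> bytes:
    """Prefix payload with a 4-byte big-endian signed length header."""
    return struct.pack("!i", len(payload)) + payload


def read_frame(stream: io.BufferedIOBase) -> bytes:
    """Read one length-prefixed message from stream (simplified sketch)."""
    header = stream.read(4)
    if len(header) < 4:
        # The sender went away before a complete header arrived.
        raise EOFError("stream ended before a full length header arrived")
    (size,) = struct.unpack("!i", header)
    payload = stream.read(size)
    if len(payload) < size:
        raise EOFError("stream ended before the full payload arrived")
    return payload
```

For example, `read_frame(io.BytesIO(frame(b"draw")))` returns `b"draw"`, while an empty stream raises `EOFError`. On a real socket-backed connection the same situation can instead surface as `ConnectionResetError`, as it does in this report.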


PyMC version information:

PyMC 5.16.2
PyTensor 2.25.4
Python 3.12.5
OS Fedora 40

fonnesbeck commented 7 hours ago

Looks like this is associated with running Jupyter via VSCode. Closing.