joblib / loky

Robust and reusable Executor for joblib
http://loky.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

The Manager.Queue implementation from the loky backend seems to be broken #407

Open ogrisel opened 1 year ago

ogrisel commented 1 year ago

See the reproducer in: https://github.com/joblib/joblib/issues/1467#issuecomment-1611785810

where it causes the code to raise:

TypeError: 'NoneType' object cannot be interpreted as an integer

uchenily commented 1 year ago

Thanks for your help, ogrisel. I don't have a fix in mind yet, but here is some extra information:

Parallel(n_jobs=2)((delayed(func1)(queue) for _ in range(32)))
Parallel(n_jobs=2, backend="multiprocessing")((delayed(func1)(queue) for _ in range(32)))
Parallel(n_jobs=2, backend="loky")((delayed(func1)(queue) for _ in range(32)))
Parallel(n_jobs=2, backend="threading")((delayed(func1)(queue) for _ in range(32)))

Neither the default backend (loky) nor the multiprocessing backend works. With the threading backend there is no problem.

  File "/Users/ogrisel/mambaforge/envs/dev/lib/python3.11/multiprocessing/connection.py", line 367, in _send
    n = write(self._handle, buf)
        ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object cannot be interpreted as an integer

Here, self._handle has become None. I think this is related to the _ConnectionBase#close() method in multiprocessing/connection.py: when that method is called, self._handle is reset to None.

There may be a conflict between two or more processes during resource cleanup.
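
To illustrate the suspected mechanism in isolation, here is a minimal sketch that does not involve joblib or loky at all. It assumes CPython's current implementation of multiprocessing.connection, where close() resets the private _handle attribute to None and the public closed property reflects that. The helper names send_after_close and write_to_none_handle are made up for this example. A plain send() on a closed connection is rejected early with an OSError, so the TypeError in the traceback above would only appear if the handle were cleared after that check but before the low-level write, which is consistent with a cleanup race. This is not a confirmed diagnosis.

```python
import os
from multiprocessing import Pipe


def send_after_close():
    """Return the exception type raised by send() on an already closed connection."""
    reader, writer = Pipe(duplex=False)
    writer.close()
    # `closed` is the public view of the private `_handle is None` state
    # (CPython implementation detail, assumed here).
    assert writer.closed
    try:
        writer.send("payload")
    except Exception as exc:
        # The connection checks its own state before writing.
        return type(exc)
    finally:
        reader.close()
    return None


def write_to_none_handle():
    """Return the message raised when the low-level write receives None as its handle."""
    try:
        # Stand-in for what a write would see if the handle were cleared
        # between the closed check and the actual write call.
        os.write(None, b"payload")
    except TypeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(send_after_close())
    print(write_to_none_handle())
```

If this reading is right, the error message is a symptom of the connection being closed underneath an in-flight write rather than of a bad argument passed by the caller.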