python / cpython

The Python programming language
https://www.python.org

Deadlock when mixing threading and multiprocessing #71609

Closed 8088a9e8-db23-4ab6-b27e-dda4990049f6 closed 8 years ago

8088a9e8-db23-4ab6-b27e-dda4990049f6 commented 8 years ago
BPO 27422
Nosy @rhettinger, @bitdancer, @applio
Files
  • test_threadfork.py: minimal example to produce a deadlock
  • test_threadfork_backtrace.txt: backtrace of one of the child processes when running test_threadfork.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-bug', 'invalid', 'docs']
    title = 'Deadlock when mixing threading and multiprocessing'
    updated_at =
    user = 'https://bugs.python.org/MartinRitter'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'r.david.murray'
    assignee = 'docs@python'
    closed = True
    closed_date =
    closer = 'davin'
    components = ['Documentation']
    creation =
    creator = 'Martin Ritter'
    dependencies = []
    files = ['43588', '43589']
    hgrepos = []
    issue_num = 27422
    keywords = []
    message_count = 12.0
    messages = ['269593', '269594', '269603', '269613', '269727', '269734', '269785', '269787', '269794', '269798', '269803', '269817']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'r.david.murray', 'docs@python', 'devin', 'sbt', 'davin', 'Martin Ritter']
    pr_nums = []
    priority = 'normal'
    resolution = 'not a bug'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27422'
    versions = ['Python 3.5']
    ```

    8088a9e8-db23-4ab6-b27e-dda4990049f6 commented 8 years ago

    When creating a multiprocessing.Process in a threaded environment I get deadlocks, I guess while waiting for the lock used to flush the output.

    I attached a minimal example of the problem which hangs for me starting with 4 threads.
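    For illustration, a hypothetical sketch along the lines of the attached test_threadfork.py (the attachment itself is not reproduced here). With the default fork start method on Linux, a forked child can inherit a lock that another thread happened to hold at fork time, in a permanently locked state:

    ```python
    # Hypothetical reproducer: several threads each fork child processes
    # while all of them contend for the stdout lock. A child forked while
    # another thread holds that lock inherits it held, and can hang when
    # it tries to flush output, so join() never returns.
    import threading
    import multiprocessing

    def child():
        print("hello from child")

    def worker(n):
        for i in range(50):
            print(f"thread {n} iteration {i}")   # contend for the stdout lock
            p = multiprocessing.Process(target=child)
            p.start()
            p.join()                             # may hang on an inherited lock

    if __name__ == "__main__":
        threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    ```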

    8088a9e8-db23-4ab6-b27e-dda4990049f6 commented 8 years ago

    I attached a gdb backtrace of one of the child processes

    bitdancer commented 8 years ago

    Mixing multiprocessing and threading is problem-prone in general. Hopefully one of the multiprocessing experts can say whether this is a known problem or not...

    rhettinger commented 8 years ago

    It is in fact problem-prone (and not just in Python). The rule is "thread after you fork, not before". Otherwise, the locks used by the thread executor will get duplicated across processes. If one of those processes dies while it has the lock, all of the other processes using that lock will deadlock.
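    A minimal sketch of the "fork before you thread" ordering, with the 'spawn' start method shown as an alternative that sidesteps inherited locks entirely (the function names here are illustrative):

    ```python
    # Safe ordering: create processes first, then create threads *inside*
    # each child, so no thread exists at fork time.
    import multiprocessing
    import threading

    def compute(x):
        return x * x   # stand-in for real per-thread work

    def process_main(n):
        # Threads are spun up after the fork, inside the child process.
        threads = [threading.Thread(target=compute, args=(i,)) for i in range(n)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    if __name__ == "__main__":
        # Alternative: 'spawn' starts children via a fresh interpreter, so
        # they never inherit locks held by other threads (at some startup cost).
        multiprocessing.set_start_method("spawn")
        procs = [multiprocessing.Process(target=process_main, args=(4,)) for _ in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
    ```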

    applio commented 8 years ago

    It would be nice to find an appropriate place to document the solid general guidance Raymond provided; though merely mentioning it somewhere in the docs will not translate into it being noticed. Not sure where to put it just yet...

    Martin: Is there a specific situation that prompted your discovering this behavior? Mixing the spinning up of threads with the forking of processes requires appropriate planning to avoid problems and achieve desired performance. If you have a thoughtful design to your code but are still triggering problems, can you share more of the motivation?

    As a side note, this is more appropriately labeled as a 'behavior' rather than a 'crash' -- the Python executable does not crash in any way but merely hangs in an apparent lock contention.

    rhettinger commented 8 years ago

    FWIW, this isn't even Python-specific behavior. It is just how threads, locks, and processes work (or in this case don't work). The code is doing what it is told to do, which happens not to be what you want (i.e. a user bug rather than a Python bug).

    I think a FAQ entry would be a reasonable place to mention this (it comes up more often than one would hope).

    8088a9e8-db23-4ab6-b27e-dda4990049f6 commented 8 years ago

    I agree that this is error-prone and cannot be fixed reliably on the Python side. However, Python makes it very easy to mix these two: a user might not even notice it if a function they call uses fork, and might just use a ThreadPoolExecutor() because it's the simplest thing to do.

    What could be a nice solution, in my opinion, is if the multiprocessing module could check whether multiple threads are already active at process creation and issue a warning if so. This warning could of course be optional, but it would make this issue more obvious.
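    A sketch of the kind of check proposed here; start_process_with_warning is a hypothetical wrapper, not part of multiprocessing:

    ```python
    # Hypothetical helper: warn if more than the main thread is alive when
    # a process is about to be created (on fork platforms this is exactly
    # the situation where inherited locks can deadlock the child).
    import threading
    import warnings
    import multiprocessing

    def start_process_with_warning(*args, **kwargs):
        if threading.active_count() > 1:
            warnings.warn("creating a process while other threads are running; "
                          "locks inherited across fork may deadlock the child",
                          RuntimeWarning, stacklevel=2)
        p = multiprocessing.Process(*args, **kwargs)
        p.start()
        return p
    ```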

    In my case we have a large C++ code base which still includes a lot of Fortran 77 code with common blocks all over the place (yay science). Everything is interfaced in Python, so to make sure that I do not have any side effects I run some of the functions in a fork using multiprocessing.Process(). And in this case I just wanted to run some testing in parallel. I have now switched to a ProcessPoolExecutor, which works fine for me.
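    A minimal sketch of the ProcessPoolExecutor approach described above; run_test is a placeholder for the side-effect-isolated test functions:

    ```python
    # Each test runs in its own worker process, so Fortran common blocks
    # and other global state cannot leak between tests.
    from concurrent.futures import ProcessPoolExecutor

    def run_test(name):
        return f"{name}: ok"   # stand-in for a real test invocation

    if __name__ == "__main__":
        with ProcessPoolExecutor() as executor:
            for result in executor.map(run_test, ["test_a", "test_b", "test_c"]):
                print(result)
    ```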

    applio commented 8 years ago

    While I believe I understand the motivation behind the suggestion to detect when the code is doing something potentially dangerous, I'll point out a few things:

    I too come from the world of scientific software and the mixing of Fortran, C/C++, and Python (yay science and yay Fortran), so I'll make another point (apologies if you already knew this): there's a lot of computationally intensive code in scientific applications, and being able to perform those computations in parallel is a wonderful thing.

    I am unsure whether the tests you're trying to speed up exercise compute-intensive functions, but let's assume they do. For reasons not described here, using the CPython implementation there is a constraint on the use of threads that restricts them to all run on a single core of your multi-core CPU (and on only one CPU if you have an SMP system). Hence spinning up threads to perform compute-intensive tasks will likely result in no better throughput (no speedup) because they're all fighting over the same maxed-out core.

    To spread out onto and take advantage of multiple cores (and multiple CPUs on an SMP system) you will want to switch to creating processes (as you say you now have). I'd make the distinction that you are likely much more interested in 'parallel computing' than 'concurrent execution'. Since you're already using multiprocessing, you might also simply use multiprocessing.Pool, as in the sketch below.
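    A minimal sketch of the multiprocessing.Pool suggestion, with a placeholder CPU-bound function:

    ```python
    # Pool distributes CPU-bound work across worker processes, one GIL each,
    # so all cores can be kept busy.
    import multiprocessing

    def expensive_computation(x):
        return sum(i * i for i in range(x))   # stand-in for a CPU-bound task

    if __name__ == "__main__":
        with multiprocessing.Pool() as pool:   # defaults to os.cpu_count() workers
            results = pool.map(expensive_computation, [10**5] * 8)
        print(results[:2])
    ```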

    8088a9e8-db23-4ab6-b27e-dda4990049f6 commented 8 years ago

    Dear Davin,

    Thanks for the input. I was perfectly aware that the "solution" I proposed is not realistic, but the feedback that multiprocessing is using threads internally is useful, as I can quickly abandon the idea of doing something like the check I proposed in our code base without spending time on it.

    I was aware of the GIL, I just did not anticipate such a big problem when mixing threads and processes with rather simple Python code. My bad, sorry for the noise.

    Cheers,

    Martin

    bitdancer commented 8 years ago

    To clarify the GIL issue (for davin, I guess? :): if the library you are using to interface with the FORTRAN code drops the GIL before calling the FORTRAN, then you *can* take advantage of multiple cores. It is only the Python code (and some of the code interacting with the Python objects) that is limited to executing on one core at a time. (As far as I know it isn't restricted to be the *same* core unless you set CPU affinity somehow, and I have no idea whether using CPU affinity improves performance or not.)
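    A sketch of this pattern, assuming the compiled Fortran/C code is exposed as a shared library; the library name and function here are hypothetical. ctypes.CDLL releases the GIL around each foreign call (ctypes.PyDLL would not), so threads spending their time in such calls can run on multiple cores:

    ```python
    # Threads calling into a shared library via ctypes.CDLL run the foreign
    # code without holding the GIL, so they can execute truly in parallel.
    import ctypes
    import threading

    lib = ctypes.CDLL("./libcompute.so")          # hypothetical library
    lib.heavy_kernel.argtypes = [ctypes.c_int]    # hypothetical function
    lib.heavy_kernel.restype = ctypes.c_double

    threads = [threading.Thread(target=lib.heavy_kernel, args=(10**8,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ```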

    applio commented 8 years ago

    @r.david.murray: Oh man, I was not going to go as far as advocate dropping the GIL. :)

    At least not in situations like this where the exploitable parallelism is meant to be at the Python level and not inside the Fortran code (or that was my understanding of the setup). Martin had already mentioned the motivation to fork to avoid side effects possibly arising somewhere in that code.

    In practice, after dropping the GIL the threads will likely use multiple cores -- though that's up to the OS kernel scheduler. That's what I've observed happening after temporarily dropping the GIL on both Windows and Linux systems.

    As to the benefit of CPU affinity, it depends -- it depends upon what my code was and what the OS and other system processes were busily doing at the time my code ran -- but I've never seen it hurt performance (even if the help was diminishingly small at times). For certain situations, it has been worth doing.

    Correction: I have seen cpu affinity hurt performance when I make a bone-headed mistake and constrain too many things onto too few cores. But that's a PEBCAK root cause.
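    For reference, a minimal sketch of setting CPU affinity from Python on Linux (os.sched_setaffinity is Linux-only; the core numbers are illustrative):

    ```python
    # Pin the current process (pid 0) to cores 0 and 1, then read the
    # affinity mask back. Constraining too much work onto too few cores
    # is exactly the mistake described above.
    import os

    os.sched_setaffinity(0, {0, 1})
    print(os.sched_getaffinity(0))
    ```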

    bitdancer commented 8 years ago

    Heh, yeah. What I was really trying to do with that comment was clarify, for any *other* readers who stumble on this issue later, that it is just the Python code that *has* to be constrained by the GIL. I have no idea how much of the SciPy stack drops the GIL at strategic spots. I do seem to remember that Jupyter uses multiple processes for its parallelism, though. Anyway, this is pretty off topic now :)