Locks in the standard library should be sanitized on fork

gpshead commented 15 years ago

BPO	6721
Nosy	@rhettinger, @gpshead, @vsajip, @jcea, @nirs, @pitrou, @vstinner, @applio, @cagney, @Birne94, @ochedru, @kevans91, @jessefarnham, @rojer, @koubaa
PRs	python/cpython#4071 python/cpython#9291 python/cpython#21986 python/cpython#22205 python/cpython#22651
Files	lock_fork_thread_deadlock_demo.py forklocktests.patch reinit_locks.diff: patch adding locks reinitialization upon fork emit_warning_on_fork.patch atfork.patch reinit_locks_2.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.7', 'type-feature', 'library'] title = 'Locks in the standard library should be sanitized on fork' updated_at = user = 'https://github.com/gpshead' ``` bugs.python.org fields: ```python activity = actor = 'kevans' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'gregory.p.smith' dependencies = [] files = ['14740', '21874', '22005', '22525', '24303', '25776'] hgrepos = [] issue_num = 6721 keywords = ['patch'] message_count = 133.0 messages = ['91674', '91936', '92766', '94102', '94115', '94133', '94135', '128282', '128307', '128311', '128316', '128369', '135012', '135067', '135069', '135079', '135083', '135095', '135096', '135143', '135157', '135173', '135543', '135857', '135866', '135897', '135899', '135948', '135965', '135984', '136003', '136039', '136045', '136047', '136120', '136147', '139084', '139245', '139470', '139474', '139480', '139485', '139488', '139489', '139509', '139511', '139521', '139522', '139584', '139599', '139608', '139800', '139808', '139850', '139852', '139858', '139869', '139897', '139929', '140215', '140402', '140550', '140658', '140659', '140668', '140689', '140690', '140691', '141286', '143174', '143274', '143279', '151168', '151266', '151267', '151845', '151846', '151853', '161019', '161029', '161389', '161405', '161470', '161953', '162019', '162031', '162034', '162036', '162038', '162039', '162040', '162041', '162053', '162054', '162063', '162113', '162114', '162115', '162117', '162120', '162137', '162160', '270015', '270017', '270018', '270019', '270020', '270021', '270022', '270023', '270028', '289716', '294726', '294834', '304714', '304716', '304722', '304723', '314983', '325326', '327267', '329474', '339369', '339371', '339393', '339418', '339454', '339458', '339473', '365169', '367528', '367702', '368882'] nosy_count = 29.0 nosy_names = ['rhettinger', 'gregory.p.smith', 'vinay.sajip', 'jcea', 'nirs', 'pitrou', 'vstinner', 'nirai', 'forest_atq', 'ionelmc', 'bobbyi', 'neologix', 'Giovanni.Bajo', 'sdaoden', 'tshepang', 'sbt', 'lesha', 'dan.oreilly', 'davin', 'Connor.Wolf', 'Winterflower', 'cagney', 'Birne94', 'ochedru', 'kevans', 'jesse.farnham', 'hugh', 'rojer', 'koubaa'] pr_nums = ['4071', '9291', '21986', '22205', '22651'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue6721' versions = ['Python 3.7'] ```

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

> - how do you find the correct order to acquire locks in the parent process?

One option is to use the import graph to determine call order of atfork handlers. If a current std library does not fit into this scheme we can possibly fix it when writing its handlers.

Sorry, I fail to see how the "import graph" is related to the correct lock acquisition order. Some locks are created dynamically, for example. That's why I asked for a specific API: when do you register a handler? When are they called? When are they reset?

> - what do you do with locks that can be held for arbitrarily long (e.g. I/O locks)?

It is likely that such a lock does not need acquiring at the parent, and re-initializing the library in the child handler will do.

The whole point of atfork is to avoid breaking invariants and introduce invalid state in the child process. If there is one thing we want to avoid, it's precisely reading/writting corrupted data from/to files, so eluding the I/O problem seems foolish to me.

A "critical section" lock that protects in-memory data should not be held for long.

Not necessarily. See for example I/O locks and logging module, which hold locks until I/O completion.

7b124e7b-b61d-49f6-9762-fd36b627e237 commented 13 years ago

Sorry, I fail to see how the "import graph" is related to the correct lock acquisition order. Some locks are created dynamically, for example.

Import dependency is a reasonable heuristic to look into for inter-module locking order.

The rational is explained in the following pthread_atfork man page: http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html "A higher-level package may acquire locks on its own data structures before invoking lower-level packages. Under this scenario, the order specified for fork handler calls allows a simple rule of initialization for avoiding package deadlock: a package initializes all packages on which it depends before it calls the pthread_atfork() function for itself."

(The rational section is an interpretation which is not part of the standard)

A caveat is that since Python is an object oriented language it is more common than with C that code from a higher level module will be invoked by code from a lower level module, for example by calling an object method that was over-ridden by the higher level module - this actually happens in the logging module (emit method).

That's why I asked for a specific API: when do you register a handler? When are they called? When are they reset?

Read the pthread_atfork man page.

The whole point of atfork is to avoid breaking invariants and introduce invalid state in the child process. If there is one thing we want to avoid, it's precisely reading/writting corrupted data from/to files, so eluding the I/O problem seems foolish to me.

Please don't use insulting adjectives. If you think I am wrong, convincing me logically will do.

you can "avoid breaking invariants" using two different strategies: 1) Acquire locks before the fork and release/reset them after it. 2) Initialize the module to some known state after the fork.

For some (most?) modules it may be quite reasonable to initialize the module to a known state after the fork without acquiring its locks before the fork; this too is explained in the pthread_atfork man page: "Alternatively, some libraries might be able to supply just a child routine that reinitializes the mutexes in the library and all associated states to some known value (for example, what it was when the image was originally executed)."

> A "critical section" lock that protects in-memory data should not be held for long.

Not necessarily. See for example I/O locks and logging module, which hold locks until I/O completion.

Oops, I have always used the term "critical section" to describe a lock that protects data state as tightly as possible, ideally not even across function calls but now I see the Wikipedia defines one to protect any resource including IO.

The logging module locks the entire emit() function which I think is wrong. It should let the derived handler take care of locking when it needs to, if it needs to at all.

The logging module is an example for a module we should reinitialize after the fork without locking its locks before the fork.

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

[...]

A caveat is that since Python is an object oriented language it is more common than with C that code from a higher level module will be invoked by code from a lower level module, for example by calling an object method that was over-ridden by the higher level module - this actually happens in the logging module (emit method).

Exactly. That's why registering atfork() handler in independent modules can - and will - lead to deadlocks, if we get the order wrong. Also, most locks are allocated dynamically (at the same time as the object they protect), so the import order is not really relevant here. Furthermore, there's not a strict ordering between the modules: how is bz2 compared to loglib, for example?

> That's why I asked for a specific API: when do you register a handler? > When are they called? When are they reset?

Read the pthread_atfork man page.

No, it won't do it, since when an object - and its protecting lock - is deallocated, the related atfork handler must be removed, for example. You might handle this with wearefs, but that's definitely something not accounted for by the pthread_atfork standard API.

> The whole point of atfork is to avoid breaking invariants and > introduce invalid state in the child process. If there is one thing we > want to avoid, it's precisely reading/writting corrupted data from/to > files, so eluding the I/O problem seems foolish to me.

Please don't use insulting adjectives. If you think I am wrong, convincing me logically will do.

I'm sorry if that sounded insulting to you, it was really unintentional (English is not my mother tongue).

you can "avoid breaking invariants" using two different strategies: 1) Acquire locks before the fork and release/reset them after it. 2) Initialize the module to some known state after the fork.

For some (most?) modules it may be quite reasonable to initialize the module to a known state after the fork without acquiring its locks before the fork; this too is explained in the pthread_atfork man page: "Alternatively, some libraries might be able to supply just a child routine that reinitializes the mutexes in the library and all associated states to some known value (for example, what it was when the image was originally executed)."

The most problematic place is the I/O layer, since those are the locks held longer (see for example issue bpo-7123). And I'm not sure we can simply reinit the I/O object after fork() without corrupting or losing data. But this approach (reinitializing after fork()) works well most of the time, and is actually already used in multiple places (threading and multiprocessing modules, and probably others).

Oops, I have always used the term "critical section" to describe a lock that protects data state as tightly as possible, ideally not even across function calls but now I see the Wikipedia defines one to protect any resource including IO.

Yes, that's one peculiarity of Python locks. Another one is that a lock can be released by a process other than the one who acquired it.

The logging module locks the entire emit() function which I think is wrong. It should let the derived handler take care of locking when it needs to, if it needs to at all.

The logging module is an example for a module we should reinitialize after the fork without locking its locks before the fork.

It's possible.

Like I said earlier in this thread, I'm not at all opposed to the atfork mechanism. I'm actually in favor of it, the first reason being that we could factorize the various ad-hoc atfork handlers scattered through the standard library. My point is just that it's not as simple as it sounds because of long-held locks, and that we've got to be really careful because of inter-module dependencies.

Would you like to work on a patch to add an atfork mechanism?

a30f9ee9-ff29-4815-8c7f-da1b9b1b4a1d commented 13 years ago

Except for multiprocessing, does anyone know of any other module in the standard library that uses fork() and threads at the same time? After some grepping through the source I couldn't find any other cases.

I'm still in favor of just deprecating using fork() on a multithreaded process (with appropriate warnings and documentation) and I'm prepared to work on a patch that would remove the need for helper threads in the multiprocessing module.

I gather that having atfork would be useful beyond attempting to solve the locking problem, so this doesn't mean I'm opposed to it. However debugging rare problems in multithreaded/multiprocess applications is such a frustrating task that I really don't like a solution that only works in the common case.

In Python atfork() handlers will never run from signal handlers, and if I understood correctly, Charles-François described a way to "re-initialize" a Python lock safely under that assumption.

Just to clarify: it's not that POSIX atfork() handlers run from signal handlers. It's that after a fork in a multithreaded process POSIX only guarantees calls to "safe" functions, which is the same set of functions as those that are safe to call from signal handlers. This fact does not change for Python's os.fork().

pitrou commented 13 years ago

Except for multiprocessing, does anyone know of any other module in the standard library that uses fork() and threads at the same time? After some grepping through the source I couldn't find any other cases.

It's quite common to launch a subprocess from a thread, so as to communicate with the subprocess without blocking the main thread. I'm not sure the stdlib itself does it, but the test suite does (when run in parallel mode).

I'm prepared to work on a patch that would remove the need for helper threads in the multiprocessing module.

Your contribution is welcome.

Just to clarify: it's not that POSIX atfork() handlers run from signal handlers. It's that after a fork in a multithreaded process POSIX only guarantees calls to "safe" functions, which is the same set of functions as those that are safe to call from signal handlers.

For the record, I would consider POSIX totally broken from this point of view. It seems most modern systems allow much more than that, fortunately.

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

Except for multiprocessing, does anyone know of any other module in the standard library that uses fork() and threads at the same time? After some grepping through the source I couldn't find any other cases.

The same problem arises in case of user-created threads, this problem is not specific to the multiprocessing.

Just to clarify: it's not that POSIX atfork() handlers run from signal handlers. It's that after a fork in a multithreaded process POSIX only guarantees calls to "safe" functions, which is the same set of functions as those that are safe to call from signal handlers. This fact does not change for Python's os.fork().

I think Nir knows perfectly that, he was just referring to a limitation of pthread_atfork:

fork() is async-safe, and thus can be called from a signal handler
but if pthread_atfork handlers are installed, then fork() can become non async-safe, if the handlers are not async-safe (and it's the case when you're dealing with POSIX mutexes for example) But since Python's user-defined signal handlers are actually called synchronously (and don't run on behalf of the signal handler), there's no risk of fork() being called from a signal handler.

I'm still in favor of just deprecating using fork() on a multithreaded process (with appropriate warnings and documentation)

We can't do that, it would break existing code. Furthermore, some libraries use threads behind the scene.

I'm prepared to work on a patch that would remove the need for helper threads in the multiprocessing module.

What do you mean by helper threads?

a30f9ee9-ff29-4815-8c7f-da1b9b1b4a1d commented 13 years ago

We can't do that, it would break existing code.

I would argue that such code is already broken.

What do you mean by helper threads?

multiprocessing uses threads behind the scenes to handle queue traffic and such for individual forked processes. It's something I also wasn't aware of until Antoine pointed it out. It also has its own implementation of atfork hooks in an attempt to handle the locking issue.

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

> We can't do that, it would break existing code.

I would argue that such code is already broken.

that's not necessarily true, if your code is carefully designed
we can't forbid fork() in a multi-threaded application while it's allowed by POSIX
backward compatibility is *really* important

> What do you mean by helper threads?

multiprocessing uses threads behind the scenes to handle queue traffic and such for individual forked processes. It's something I also wasn't aware of until Antoine pointed it out. It also has its own implementation of atfork hooks in an attempt to handle the locking issue.

I'm curious as to how you'll manage to implement multiprocessing.queues without threads. Please open a dedicated issue for this.

7b124e7b-b61d-49f6-9762-fd36b627e237 commented 13 years ago

Would you like to work on a patch to add an atfork mechanism?

I will start with an attempt to generate a summary "report" on this rabbit hole of a problem, the views and suggestions raised by people here and what we may expect from atfork approach, its limitations, etc... I will also take a deeper look into the code.

Hopefully my brain will not deadlock or fork while I am at it.

more words, I know...

7b124e7b-b61d-49f6-9762-fd36b627e237 commented 13 years ago

Well, my brain did not deadlock, but after spinning on the problem for a while longer, it now thinks Tomaž Šolc and Steffen are right.

We should try to fix the multiprocessing module so it does not deadlock single-thread programs and deprecate fork in multi-threaded programs.

Here is the longer version, which is a summary of what people said here in various forms, observations from diving into the code and Googling around:

1) The rabbit hole

a) In a multi-threaded program, fork() may be called while another thread is in a critical section. That thread will not exist in the child and the critical section will remain locked. Any attempt to enter that critical section will deadlock the child.

b) POSIX included the pthread_atfork() API in an attempt to deal with the problem: http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

c) But it turns out atfork handlers are limited to calling async-signal-safe functions since fork may be called from a signal handler: http://download.oracle.com/docs/cd/E19963-01/html/821-1601/gen-61908.html#gen-95948

This means atfork handlers may not actually acquire or release locks. See opinion by David Butenhof who was involved in the standardization effort of POSIX threads: http://groups.google.com/group/comp.programming.threads/msg/3a43122820983fde

d) One consequence is that we can not assume third party libraries are safe to fork in multi-threaded program. It is likely their developers consider this scenario broken.

e) It seems the general consensus across the WWW concerning this problem is that it has no solution and that a fork should be followed by exec as soon as possible.

Some references: http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html http://austingroupbugs.net/view.php?id=62 http://sourceware.org/bugzilla/show_bug.cgi?id=4737 http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

2) Python’s killer rabbit

The standard library multiprocessing module does two things that force us into the rabbit hole; it creates worker threads and forks without exec.

Therefore, any program that uses the multiprocessing module is a multi-threading forking program.

One immediate problem is that a multiprocessing.Pool may fork from its worker thread in Pool._handle_workers(). This puts the forked child at risk of deadlock with any code that was run by the parent’s main thread (the entire program logic).

More problems may be found with a code review.

Other modules to look at are concurrent.futures.process (creates a worker thread and uses multiprocessing) and socketserver (ForkingMixIn forks without exec).

3) God bless the GIL

a) Python signal handlers run synchronously in the interpreter loop of the main thread, so os.fork() will never be called from a POSIX signal handler.

This means Python atfork prepare and parent handlers may run any code. The code run at the child is still restricted though and may deadlock into any acquired locks left behind by dead threads in the standard library or lower level third party libraries.

b) Turns out the GIL also helps by synchronizing threads.

Any lock held for the duration of a function call while the GIL is held will be released by the time os.fork() is called. But if a thread in the program calls a function that yields the GIL we are in la la land again and need to watch out step.

4) Landing gently at the bottom

a) I think we should try to review and sanitize the worker threads of the multiprocessing module and other implicit worker threads in the standard library.

Once we do (and since os.fork() is never run from a POSIX signal handler) the multiprocessing library should be safe to use in programs that do not start other threads.

b) Then we should declare the user scenario of mixing the threading and multiprocessing modules as broken by design.

c) Finally, optionally provide atfork API

The atfork API can be used to refactor existing fork handlers in the standard library, provide handlers for modules such as the logging module that will reduce the risk of deadlock in existing programs, and can be used by developers who insist on mixing threading and forking in their programs.

5) Sanitizing worker threads in the multiprocessing module

TODO :)

(will try to post some ideas soon)

7b124e7b-b61d-49f6-9762-fd36b627e237 commented 13 years ago

Here is a morning reasoning exercise - please help find the flaws or refine it:

5) Sanitizing worker threads in the multiprocessing module

Sanitizing a worker thread in the context of this problem is to make sure it can not create a state that may deadlock another thread that calls fork(); or in other words fork-safe. Keep in mind that in Python os.fork() is never called from a POSIX signal handler. So what are examples of a fork-safe thread?

a) A thread that spins endlessly doing nothing in a C for(;;) loop is safe.

Another thread may call fork() without restrictions.

b) A Python thread that only calls function that do not yield the GIL and that does not acquire locks that are held beyond a Python tick is safe.

An example for such a lock is a critical-section lock acquired by a lower level third party library for the duration of a function call. Such a lock will be released by the time os.fork() is called because of the GIL.

c) A Python thread that in addition to (2) also acquires a lock that is handled at fork is safe.

d) A Python thread that in addition to (2) and (3) calls function that yield the GIL but while the GIL is released only calls async-signal-safe code.

This is a bit tricky. We know that it is safe for thread A to fork and call async-signal-safe functions regardless of what thread B has been doing, but I do not know that thread A can fork and call non async-signal-safe functions if thread B was only calling async-signal-safe functions.

Nevertheless it makes sense: For example lets assume it isn't true, and that hypothetical thread A forked while thread B was doing the async-signal-safe function safe_foo(), and then thread A called non async-signal-safe function unsafe_bar() and deadlocked.

unsafe_bar() could have deadlocked trying to acquire a lock that was acquired by safe_foo(). But if this is so, then it could also happen the other way around. Are there other practical possibilities?

Either way, we could double check and white list the async-signal-safe functions we are interested in, in a particular implementation.

e) Socket related functions such as bind() accept() send() and recv(), that Python calls without holding the GIL, are all async-signal-safe.

This means that in principle we can have a fork-safe worker thread for the purpose of communicating with a forked process using a socket.

gpshead commented 13 years ago

No Python thread is ever fork safe because the Python interpreter itself can never be made fork safe. Nor should anyone try to make the interpreter itself safe. It is too complex and effectively impossible to guarantee.

There is no general solution to this, fork and threading is simply broken in POSIX and no amount of duct tape outside of the OS kernel can fix it. My only desire is that we attempt to do the right thing when possible with the locks we know about within the standard library.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

If Nir's analysis is right, and Antoines comment pushes me into this direction, (i personally have not looked at that code), then multiprocessing is completely brain-damaged and has been implemented by a moron. And yes, I know this is a bug tracker, and even that of Python.

Nir should merge his last two messages into a single mail to python-dev, and those guys should give Nir or Thomas or a group of persons who have time and mental power a hg(1) repo clone and committer access to that and multiprocessing should be rewritten, maybe even from scratch, but i dunno.

For the extremely unlikely case that all that doesn't happen maybe the patch of neologix should make it?

--Steffen Ciao, sdaoden(*)(gmail.com) () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

Um, and just to add: i'm not watching out for anything, and it won't and it can't be me:

--Steffen Ciao, sdaoden(*)(gmail.com) () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

7b124e7b-b61d-49f6-9762-fd36b627e237 commented 13 years ago

then multiprocessing is completely brain-damaged and has been implemented by a moron.

Please do not use this kind of language. Being disrespectful to other people hurts the discussion.

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

P.S.: I have to apologize, it's Tomaž, not Thomas. (And unless i'm mistaken this is pronounced "TomAsch" rather than the english "Tommes", so i was just plain wrong.)

--Steffen Ciao, sdaoden(*)(gmail.com) () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

5792609d-7136-4bf5-a72c-931da2480f6a commented 13 years ago

> then multiprocessing is completely brain-damaged and has been > implemented by a moron.

Please do not use this kind of language. Being disrespectful to other people hurts the discussion.

So i apologize once again. 'Still i think this should go to python-dev in the mentioned case.

(BTW: there are religions without "god", so whom shall e.g. i praise for the GIL?)

--Steffen Ciao, sdaoden(*)(gmail.com) () ascii ribbon campaign - against html e-mail /\ www.asciiribbon.org - against proprietary attachments

7b124e7b-b61d-49f6-9762-fd36b627e237 commented 13 years ago

(BTW: there are religions without "god", so whom shall e.g. i praise for the GIL?)

Guido? ;)

7b124e7b-b61d-49f6-9762-fd36b627e237 commented 13 years ago

Hi Gregory,

Gregory P. Smith \greg@krypto.org\ added the comment: No Python thread is ever fork safe because the Python interpreter itself can never be made fork safe. Nor should anyone try to make the interpreter itself safe. It is too complex and effectively impossible to guarantee.

a) I think the term "Guarantee" is not meaningful here since the interpreter is probably too complex to guarantee it does not contain other serious problems. b) If no Python thread is ever fork safe, can you illustrate how a trivial Python thread spinning endlessly might deadlock a child forked by another Python thread?

I was not able to find reports of deadlocks clearly related to multiprocessing worker threads so they could be practically safe already, to the point other Python-Dev developers would be inclined to bury this as a theoretical problem :)

Anyway, there exists at least the problem of forking from the pool worker thread and possibly other issues, so the code should be reviewed. Another latent problem is multiprocessing logging which is disabled by default?

There is no general solution to this, fork and threading is simply broken in POSIX and no amount of duct tape outside of the OS kernel can fix it.

This is why we should "sanitize" the multithreading module and deprecate mixing of threading and multiprocessing. I bet most developers using Python are not even aware of this problem. We should make sure they are through documentation.

Here is another way to look at the current situation:

1) Don't use threading for concurrency because of the GIL. 2) Don't mix threading with multiprocessing because threading and forking don't mix well. 3) Don't use multiprocessing because it can deadlock.

We should make sure developers are aware of (2) and can use (3) safely***.

My only desire is that we attempt to do the right thing when possible with the locks we know about within the standard library.

Right, with an atfork() mechanism.

e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 13 years ago

multiprocessing.util already has register_after_fork() which it uses for cleaning up certain things when a new process (launched by multiprocessing) is starting. This is very similar to the proposed atfork mechanism.

Multiprocessing assumes that it is always safe to delete lock objects. If reinit_locks.diff is committed then I guess this won't be a problem.

I will try to go through multiprocessing's use of threads:

Queue -----

Queue's have a feeder thread which pushes objects in to the underlying pipe as soon as possible. The state which can be modified by this thread is a threading.Condition object and a collections.deque buffer. Both of these are replaced by fresh copies by the after-fork mechanism.

However, because objects in the buffer may have __del__ methods or weakref callbacks associated, arbitrary code may be run by the background thread if the reference count falls to zero.

Simply pickling the argument of put() before adding it to the buffer fixes that problem -- see the patch for bpo-10886. With this patch I think Queue's use of threads is fork-safe.

Pool ----

If a fork occurs while a pool is running then a forked process will get a copy of the pool object in an inconsistent state -- but that does not matter since trying to use a pool from a forked process will *never* work.

Also, some of a pool's methods support callbacks which can execute arbitrary code in a background thread. This can create inconsistent state in a forked process

As with Queue.put, pool methods should pickle immediately for similar reasons.

I would suggest documenting clearly that a pool should only ever be used or deleted by the process which created it. We can use register_after_fork to make all of a pool's methods raise an error after a fork.

We should also document that callbacks should only be used if no more processes will be forked.

allow_connection_pickling -------------------------

Currently multiprocessing.allow_connection_pickling() does not work because types are registered with ForkingPickler instead of copyreg -- see bpo-4892. However, the code in multiprocessing.reduction uses a background thread to support the transfer of sockets/connections between processes.

If this code is ever resurrected I think the use of register_after_fork makes this safe.

Managers --------

A manager uses a threaded server process. This is not a problem unless you create a user defined manager which forks new processes. The documentation should just say Don't Do That.

I think multiprocessing's threading issues are all fixable.

7b124e7b-b61d-49f6-9762-fd36b627e237 commented 13 years ago

For the record, turns out there was a bit of misunderstanding.

I used the term deprecate above to mean "warn users (through documentation) that they should not use (a feature)" and not in its Python-dev sense of "remove (a feature) after a period of warning".

I do not think the possibility to mix threading and multiprocessing together should be somehow forcibly disabled.

Anyway, since my view does not seem to resonate with core developers I I'll give it a rest for now.

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 13 years ago

Anyway, since my view does not seem to resonate with core developers I I'll give it a rest for now.

Well, the problem is that many views have been expressed in this thread, which doesn't help getting a clear picture of what's needed to make progress on this issue. AFAIC, I think the following seems reasonable: 1) add an atfork module which provides a generic and pthread_atfork-like mechanism to setup handlers that must be called after fork (right now several modules use their own ad-hoc mechanism) 2) for multiprocessing, call exec() after fork() (issue bpo-8713) 3) for buffered file objects locks, use the approach similar to the patch I posted (reinit locks in the child process right after fork())

Does that sound reasonable?

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

Just wanted to say that I spent something like 8 hours debugging a subprocess + threading + logging deadlock on a real production system.

I suspected one of my locks at first, but I couldn't find any. The post-fork code was very simple, and I didn't suspect that logging would be subject to the same issue.

The good news that I see a very clean solution for fixing this.

We can't free all locks across fork -- that is unsafe and mad, because the child might end up corrupting some shared (network) resource, for example/

However, extending RLock to provide ForkClearedRLock (this would be used by logging, i.e.) is quite straighforward.

The extended class would simply need to record the process ID, in which the lock was created, and the process ID, which is trying to acquire it. Done!

79528080-9d85-4d18-8a2a-8b1f07640dd7 commented 12 years ago

However, extending RLock to provide ForkClearedRLock (this would be used by logging, i.e.) is quite straighforward.

The extended class would simply need to record the process ID, in which the lock was created, and the process ID, which is trying to acquire it. Done!

There are at least two problems with this approach.

with LinuxThreads, threads have different PIDs, so this would break. LinuxThreads have now been replaced with NPTL, so this may not be a showstopper, though
however, the other problem is more serious: it has the exact same problem as the lock reinitialization upon fork(): locks are used to protect critical sections, to make sure that threads see a consistent state: if you simply proceed and reset/acquire the lock when the process is not the last one that owned it, the invariants protected by the lock will be broken. The proper solution is to setup handlers to be called upon fork, that not only reset locks, but also internal state of objects they protect. However, this is a dull and boring task, and would require patching dozens of different places. It's been on my todo list for some time... Another "solution" would be to remove the only place in the standard library where a bare fork() is used, in multiprocessing (issue bpo-8713). Then, it's the user's problem if he calls fork() :-)

gpshead commented 12 years ago

A new lock type will NOT solve this. It is ALWAYS okay to clear all thread/threading module locks after a fork.

They are and always have been process-local by definition so they are also by definition 100% invalid to any child process.

Anyone who has written code using them to "lock" an out-of-process resource has written code that is already broken today. Thread locks can't guard network resources.

e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

Attached is a patch (without documentation) which creates an atfork module for Unix.

Apart from the atfork() function modelled on pthread_atfork() there is also a get_fork_lock() function. This returns a recursive lock which is held whenever a child process is created using os.fork(), subprocess.Popen() or multiprocessing.Process(). It can be used like

    with atfork.get_fork_lock():
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            try:
                os.close(r)
                # do something with w
            finally:
                os._exit(0)
        else:
            os.close(w)

# do something with r

This prevents processes forked by other threads from accidentally inheriting the writable end (which would potentially cause EOF to be delayed when reading from the pipe). It can also be used to eliminate the potential race where you create an fd and then set the CLOEXEC flag on it.

The patch modifies Popen() and Process.start() to acquire the lock when they create their pipes. (A race condition previously made Process.sentinel and Process.join() potentially unreliable in a multithreaded program.)

Note that using the deprecated os.popen?() and os.spawn?() functions can still cause accidental inheritance of fds.

(I have also done a hopefully complete patch to multiprocessing to optionally allow fork+exec on Unix -- see bpo-8713.)

e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

Is there any particular reason not to merge Charles-François's reinit_locks.diff?

Reinitialising all locks to unlocked after a fork seems the only sane option.

pitrou commented 12 years ago

Is there any particular reason not to merge Charles-François's reinit_locks.diff?

Reinitialising all locks to unlocked after a fork seems the only sane option.

I agree with this. I haven't looked at the patch very closely. I think perhaps each lock could have an optional callback for specific code to be run after forking, but that may come in another patch. (this would allow to make e.g. the C RLock fork-safe)

pitrou commented 12 years ago

Should we go forward on this?

gpshead commented 12 years ago

going forward with reinit_locks.diff makes sense.

I've added comments to it in the code review link. It is "Patch Set 3"

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

This is a reply to: http://bugs.python.org/issue6721#msg151266

Charles-François raises two problems:

1) LinuxThreads have different PIDs per thread. He then points out that LinuxThreads have long been deprecated.

2) you cannot release locks on fork() because that might let the forked process access protected resources.

My replies:

(1) Good catch. I suspect that this could be mitigated even if we cared about LinuxThreads. I haven't looked, but there's got to be a way to determine if we are a thread or a fork child.

(2) I think I didn't explain my idea very well. I don't mean that we should release *all* locks on fork. That will end in disaster, as Charles-François amply explained.

All I meant is that we could introduce a special lock class "ForkClearedRLock" that self-releases on fork(). We could even use Charles-François's reinit magic for this.

Using ForkClearedRLock in "logging" would prevent deadlocks. The only potential harm that would come from this is that your logfile might get pretty ugly, i.e. the fork parent and child might be printing simultaneously, resulting in logs like:

Error: parentparentparError: childchildchildchildchild entparentparent

It's not great, but it's definitely better than deadlocking.

I don't think logging can do anything more sensible across fork() anyway.

Did this explanation make more sense?

e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

(1) Good catch. I suspect that this could be mitigated even if we cared about LinuxThreads. I haven't looked, but there's got to be a way to determine if we are a thread or a fork child.

Using a generation count would probably work just as well as the PID: main process has generation 0, children have generation 1, grandchildren have generation 2, ...

(2) I think I didn't explain my idea very well. I don't mean that we should release *all* locks on fork. That will end in disaster, as Charles-François amply explained.

So what are you suggesting? That a lock of the default type should raise an error if you try to acquire it when it has been acquired in a previous process?

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

So what are you suggesting? That a lock of the default type should raise an error if you try to acquire it when it has been acquired in a previous process?

I was suggesting a way to make 'logging' fork-safe. No more, no less.

Does what my previous comment make sense in light of this?

Using a generation count

Sure, that's a good idea.

e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

> Is there any particular reason not to merge Charles-François's reinit_locks.diff? > > Reinitialising all locks to unlocked after a fork seems the only sane option.

I agree with this. I haven't looked at the patch very closely. I think perhaps each lock could have an optional callback for specific code to be run after forking, but that may come in another patch. (this would allow to make e.g. the C RLock fork-safe)

An alternative way of handling RLock.acquire() would be to always start by trying a non-blocking acquire while holding the GIL: if this succeeds and self->rlock_count != 0 then we can assume that the lock was cleared by PyThread_ReinitLocks().

e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

Attached is an updated version of Charles-François's reinit_locks.diff.

Changes:

Handles RLock by assuming that if self->count != 0 when we acquire the lock, then the lock must have been reinitialized by PyThread_ReInitLocks().
Applies existing fork tests for Lock to RLock.
Fixes capitalization issues with PyThread_ReInitLocks()/PyThread_ReinitLocks().
Defines PyThread_ReInitLocks() to be empty on non-pthread platforms.

Note that RLock._is_owned() is unreliable after a fork until RLock.acquire() has been called.

Also, no synchronization has been added for the list of locks. Are PyThread_allocate_lock() and PyThread_free_lock() supposed to be safe to call while not holding the GIL?

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

I am really alarmed by the reinit_locks patches.

I scanned the comment history, and looked at the patch. I may have missed something, but it looks to me like the basic behavior is this:

"After fork(), all locks are replaced by brand-new lock objects that are NOT locked."

*Grim Prediction*: This is going to cause some disastrous, unexpected, and hilarious data loss or corruption for somebody.

Here is why:

  class MySQLConn:

    def __init__(self):
      self.lock = Lock()

    def doWork(self):
      self.lock.acquire()
      # do a sequence of DB operations that must not be interrupted,
      # and cannot be made transactional.
      self.lock.release()

Run this in a thread:

  def thread1(conn):
    while True:
      conn.doWork()
      time.sleep(0.053)

Run this in another thread:

  def thread2(conn):
    while True:
      conn.doWork()
      time.sleep(0.071)

Run this in a third thread:

  def thread2():
    while True:
      subprocess.call(["ls", "-l"])
      time.sleep(0.3)

With reinit_locks(), this will eventually break horribly.

a) fork() is called with the DB lock held by thread1. b) Some time passes before the child gets to exec(). c) In that time, the child's thread2 gets to doWork(). d) Simultaneously, the parent's doWork is still running and holding a lock. e) Thanks to reinit_locks, the child's thread2 does not have a lock, and it will merrily proceed to corrupt the parent's work.

So I think this approach is basically doomed.

I think my approach of marking _some_ locks as safe to reinit upon fork is workable (i.e. to solve the bad interaction with logging or import).

However, there's also an orthogonal approach that might work well:

1) Right before the first thread gets created in any Python program, fork off a fork() server.

From then on, subprocess will only use the fork server to call commands.

Thoughts?

e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

a) fork() is called with the DB lock held by thread1. b) Some time passes before the child gets to exec(). c) In that time, the child's thread2 gets to doWork(). d) Simultaneously, the parent's doWork is still running and holding a lock. e) Thanks to reinit_locks, the child's thread2 does not have a lock, and it will merrily proceed to corrupt the parent's work.

You seem to be saying that all three threads survive the fork.

I think forkall() on Solaris acts like that, but the normal fork() function does not. Only the thread which performs fork() will survive in the child process.

So doWork() never runs in the child process, and the lock is never used in the child process.

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

I think forkall() on Solaris acts like that, but the normal fork() function does not. Only the thread which performs fork() will survive in the child process.

Sorry, brain fail. A slightly more contrived failure case is this:

subprocess.Popen(
  ..., 
  preexec_fn=lambda: conn.doWork()
)

Everything else is the same.

Another failure case is:

  class MySQLConn:
    ... doWork as before ...

    def __del__(self):
      self.doWork()

Followed by:

  def thread3(conn):
    while True:
      subprocess.call(['nonexistent_program']) 
      time.sleep(0.1)

The destructor will fire in the child and corrupt the parent's data.

An analogous example is:

  conn = MySQLConn()
  start_thread1(conn)
  start_thread2(conn):
  while True:
    if os.fork() == 0:  # child
      raise Exception('doom')  # triggers destructor

Basically, it is really really dangerous to release locks that protect any resources that are not copied by fork (i.e. network resources, files, DB connections, etc, etc).

gpshead commented 12 years ago

Anyone using a preexec function in subprocess has already declared that they like deadlocks so that isn't an issue. :)

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

Deadlocks are dandy, but corruption is cruel.

gpshead commented 12 years ago

threading locks cannot be used to protect things outside of a single process. Any code using them to do that is broken.

In your examples you are suggesting a class that wants to do one or more mysql actions within a destructor and worried that the __del__ method would be called in the fork()'ed child process.

With the subprocess module, this will never happen. the child exec's or does a hard exit. http://hg.python.org/cpython/file/bd2c2def77a7/Modules/_posixsubprocess.c#l634

When someone is using os.fork() directly, they are responsible for all destructors in their application behaving sanely within the child process.

Destructors are an evil place to put code that does actual work and are best avoided. When required, they must be written defensively because they really cannot depend on much of the Python execution environment around them being in a functional state as they have no control over _when_ they will be called during shutdown. Nothing new here.

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

Re "threading locks cannot be used to protect things outside of a single process":

The Python standard library already violates this, in that the "logging" module uses such a lock to protect the file/socket/whatever, to which it is writing.

If I had a magic wand that could fix all the places in the world where people do this, I'd accept your argument.

In practice, threading locks are abused in this way all the time.

Most people don't even think about the interaction of fork and threads until they hit a bug of this nature.

Right now, we are discussing a patch that will take broken code, and instead of having it deadlock, make it actually destroy data.

I think this is a bad idea. That is all I am arguing.

I am glad that my processes deadlocked instead of corrupting files. A deadlock is easier to diagnose.

You are right: subprocess does do a hard exit on exceptions. However, the preexec_fn and os.fork() cases definitely happen in the wild. I've done both of these before.

I'm arguing for a simple thing: let's not increase the price of error. A deadlock sucks, but corrupted data sucks much worse -- it's both harder to debug, and harder to fix.

gpshead commented 12 years ago

We could make any later attempt to acquire or release a lock that was reinitialized while it was held raise an exception.

Such exception raising behavior should be conditional at lock construction time; some code (such as logging) never wants to deal with one because it is unacceptable for it to ever fail or deadlock. It should also be possible for the exception to be usefully handled if caught; locks should gain an API to clear their internal "reinitialized while held" flag.

Q: Should .release() raise the exception? Or just warn? I'm thinking no exception on release(). Releasing a lock that was held during re-initialization could just reset the "reinitialized while held" flag.

The acquire() that would deadlock or crash today would be where raising an exception makes the most sense.

Deadlocks are unacceptable. The whole point of this bug is that we can do better. An exception would provide a stack trace of exactly which thing where caused the offending operation so that the code can be fixed to not misuse locks in the first place or that specific lock can be changed to silent reinitialization on fork. (or better yet, the fork can be removed entirely)

Both behaviors are better than today. This change would surface bugs in people's code much better than difficult to debug deadlocks.

It should be a pretty straightforward change to reinit_locks_2 (Patch Set 6) to behave that way.

Looking back, raising an exception is pretty much what I suggested in http://bugs.python.org/issue6721#msg94115 2.5 years ago.

e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

conn = MySQLConn() start_thread1(conn) start_thread2(conn): while True: if os.fork() == 0: # child raise Exception('doom') # triggers destructor

There is no guarantee here that the lock will be held at the time of the fork. So even if we ensure that a lock acquired before the fork stayed lock, we won't necessarily get a deadlock.

More importantly, you should never fork without ensuring that you exit with os._exit() or os.exec*(). So your example should be something like

  conn = MySQLConn()
  start_thread1(conn)
  start_thread2(conn):
  while True:
    if os.fork() == 0:  # child
      try:
        raise Exception('doom')  # does NOT trigger destructor
      except:
        sys.excepthook(*sys.exc_info())
        os._exit(1)
      else:
        os._exit(0)

With this hard exit the destructor never runs.

vsajip commented 12 years ago

> Re "threading locks cannot be used to protect things outside of a > single process":

The Python standard library already violates this, in that the "logging" module uses such a lock to protect the file/socket/whatever, to which it is writing.

logging is not doing anything to protect things *outside* of a single process - the logging docs make that clear, and give specific recommendations for how to use logging in a multi-process scenario. Logging is just using locks to manage contention between multiple threads in a single process. In that sense, it is no different to any other Python code that uses locks.

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

I feel like I'm failing to get my thesis across. I'll write it out fully:

== Thesis start ==

Basic fact: It is an error to use threading locks in _any_ way after a fork. I think we mostly agree on this. The programs we discussing are **inherently buggy**.

We disagree on the right action when such a bug happens. I see 3 possibilities:

1) deadlock (the current behavior, if the lock was held in the parent at the time of fork)

2) continue to execute: a) as if nothing happened (the current behavior, if the lock was not held in the parent) b) throw an Exception (equivalent to a, see below)

3) crash hard.

I think both 1 and 3 are tolerable, while 2 is **completely unsafe** because the resulting behavior of the program is unexpected and unpredictable (data corruption, deletion, random actions, etc).

== Thesis end ==

I will now address Gregory's, Richard's, and Vinay's comments in view of this thesis:

1) Gregory suggests throwing an exception when the locks are used in a child. He also discusses some cases, in which he believes one could safely continue execution.

My responses:

a) Throwing an exception is tatamount to continuing execution.

Imagine that the parent has a tempfile RAII object that erases the file after the object disappears, or in some exception handler.

The destructor / handler will now get called in the child... and the parent's tempfile is gone. Good luck tracking that one down.

b) In general, is not safe to continue execution on release(). If you release() and reinitialize, the lock could still later be reused by both parent and child, and there would still be contention leading to data corruption.

c) Re: deadlocks are unacceptable...

A deadlock is better than data corruption. Whether you prefer a deadlock or a crash depends on whether your system is set up to dump core. You can always debug a deadlock with gdb. A crash without a core dump is impossible to diagnose. However, a crash is harder to ignore, and it lets the process recover. So, in my book, though I'm not 100% certain: hard crash > deadlock > corruption

d) However, we can certainly do better than today:

i) Right now, we sometimes deadlock, and sometimes continue execution. It would be better to deadlock always (or crash always), no matter how the child uses the lock.

ii) We can log before deadlocking (this is hard in general, because it's unclear where to log to), but it would immensely speed up debugging.

iii) We can hard-crash with an extra-verbose stack dump (i.e. dump the lock details in addition to the stack)

2) Richard explains how my buggy snippets are buggy, and how to fix them.

I respond: Richard, thanks for explaining how to avoid these bugs!

Nonetheless, people make bugs all the time, especially in areas like this. I made these bugs. I now know better, mostly, but I wouldn't bet on it.

We should choose the safest way to handle these bugs: deadlocking always, or crashing always. Reinitializing the locks is going to cost Python users a lot more in the long run. Deadlocking _sometimes_, as we do now, is equally bad.

Also, even your code is potentially unsafe: when you execute the excepthook in the child, you could be running custom exception logic, or even a custom excepthook. Those could well-intentionedly, but stupidly, destroy some of the parent's valuable data.

3) Vinay essentially says "using logging after fork is user error".

I respond: Yes, it is. In any other logging library, this error would only result in mangled log lines, but no lasting harm.

In Python, you sometimes get a deadlock, and other times, mangled lines.

logging is not doing anything to protect things *outside* of a single process

A file is very much outside a single process. If you are logging to a file, the only correct way is to use a file lock. Thus, I stand by my assertion that "logging" is buggy.

Windows programs generally have no problems with this. fork() on UNIX gives you both the rope and the gallows to hang yourself.

Specifically for logging, I think reasonable options include:

a) [The Right Way (TM)] Using a file lock + CLOEXEC when available; this lets multiple processes cooperate safely.

b) It's okay to deadlock & log with an explanation of why the deadlock is happening.

c) It's okay to crash with a similar explanation.

d) It's pretty okay even to reinitialize logs, although mangled log lines do prevent automated parsing.

I really hope that my compact thesis can help us get closer to a consensus, instead of arguing about the details of specific bugs.

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

A slightly more ambitious solution than crashing / deadlocking always is to have Python automatically spawn a "fork server" whenever you start using threads.

Then, you would be able to have "subprocess" work cleanly, and not worry about any of this stuff.

I don't know if we want to take the perf hit on "import threading" though...

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

Actually, we might be able to automatically spawn a safe fork server _only_ when people start mixing threading and subprocess.

I'm not totally sure if this would allow us to salvage multiprocessing as well...

The tricky bit is that we'd need to proxy into the fork server all the calls having to do with file descriptors / sockets that we would want to pass into the child processes.

That suggests to me that it'll be really hard to do this in a backwards-compatible way.

Given that subprocess is a pretty broken library, this might be a good time to replace it anyway.

gpshead commented 12 years ago

subprocess has nothing to do with this bug. subprocess is safe as of Python 3.2 (and the subprocess32 backport for 2.x). Its preexec_fn argument is already documented as an unsafe legacy. If you want to replace subprocess, go ahead, write something new and post it on pypi. That is out of the scope of this issue.

Look at the original message I opened this bug with. I *only* want to make the standard library use of locks not be a source of deadlocks as it is unacceptable for a standard library itself to force your code to adopt a threads only or a fork only programming style. How we do that is irrelevant; I merely started the discussion with one suggestion.

Third party libraries are always free to hang their users however they see fit.

If you want to "log" something before deadlocking, writing directly to the stderr file descriptor is the best that can be done. That is what exceptions that escape __del__ destructors do.

logging, http.cookiejar, _strptime - all use locks that could be dealt with in a sane manner to avoid deadlocks after forking.

Queue, concurrent.futures & threading.Condition - may not make sense to fix as these are pretty threading specific as is and should just carry the "don't fork" caveats in their documentation.

(My *real* preference would be to remove os.fork() from the standard library. Not going to happen.)

18859d0c-2856-4055-8921-39931a135f5e commented 12 years ago

1) I'm totally in favor of making the standard library safe. For that purpose, I think we should do a combination of:

a) Use file locks in logging, whenever possible.

b) Introduce LockUnsafelyReinitializedAtFork, using a generation counter, or whatever else, which can be used by the few places in the standard library that can safely deal with lock reinitialization.

2) http://docs.python.org/library/subprocess.html#module-subprocess does not actually document that preexec_fn is unsafe and in need of deprecation. New users will continue to shoot themselves in the foot.

3) I think that in addition to making the standard library safe, all other locks need to be made sane (crash or deadlock), so that we at least always avoid the option "2) continue to execute the child despite it relying on an unsafe lock".

python / cpython

Locks in the standard library should be sanitized on fork #50970