Locks in the standard library should be sanitized on fork

gpshead commented 15 years ago

BPO	6721
Nosy	@rhettinger, @gpshead, @vsajip, @jcea, @nirs, @pitrou, @vstinner, @applio, @cagney, @Birne94, @ochedru, @kevans91, @jessefarnham, @rojer, @koubaa
PRs	python/cpython#4071 python/cpython#9291 python/cpython#21986 python/cpython#22205 python/cpython#22651
Files	lock_fork_thread_deadlock_demo.py forklocktests.patch reinit_locks.diff: patch adding locks reinitialization upon fork emit_warning_on_fork.patch atfork.patch reinit_locks_2.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.7', 'type-feature', 'library'] title = 'Locks in the standard library should be sanitized on fork' updated_at = user = 'https://github.com/gpshead' ``` bugs.python.org fields: ```python activity = actor = 'kevans' assignee = 'none' closed = False closed_date = None closer = None components = ['Library (Lib)'] creation = creator = 'gregory.p.smith' dependencies = [] files = ['14740', '21874', '22005', '22525', '24303', '25776'] hgrepos = [] issue_num = 6721 keywords = ['patch'] message_count = 133.0 messages = ['91674', '91936', '92766', '94102', '94115', '94133', '94135', '128282', '128307', '128311', '128316', '128369', '135012', '135067', '135069', '135079', '135083', '135095', '135096', '135143', '135157', '135173', '135543', '135857', '135866', '135897', '135899', '135948', '135965', '135984', '136003', '136039', '136045', '136047', '136120', '136147', '139084', '139245', '139470', '139474', '139480', '139485', '139488', '139489', '139509', '139511', '139521', '139522', '139584', '139599', '139608', '139800', '139808', '139850', '139852', '139858', '139869', '139897', '139929', '140215', '140402', '140550', '140658', '140659', '140668', '140689', '140690', '140691', '141286', '143174', '143274', '143279', '151168', '151266', '151267', '151845', '151846', '151853', '161019', '161029', '161389', '161405', '161470', '161953', '162019', '162031', '162034', '162036', '162038', '162039', '162040', '162041', '162053', '162054', '162063', '162113', '162114', '162115', '162117', '162120', '162137', '162160', '270015', '270017', '270018', '270019', '270020', '270021', '270022', '270023', '270028', '289716', '294726', '294834', '304714', '304716', '304722', '304723', '314983', '325326', '327267', '329474', '339369', '339371', '339393', '339418', '339454', '339458', '339473', '365169', '367528', '367702', '368882'] nosy_count = 29.0 nosy_names = ['rhettinger', 'gregory.p.smith', 'vinay.sajip', 'jcea', 'nirs', 'pitrou', 'vstinner', 'nirai', 'forest_atq', 'ionelmc', 'bobbyi', 'neologix', 'Giovanni.Bajo', 'sdaoden', 'tshepang', 'sbt', 'lesha', 'dan.oreilly', 'davin', 'Connor.Wolf', 'Winterflower', 'cagney', 'Birne94', 'ochedru', 'kevans', 'jesse.farnham', 'hugh', 'rojer', 'koubaa'] pr_nums = ['4071', '9291', '21986', '22205', '22651'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue6721' versions = ['Python 3.7'] ```

vsajip commented 12 years ago

Use file locks in logging, whenever possible.

Logging doesn't just log to files, and moreover, also has locks to serialise access to internal data structures (nothing to do with files). Hence, using file locks in logging is not going to magically solve problems caused in threading+forking scenarios.

Apart from logging a commonly used part of the stdlib library which uses locks, I don't think this issue is to do with logging, specifically; logging uses locks in an unexceptional, conventional way, much as any other code might. Whatever solution is come up with for this thorny issue, it needs to be generic, in my view; otherwise we might just be papering over the cracks.

e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 12 years ago

Lesha, the problems about "magical" __del__ methods you are worried about actually have nothing to do with threading and locks. Even in a single threaded program using fork, exactly the same issues of potential corruption would be present because the object might be finalized at the same in multiple processes.

The idea that protecting the object with a thread lock will help you is seriously misguided UNLESS you also make sure you acquire them all before the fork -- and then you need to worry about the order in which you acquire all these locks. There are much easier and more direct ways to deal with the issue than wrapping all objects with locks and trying to acquire them all before forking.

You could of course use multiprocessing.Lock() if you want a lock shared between processes. But even then, depending on what the __del__ method does, it is likely that you will not want the object to be finalized in both processes.

However, the suggestion that locked-before-fork-locks should by default raise an error is reasonable enough.

a0587ef8-d338-4029-ad79-be08a4182aaa commented 8 years ago

Is anything happening with these fixes? This is still an issue (I'm running into it now)?

gpshead commented 8 years ago

For me the momentum on fixing these things has stalled because I no longer work on any code that runs into this. There is a fundamental problem: You cannot safely use threading and os.fork() in the same application per POSIX rules. So even if the standard library and interpreter to tried to force its locks into some sort of consistent state post os.fork(), the much more fundamental POSIX problem remains.

IMNSHO, working on "fixes" for this issue while ignoring the larger application design flaw elephant in the room doesn't make a lot of sense.

a0587ef8-d338-4029-ad79-be08a4182aaa commented 8 years ago

IMNSHO, working on "fixes" for this issue while ignoring the larger application design flaw elephant in the room doesn't make a lot of sense.

I understand the desire for a canonically "correct" fix, but it seems the issue with fixing it "correctly" has lead to the /actual/ implementation being broken for at least 6 years now.

As it is, my options are: A. Rewrite the many, many libraries I use that internally spawn threads. B. Not use multiprocessing.

(A) is prohibitive from a time perspective (I don't even know how many libraries I'd have to rewrite!), and (B) means I'd get 1/24-th of my VMs performance, so it's somewhat prohibitive.

At the moment, I've thrown together a horrible, horrible fix where I reach into the logging library (which is where I'm seeing deadlocks), and manually iterate over all attached log managers, resetting the locks in each immediately when each process spawns. This is, I think it can be agreed, a horrible, horrible hack, but in my particular case it works (the worst case result is garbled console output for a line or two).

---

If a canonical fix is not possible, at least add a facility to the threading fork() call that lets the user decide what to do. In my case, my program is wedging in the logging system, and I am entirely OK with having transiently garbled logs, if it means I don't wind up deadlocking and having to force kill the interpreter (which is, I think, far /more/ destructive an action).

If I could basically do multiprocessing.Process(*args, *kwargs, _clear_locks=True), that would be entirely sufficient, and not change existing behaviour at all.

a0587ef8-d338-4029-ad79-be08a4182aaa commented 8 years ago

Arrrgh, s/threading/multiprocessing/g in my last message.

18859d0c-2856-4055-8921-39931a135f5e commented 8 years ago

I work on a multi-million-line C++ codebase that uses fork() from multithreaded programs all over the place. We use glog ubiquitously.

This bug here that spans 6 years and has affected dozens of people (conservatively) simply does not exist for us. That is because glog takes the pragmatic approach of sanitizing its mutex on fork:

https://github.com/google/glog/blob/4d391fe692ae6b9e0105f473945c415a3ce5a401/src/base/mutex.h#L249

In my opinion, "thou shalt never fork() in a threaded program" is impractical purism. It is good to be aware of the dangers that lie therein, but it is completely possible to safely spawn **subprocesses** from multithreaded programs on modern OSes like Linux.

Python's subprocess ought to be safe to use in threaded programs. Any issues with this (aside from pre_exec_fn, obviously) are bugs in subprocess.

Here is a C++ implementation of the concept that can be safely used in threaded programs:

https://github.com/facebook/folly/blob/master/folly/Subprocess.cpp

Note that unlike Python's subprocess pre_exec_fn, the C++ analog is very loud in warning you about the scary world in which your closure will execute:

https://github.com/facebook/folly/blob/master/folly/Subprocess.h#L252

The point to my message is simple: there is a pragmatic way to save hundreds of debugging hours for users of Python. People are going to find it necessary to do such things from time to time, so instead of telling them to redesign their programs, it is better to give them a safer tool.

Taking the glog approach in logging has no cost to the standard library, but it does have real world benefits.

Please don't block shipping such a fix.

18859d0c-2856-4055-8921-39931a135f5e commented 8 years ago

On a moment's reflection, a lot of my prior comment is wrong. Sorry about that.

glog does not, that I know of, sanitize locks on fork. You just shouldn't log after fork but before exec.
Using pthread_atfork to clean up the logging lock might be enough to make it safe from the "just forked" context, but without adding fairly exhaustive tests around this logic, it would be fragile with respect to further improvements to logging. So even just making this one library safe is a considerable amount of work.

So I retract most of my previous opinion. Sorry.

gpshead commented 8 years ago

My intent is not to block anything. I'm Just explaining why I'm not motivated to spend much time on this issue myself. Others are welcome to.

subprocess is not related to this issue, it has been fixed for use with threads (in 3.2 and higher) with an extremely widely used drop in replacement back-port for 2.7 https://pypi.python.org/pypi/subprocess32. But even 2.7's poor subprocess implementation never triggered this specific issue in the first place (unless someone logged from a pre_exec_fn which would be a laughable thing to do anyways).

multiprocessing: It has spawn (as of 3.4) and forkserver methods both of which can help avoid this issue. Caveats: spawn understandably has negative performance implications and forkserver requires the forkserver to be forked before threads potentially holding locks have been started.

As for the gross hacky monkey patching workaround: That was the approach I took in https://github.com/google/python-atfork/blob/master/atfork/stdlib_fixer.py#L51

Definitely a hack, but one that does work on existing interpreters.

Conner & lesha: Which Python version(s) are you using?

a0587ef8-d338-4029-ad79-be08a4182aaa commented 8 years ago

Python 3.5.1+ (default, Mar 30 2016, 22:46:26)

Whatever the stock 3.5 on ubuntu 16.04 x64 is.

I've actually been running into a whole horde of really bizarre issues related to what I /think/ is locking in stdout.

Basically, I have a context where I have thousands and thousands of (relatively short lived) multiprocessing.Process() processes, and over time they all get wedged (basically, I have ~4-32 processes alive at any time, but they all get recycled every few minutes).

After doing some horrible (https://github.com/fake-name/ReadableWebProxy/blob/master/logSetup.py#L21-L78) hackery in the logging module, I'm not seeing processes get wedged there, but I do still encounter issues with what I can only assume is a lock in the print statement. I'm hooking into a wedged process using pystuck

durr@rwpscrape:/media/Storage/Scripts/ReadableWebProxy⟫ pystuck --port 6675 Welcome to the pystuck interactive shell. Use the 'modules' dictionary to access remote modules (like 'os', or 'main') Use the %show threads magic to display all thread stack traces.

In [1]: show threads \<_MainThread(MainThread, started 140574012434176)> File "runScrape.py", line 74, in \<module> go() File "runScrape.py", line 57, in go runner.run() File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 453, in run living = sum([manager.check_run_jobs() for manager in managers]) File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 453, in \<listcomp> living = sum([manager.check_run_jobs() for manager in managers]) File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 364, in check_run_jobs proc.start() File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start self._popen = self._Popen(self) File "/usr/lib/python3.5/multiprocessing/context.py", line 212, in _Popen return _default_context.get_context().Process._Popen(process_obj) File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen return Popen(process_obj) File "/usr/lib/python3.5/multiprocessing/popenfork.py", line 20, in \_init self._launch(process_obj) File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 74, in _launch code = process_obj._bootstrap() File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap self.run() File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run self._target(self._args, *\self._kwargs) File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 145, in run run.go() File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Runner.py", line 101, in go self.log.info("RunInstance starting!") File "/usr/lib/python3.5/logging/init.py", line 1279, in info self._log(INFO, msg, args, **kwargs) File "/usr/lib/python3.5/logging/init.py", line 1415, in _log self.handle(record) File "/usr/lib/python3.5/logging/init.py", line 1425, in handle self.callHandlers(record) File "/usr/lib/python3.5/logging/init.py", line 1487, in callHandlers hdlr.handle(record) File "/usr/lib/python3.5/logging/init__.py", line 855, in handle self.emit(record) File "/media/Storage/Scripts/ReadableWebProxy/logSetup.py", line 134, in emit print(outstr)

\<Thread(Thread-4, started daemon 140573656733440)> File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap self._bootstrap_inner() File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner self.run() File "/usr/lib/python3.5/threading.py", line 862, in run self._target(self._args, *\self._kwargs) File "/usr/local/lib/python3.5/dist-packages/rpyc/utils/server.py", line 241, in start self.accept() File "/usr/local/lib/python3.5/dist-packages/rpyc/utils/server.py", line 128, in accept sock, addrinfo = self.listener.accept() File "/usr/lib/python3.5/socket.py", line 195, in accept fd, addr = self._accept()

\<Thread(Thread-5, started daemon 140573665126144)> File "/usr/local/lib/python3.5/dist-packages/pystuck/thread_probe.py", line 15, in thread_framegenerator yield (thread, frame)

So, somehow the print() statement is blocking, which I have /no/ idea how to go about debugging. I assume there's a lock /in/ the print statement function call, and I'm probably going to look into wrapping both the print() call and the multiprocessing.Process() call execution in a single, shared multiprocessing lock, but that seems like a very patchwork solution to something that should just work.

vstinner commented 8 years ago

I suggest to close the issue as WONT FIX. Python code base is huge and Python depends on a lot of external code. We cannot warranty anything.

It might be possible to track all kinds of locks with an infinite time. But I'm not sure that it's worth it.

It is possible to use fork() with threads. The problem is more to execute non trivial code after the fork. In short, the POSIX advices to only call exec() syscall after fork and nothing else. The list of functions which are safe after fork() is very short.

You can still use the multiprocessing module using the fork server for example.

9f28bdf4-bf92-4260-a4d9-cfd209ca8d2f commented 7 years ago

Currently using Python 3.6.0 (default, Mar 4 2017, 12:32:34) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin

So, somehow the print() statement is blocking, which I have /no/ idea how to go about debugging. I assume there's a lock /in/ the print statement function call, and I'm probably going to look into wrapping both the print() call and the multiprocessing.Process() call execution in a single, shared multiprocessing lock, but that seems like a very patchwork solution to something that should just work.

I am currently having a similar issue where I get a deadlock to a stdout.flush call (which I assume is called when printing). Flush internally appears to acquire a lock which is getting stuck in the subprocess.

This is the backtrace I was able to obtain through lldb, showing the waiting for a lock after the flush-call:

thread #1: tid = 0x77c27d, 0x00007fffe33c9c86 libsystem_kernel.dylib__psynch_cvwait + 10, stop reason = signal SIGSTOP frame #0: 0x00007fffe33c9c86 libsystem_kernel.dylib__psynch_cvwait + 10 frame #1: 0x00007fffe34b396a libsystem_pthread.dylib_pthread_cond_wait + 712 frame #2: 0x00000001021ecad8 PythonPyThread_acquire_lock_timed + 256 frame #3: 0x000000010221cc2f Python_enter_buffered_busy + 169 frame #4: 0x000000010221ed36 Python_io_BufferedWriter_write + 203 frame #5: 0x000000010215448b Python_PyCFunction_FastCallDict + 529 frame #6: 0x000000010211b3f0 Python_PyObject_FastCallDict + 237 frame #7: 0x000000010211be9e PythonPyObject_CallMethodObjArgs + 240 frame #8: 0x000000010222171a Python_textiowrapper_writeflush + 150
- frame #9: 0x00000001022224a8 Python`_io_TextIOWrapper_flush + 239

Is there any update on this issue or any solution to avoid deadlocking without wrapping every fork/print/logging call with a multiprocessing (or billiard in my case) lock?

gpshead commented 7 years ago

http://bugs.python.org/issue16500 added the os.register_at_fork() API which may be usable for this.

9f28bdf4-bf92-4260-a4d9-cfd209ca8d2f commented 7 years ago

While having to deal with this bug for a while I have written a small library using pthread_atfork: https://github.com/Birne94/python-atfork-lock-release

It allows registering atfork-hooks (similar to the ones available by now) and frees the stdout/stderr as well as manually provided io locks. I guess it uses some hacky ways to get the job done, but resolved the issue for me and has been working without problems for some weeks now.

pitrou commented 7 years ago

I think we should somehow move forward on this, at least for logging locks which can be quite an annoyance.

There are two possible approaches:

either a generic mechanism as posted by sbt in reinit_locks_2.diff
or a logging-specific fix using os.register_at_fork()

What do you think?

pitrou commented 7 years ago

Oh, I forgot that IO buffered objects also have a lock. So we would have to special-case those as well, unless we take the generic approach...

A problem with the generic approach is that it would leave higher-level synchronization objects such as RLock, Event etc. in an inconsistent state. Not to mention the case where the lock is taken by the thread calling fork()...

gpshead commented 7 years ago

logging is pretty easy to deal with so I created a PR.

bufferedio.c is a little more work as we either need to use the posixmodule.c os.register_at_fork API or expose it as an internal C API to be able to call it to add acquires and releases around the buffer's self->lock member when non-NULL. either way, that needs to be written safely so that it doesn't crash if fork happens after a buffered io struct is freed. (unregister at fork handlers when freeing it? messy)

pitrou commented 7 years ago

Actually, we already have a doubly-linked list of buffered IO objects (for another purpose), so we can reuse that and register a single set of global callbacks.

7279b7c1-06c8-4d9a-9cf3-0d7739acb72d commented 6 years ago

FWIW, I encountered the same kind of issue when using the mkstemp() function: under the hood, it calls gettempdir() and this one is protected by a lock too.

Current thread 0x00007ff10231f700 (most recent call first): File "/usr/lib/python3.5/tempfile.py", line 432 in gettempdir File "/usr/lib/python3.5/tempfile.py", line 269 in _sanitize_params File "/usr/lib/python3.5/tempfile.py", line 474 in mkstemp

gpshead commented 6 years ago

New changeset 19003841e965bbf56fd06824d6093620c1b66f9e by Gregory P. Smith in branch 'master': bpo-6721: Hold logging locks across fork() (GH-4071) https://github.com/python/cpython/commit/19003841e965bbf56fd06824d6093620c1b66f9e

gpshead commented 6 years ago

New changeset 3b699932e5ac3e76031bbb6d700fbea07492641d by Gregory P. Smith (Miss Islington (bot)) in branch '3.7': bpo-6721: Hold logging locks across fork() (GH-4071) (bpo-9291) https://github.com/python/cpython/commit/3b699932e5ac3e76031bbb6d700fbea07492641d

vstinner commented 6 years ago

New changeset 3b699932e5ac3e76031bbb6d700fbea07492641d by Gregory P. Smith (Miss Islington (bot)) in branch '3.7': bpo-6721: Hold logging locks across fork() (GH-4071) (bpo-9291)

It seems like this change caused a regression in the Anaconda installer of Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1644936

But we are not sure at this point. I have to investigate to understand exactly what is happening.

5ce72146-384d-40c0-9c1f-3ebc748d5463 commented 5 years ago

I suspect 3b699932e5ac3e7 is causing a hang in libreswan's kvmrunner.py on Fedora.

Looking at the Fedora RPMs:

python3-3.7.0-9.fc29.x86_64 didn't contain the fix and worked python3-3.7.1-4.fc29.x86_64 reverted the fix (for anaconda) and worked python3-3.7.2-4.fc29.x86_64 included the fix; eventually hangs

I believe the hang looks like:

Traceback (most recent call last):
  File "/home/build/libreswan-web/master/testing/utils/fab/runner.py", line 389, in _process_test
    test_domains = _boot_test_domains(logger, test, domain_prefix, boot_executor)
  File "/home/build/libreswan-web/master/testing/utils/fab/runner.py", line 203, in _boot_test_domains
    TestDomain.boot_and_login)
  File "/home/build/libreswan-web/master/testing/utils/fab/runner.py", line 150, in submit_job_for_domain
    logger.debug("scheduled %s on %s", job, domain)
  File "/usr/lib64/python3.7/logging/__init__.py", line 1724, in debug

  File "/usr/lib64/python3.7/logging/__init__.py", line 1768, in log
    def __repr__(self):
  File "/usr/lib64/python3.7/logging/__init__.py", line 1449, in log
    """
  File "/usr/lib64/python3.7/logging/__init__.py", line 1519, in _log
    break
  File "/usr/lib64/python3.7/logging/__init__.py", line 1529, in handle
    logger hierarchy. If no handler was found, output a one-off error
  File "/usr/lib64/python3.7/logging/__init__.py", line 1591, in callHandlers

  File "/usr/lib64/python3.7/logging/__init__.py", line 905, in handle
    try:
  File "/home/build/libreswan-web/master/testing/utils/fab/logutil.py", line 163, in emit
    stream_handler.emit(record)
  File "/usr/lib64/python3.7/logging/__init__.py", line 1038, in emit
    Handler.__init__(self)
  File "/usr/lib64/python3.7/logging/__init__.py", line 1015, in flush
    name += ' '
  File "/usr/lib64/python3.7/logging/__init__.py", line 854, in acquire
    self.emit(record)
KeyboardInterrupt

gpshead commented 5 years ago

We need a small test case that can reproduce your problem. I believe https://github.com/python/cpython/commit/3b699932e5ac3e76031bbb6d700fbea07492641d to be correct.

acquiring locks before fork in the thread doing the forking and releasing them afterwards is always the safe thing to do.

Example possibility: Does your code use any C code that forks on its own without properly calling the C Python PyOS_BeforeFork(), PyOS_AfterFork_Parent(), and PyOS_AfterFork_Child() APIs?

5ce72146-384d-40c0-9c1f-3ebc748d5463 commented 5 years ago

Does your code use any C code that forks on its own without properly calling the C Python PyOS_BeforeFork(), PyOS_AfterFork_Parent(), and PyOS_AfterFork_Child() APIs?

No.

Is there a web page explaining how to pull a python backtrace from all the threads running within a daemon?

gpshead commented 5 years ago

I'd start with faulthandler.register with all_threads=True and see if that gives you what you need.

https://docs.python.org/3/library/faulthandler.html

5ce72146-384d-40c0-9c1f-3ebc748d5463 commented 5 years ago

acquiring locks before fork in the thread doing the forking and releasing them afterwards is always the safe thing to do.

It's also an easy way to cause a deadlock:

register_at_fork() et.al. will cause per-logger locks to be acquired before the global lock (this isn't immediately obvious from the code)

If a thread were to grab its logging lock before the global lock then it would deadlock. I'm guessing this isn't allowed - however I didn't see any comments to this effect?

Can I suggest documenting this, and also merging the two callbacks so that the ordering of these two acquires is made explicit.

the per-logger locks are acquired in a random order

If a thread were to acquire two per-logger locks in a different order then things would deadlock.

5ce72146-384d-40c0-9c1f-3ebc748d5463 commented 5 years ago

Below is a backtrace from the deadlock.

It happens because the logging code is trying to acquire two per-logger locks; and in an order different to the random order used by the fork() handler.

The code in question has a custom class DebugHandler(logging.Handler). The default logging.Handler.handle() method grabs its lock and calls .emit() vis:

        if rv:
            self.acquire()
            try:
                self.emit(record)
            finally:
                self.release()

the custom .emit() then sends the record to a sub-logger stream vis:

    def emit(self, record):
        for stream_handler in self.stream_handlers:
            stream_handler.emit(record)
        if _DEBUG_STREAM:
            _DEBUG_STREAM.emit(record)

and one of these emit() functions calls flush() which tries to acquire a further lock.

Thread 0x00007f976b7fe700 (most recent call first): File "/usr/lib64/python3.7/logging/init.py", line 854 in acquire File "/usr/lib64/python3.7/logging/init.py", line 1015 in flush

    def flush(self):
        """
        Flushes the stream.
        """
        self.acquire() <

    try:
        if self.stream and hasattr(self.stream, "flush"):
            self.stream.flush()
    finally:
        self.release()

File "/usr/lib64/python3.7/logging/init.py", line 1038 in emit

        self.flush() <\----

File "/home/build/libreswan-web/master/testing/utils/fab/logutil.py", line 163 in emit

    def emit(self, record):
        for stream_handler in self.stream_handlers:
            stream_handler.emit(record) <---
        if _DEBUG_STREAM:
            _DEBUG_STREAM.emit(record)

File "/usr/lib64/python3.7/logging/init.py", line 905 in handle

    def handle(self, record):
        """
        Conditionally emit the specified logging record.

    Emission depends on filters which may have been added to the handler.
    Wrap the actual emission of the record with acquisition/release of
    the I/O thread lock. Returns whether the filter passed the record for
    emission.
    """
    rv = self.filter(record)
    if rv:
        self.acquire()
        try:
            self.emit(record) <\---
        finally:
            self.release()
    return rv

File "/usr/lib64/python3.7/logging/init.py", line 1591 in callHandlers

                    hdlr.handle(record)

File "/usr/lib64/python3.7/logging/init.py", line 1529 in handle

            self.callHandlers(record)

File "/usr/lib64/python3.7/logging/init.py", line 1519 in _log

        self.handle(record)

File "/usr/lib64/python3.7/logging/init.py", line 1449 in log

        self._log(level, msg, args, **kwargs)

File "/usr/lib64/python3.7/logging/init.py", line 1768 in log

            self.logger.log(level, msg, *args, **kwargs)

File "/usr/lib64/python3.7/logging/init.py", line 1724 in debug

        self.log(DEBUG, msg, *args, **kwargs)

File "/home/build/libreswan-web/master/testing/utils/fab/shell.py", line 110 in write

        self.logger.debug(self.message, ascii(text))

gpshead commented 5 years ago

Thanks for the debugging details! I've filed https://bugs.python.org/issue36533 to specifically track this potential regression in the 3.7 stable branch. lets carry on there where the discussion thread isn't too long for bug tracker sanity.

vstinner commented 4 years ago

I created bpo-40089: Add _at_fork_reinit() method to locks.

pitrou commented 4 years ago

Related issue: https://bugs.python.org/issue40399 """ IO streams locking can be broken after fork() with threads """

81533846-39fd-482e-8007-54ab2c3d1e8b commented 4 years ago

https://bugs.python.org/issue40442 is a fresh instance of this, entirely self-inflicted.

vstinner commented 4 years ago

See also bpo-25920: PyOS_AfterFork should reset socketmodule's lock.

vstinner commented 2 years ago

While it's true that "Locks in the standard library should be sanitized on fork", IMO having such "meta-issue" to track the issue in the 300+ stdlib modules is a bad idea, since it's hard to track how many modules got fixed and how many modules should still be fixed. Multiple modules have been fixed. I suggest to open more specific issues for remaining ones. I close the issue. Thanks for anyone who was involved in fixing issues! Good luck for people volunteers to fix remaining issues :-) Also, avoid fork without exec, it's no longer supported on macOS, it was never supported on Windows, and it causes tons of very complex bugs on Linux :-)

python / cpython

Locks in the standard library should be sanitized on fork #50970