twisted / twisted

Event-driven networking engine written in Python.
https://twisted.org
Other
5.6k stars 1.17k forks source link

Reaping child processes has superlinear complexity on POSIX #2967

Open twisted-trac opened 16 years ago

twisted-trac commented 16 years ago
lukatmyshu's avatar lukatmyshu reported
Trac ID trac#2967
Type defect
Created 2007-12-27 21:52:44Z
Branch https://github.com/twisted/twisted/tree/waitpid-2967

Currently, twisted (process.py) does one waitpid for every process that it has spawned. If Twisted has a few thousand plus child processes alive still this can cause a huge slowdown (after a few hours twisted was spending 90% of its time doing waitpid).

One possible solution to this problem is to call waitpid(-1) just once to check if there are any processes that can be reaped. According to 'man waitpid', this should be called in a loop until there are no more child processes to be reaped. We're currently using a similar modification on our servers, and twisted rarely consumes more than 2% of the CPU now (previously was upwards of 20% .... would require restarts every 7-8 hours).

A problem with this solution is that it could interfere with non-Twisted process creation APIs. If waitpid(-1) reaps a process someone else created, there's nothing useful Twisted can do with that information and the other process creation API will never be able to learn about its process exiting.

Another possible solution is to use the SA_SIGINFO flag to sigaction to install a signal handler which will be informed of the PID of child which exited.

Another possibility is to use waitid(-1) with the WNOWAIT flag to learn the exited process's pid efficiently without interfering with the exit status of the process. A caveat with this approach is that if you ever see a PID you weren't expecting come back from waitid(-1), you have to fall back to calling waitpid() for each PID which Twisted knows about which might have exited.

Another possible solution is for Twisted to create one child process it controls. Whenever an attempt is made to spawn a child process, that existing child process is used to create the new child process. Since Twisted is in complete control of that intermediate process, no unknown processes can be created, so waitpid(-1) is acceptable.

Attachments:

Searchable metadata ``` trac-id__2967 2967 type__defect defect reporter__lukatmyshu lukatmyshu priority__normal normal milestone__ branch__branches_waitpid_2967 branches/waitpid-2967 branch_author__ status__new new resolution__None None component__core core keywords__ time__1198792364000000 1198792364000000 changetime__1233616243000000 1233616243000000 version__None None owner__ cc__spiv cc__exarkun ```
twisted-trac commented 16 years ago
exarkun's avatar @exarkun set owner to lukatmyshu

This is a good change to make.

The patch has some problems, though:

Additionally, it's almost certainly the case that some behavior of this code is not properly unit tested (I didn't try applying the patch and running the tests, but since the patch seems to be broken wrt multiple zombie processes, if no existing unit test fails with it applied then there is definitely some missing coverage). Can you correct the above issues and add some unit tests?

twisted-trac commented 16 years ago
therve's avatar @therve set owner to @therve
twisted-trac commented 16 years ago
glyph's avatar @glyph set owner to @glyph
@glyph set status to assigned
twisted-trac commented 15 years ago
glyph's avatar @glyph removed owner
@glyph set status to new
twisted-trac commented 13 years ago
Automation's avatar Automation removed owner
twisted-trac commented 16 years ago
exarkun's avatar @exarkun commented

Oh, I forgot, the patch also leaves dead code, _BaseProcess.reapProcess, which should be deleted.

twisted-trac commented 16 years ago
therve's avatar @therve commented

(In [22690]) Use waitpid(-1), but loop until failure, and add a test.

Refs #2967

twisted-trac commented 16 years ago
therve's avatar @therve commented

(In [22689]) Branching to 'waitpid-2967'

twisted-trac commented 16 years ago
spiv's avatar spiv commented

I haven't looked at the patch closely, but I am in favour of making this change.

Note though that using waitpid(-1) has a theoretical disadvantage to the current code: it may reap processes other than those spawned by reactor.spawnProcess (e.g. if someone is using a library that also happens to create child processes).

This is probably an acceptable tradeoff (I doubt anyone actually depends on the existing behaviour, but who knows...), but it is a potentially significant change in behaviour. So it probably deserves an explicit comment to that effect in the code, and ideally also a mention in the NEWS file whenever we next make a release.

twisted-trac commented 16 years ago
glyph's avatar @glyph commented

As spiv said, we can't use waitpid(-1) to optimize this at the same time as supporting popen() Currently we don't really support using popen() but it's a very long-standing bug to fix this (#10460); users trip over it all the time.

I do have an idea for improving performance a bit over the current strategy. The really pessimal case right now is if you spawn a zillion long-running processes, then repeatedly start and stop a few short-running processes. We could optimize this if we didn't have to actually reap all the processes all the time. The implementation strategy would have a 'reaper' object that would maintain a list of processes which it would partially iterate (using task.coiterate, say) each time a SIGCHLD was received. To deal most effectively with the pessimal case, we could sort this list by process lifetime and assume that shorter-lived processes were more likely to be reaped first. Also, by processing the list more slowly, if a large group of processes exited simultaneously, we could maintain a counter of exited processes and potentially reap more than one on a single pass through the list.

Another idea for optimizing this without breaking popen() would be to spawn a single dedicated process-spawning process which the reactor would internally issue messages to spawn processes (say, over a custom AMP protocol). The main process could pass file descriptors to sub-sub-processes via sendmsg() and SCM_RIGHTS (see man 7 unix if you are unfamiliar with this technique). That process would not be running application code and would therefore be able to rely on SIGCHLD and waitpid(-1) doing what we expect. This spawnProcess implementation could be implemented in terms of the current one, though; the reactor would run the process-spawner-process with the regular waitpid(pid)-ing spawnProcess API.

twisted-trac commented 15 years ago
exarkun's avatar @exarkun commented

The proposed fix may not be the correct one. There is a valid problem here, though. Adjusting the summary and description to reflect this.

twisted-trac commented 15 years ago
exarkun's avatar @exarkun commented

The problem with the SIGINFO solution in the description is that signals might get coalesced. That means you won't always know all the PIDs for processes which have exited, so you still have to fall back to waitpid(-1) or waitid(-1).

For what it's worth, here's a program which demonstrates signal coalescing, even when SA_NODEFER is being used:

#!C
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

void child_sigaction(int signal, siginfo_t* info, void* context) {
        printf("%d %d\n", signal, info->si_pid);
}

int main(int argc, char** argv) {
        int i;
        struct sigaction child = {0};
        FILE* fd[3];

        child.sa_sigaction = child_sigaction;
        child.sa_flags = SA_RESTART | SA_NODEFER | SA_SIGINFO;

        if (sigaction(SIGCHLD, &child, NULL) < 0) {
                perror("sigaction");
        }

        printf("%d\n", getpid());
        fd[0] = popen("sleep 1", "r");
        fd[1] = popen("sleep 2", "r");
        fd[2] = popen("sleep 3", "r");

        raise(SIGSTOP);

        for (i = 0; i < 4; ++i) {
                sleep(2);
        }

        return 0;
}

If signals are not coalesced, running the programming, waiting at least 3 seconds, then foregrounding the program would result in 3 lines worth of signal/pid information being written to stdout. Instead, only one line is written.

(also editing the descriptor to add another possible solution)