marioroy / mce-perl

Many-Core Engine for Perl
Other
45 stars 5 forks source link

Option to disable custom $SIG{...} handlers #24

Closed exodist closed 1 month ago

exodist commented 2 months ago

I am using MCE::Child for a task at work. In the process several other bits of functionality that involve fork/kill have broken. I have traced these breakages to the custom signal handlers MCE puts in place.

I can fix these issues by adding these post-fork in my custom stuff:

        local $SIG{ABRT} = undef;                                                                                            
        local $SIG{HUP}  = undef;                                                                                            
        local $SIG{CHLD} = undef;                                                                                            
        local $SIG{CLD}  = undef;                                                                                            
        local $SIG{IOT}  = undef;                                                                                            
        local $SIG{PIPE} = undef;                                                                                            
        local $SIG{QUIT} = undef;                                                                                            
        local $SIG{TERM} = undef;                 

But I am worried this may be a bad idea. I am assuming that if MCE sets the signal handlers that it must have a good reason. Why does MCE set them, what can I expect to break by unsetting them. Is it possible to add a flag or import or similar option to prevent the signal handlers from being set?

marioroy commented 2 months ago

You can try the following to set the process group for the process.

use MCE::Signal qw(-setpgrp);
use MCE::Child;

I am assuming that if MCE sets the signal handlers that it must have a good reason. Why does MCE set them, what can I expect to break by unsetting them.

On UNIX platforms, the parent process terminating a child via $child->exit will not work if undef $SIG{QUIT}. Likewise, the parent process killing a child process via $child->kill will not work if undef $SIG{INT}. Nothing in MCE sets $SIG{IOT}.

Is it possible to add a flag or import or similar option to prevent the signal handlers from being set?

Sure. But I wonder if your application will work without undef $SIG{QUIT} and $SIG{INT}. Can you try local $SIG{PIPE} = undef; only? It would be helping knowing if this is the root cause.

exodist commented 2 months ago

In one case it is the SIG{TERM} handler, in another it is the SIG{HUP} handler. I wonder what is setting SIG{IOT} cause in my testing it is undef normally and set to a coderef when MCE::Child is loaded. SIG{PIPE} is not the cause of the issue in the 2 cases I am currently working with.

exodist commented 2 months ago

Hmm, one thing both cases have in common is they follow a pattern like this:


my @children;
LOOP: for my $i (@todo) {
    my $pid = fork // die "Could not fork: $!"
    if ($pid) { push @children => $pid; next };
    eval { start_child(); 1 } or warn $@;
    POSIX::_exit(0);
}
my $start = time;
while (1) {
    sleep($arbitrary_time);
    kill('TERM', @children) if time - $start > $kill_after_duration; # In a second location this uses HUP, not TERM
    kill('KILL', @children) if time - $start > ($kill_after_duration * 2);
    waitpid($_, WNOHANG) for @children;
}

Having the MCE handlers in place seems to break out of the fork LOOP inside the child processes.

Please note, this is not an exact implementation of either case, but just a general idea of common things between them. I am not able to share the actual code from work.
marioroy commented 2 months ago

Can you try the following? Again, nothing in MCE sets the IOT handler.

use MCE::Signal;

$SIG{ABRT} = undef;                                                                                            
$SIG{HUP}  = undef;                                                                                            
$SIG{CHLD} = undef;                                                                                            
$SIG{CLD}  = undef;                                                                                            
$SIG{IOT}  = undef;                                                                                            
$SIG{PIPE} = undef;                                                                                            
$SIG{QUIT} = undef;                                                                                            
$SIG{TERM} = undef;

Use MCE::Child afterwards without further undefs. If this works, then I can add a -nosig option to MCE::Signal.

use MCE::Child;             
exodist commented 2 months ago

I do know that if I localize the right keys in %SIG in the scope where I load MCE::Child that it will preserve the signal handlers instead of letting MCE takeover. I just wanted to be sure that doing so would not break MCE functionality. Based on your first reply it sounds like it will break $child->exit and $child->kill? Who do these break without the signal handlers?

marioroy commented 2 months ago

The following works for me using your pattern.

use v5.030;
use POSIX ();
use MCE::Child;
use Time::HiRes qw(sleep time);

MCE::Child->init(
    void_context => 1,
    posix_exit   => 1,
);

sub start_child {
    my ($item) = @_;
    say "$$: $item";
    sleep 20;
}

my @todo = (1..9);

for my $item (@todo) {
    MCE::Child->create('start_child', $item);
}

my $arbitrary_time = 2.0;
my $kill_after_duration = 4.0;
my $start = time;

while (MCE::Child->pending) {
    sleep $arbitrary_time;
    for my $child (MCE::Child->list) {
        $child->join, next if $child->is_joinable;
        $child->kill('TERM') if time - $start > $kill_after_duration;
        $child->kill('KILL') if time - $start > $kill_after_duration * 2;
    }
}
exodist commented 2 months ago

yeah, the most infuriating thing was that there was a clear race condition, we could adjust sleep times and it would pass with some, fail with others. This race condition either was not present, or the race was always won in our favor regardless of sleeps when the signal handlers are not in place.

exodist commented 2 months ago

I was not able to find an obvious cause of the race, so I am unable ot give you a proper reproduction case currently.

marioroy commented 1 month ago

This is resolved in MCE 1.899 { MCE::Child and MCE::Channel::Mutex/MutexFast }. I used the following snippet on a Linux platform for testing. Try omitting the KILL signal if no longer necessary for your application. Regardless, I tested with and without the KILL signal.

use v5.030;

use POSIX ();
use MCE::Child 1.899;
use Time::HiRes qw(sleep time);

MCE::Child->init(
    void_context => 1,
    posix_exit   => 1,
);

sub start_child {
    my ($item) = @_;
    say "$$: $item";
    sleep(rand() * 2);
}

my @todo = (1..2000);

for my $item (@todo) {
    MCE::Child->create('start_child', $item);
}

my $arbitrary_time = 0.3;
my $kill_after_duration = 1;
my $start = time;
my $i = 0;

while (MCE::Child->pending) {
    sleep $arbitrary_time;
    for my $child (MCE::Child->list) {
        $child->join, next if $child->is_joinable;
        say(++$i, 'TERM'), $child->kill('TERM') if time - $start > $kill_after_duration;
      # say(++$i, 'KILL'), $child->kill('KILL') if time - $start > $kill_after_duration * 2;
    }
}
marioroy commented 1 month ago

The signal handling anomaly has been resolved with more improvements in MCE::Child 1.900. I understand the reason for sending the KILL signal previously, but not necessary going forward. A child receiving a KILL signal is not something I can safeguard during IPC transaction.

marioroy commented 1 month ago

The child_timeout option is preferred if that works for your application. Internally, it calls Perl's native alarm function. This version omits the kill statements.

use v5.030;

use POSIX ();
use MCE::Child;
use Time::HiRes qw(sleep time);

my $arbitrary_time = 0.3;
my $abort_after_duration = 1;

MCE::Child->init(
    child_timeout => $abort_after_duration,
    void_context  => 1,
    posix_exit    => 1,
);

sub start_child {
    my ($item) = @_;
    say "$$: $item";
    sleep(rand() * 2);
}

my @todo = (1..2000);

for my $item (@todo) {
    MCE::Child->create('start_child', $item);
}

my $start = time;

while (MCE::Child->pending) {
    sleep $arbitrary_time;
    $_->join for MCE::Child->list_joinable;
}