Closed exodist closed 1 month ago
You can try the following to set the process group for the process.
use MCE::Signal qw(-setpgrp);
use MCE::Child;
I am assuming that if MCE sets the signal handlers that it must have a good reason. Why does MCE set them, what can I expect to break by unsetting them.
On UNIX platforms, the parent process terminating a child via $child->exit
will not work if undef $SIG{QUIT}
. Likewise, the parent process killing a child process via $child->kill
will not work if undef $SIG{INT}
. Nothing in MCE sets $SIG{IOT}
.
Is it possible to add a flag or import or similar option to prevent the signal handlers from being set?
Sure. But I wonder if your application will work without undef $SIG{QUIT}
and $SIG{INT}
. Can you try local $SIG{PIPE} = undef;
only? It would be helping knowing if this is the root cause.
In one case it is the SIG{TERM} handler, in another it is the SIG{HUP} handler. I wonder what is setting SIG{IOT} cause in my testing it is undef normally and set to a coderef when MCE::Child is loaded. SIG{PIPE} is not the cause of the issue in the 2 cases I am currently working with.
Hmm, one thing both cases have in common is they follow a pattern like this:
my @children;
LOOP: for my $i (@todo) {
my $pid = fork // die "Could not fork: $!"
if ($pid) { push @children => $pid; next };
eval { start_child(); 1 } or warn $@;
POSIX::_exit(0);
}
my $start = time;
while (1) {
sleep($arbitrary_time);
kill('TERM', @children) if time - $start > $kill_after_duration; # In a second location this uses HUP, not TERM
kill('KILL', @children) if time - $start > ($kill_after_duration * 2);
waitpid($_, WNOHANG) for @children;
}
Having the MCE handlers in place seems to break out of the fork LOOP inside the child processes.
Please note, this is not an exact implementation of either case, but just a general idea of common things between them. I am not able to share the actual code from work.
Can you try the following? Again, nothing in MCE sets the IOT
handler.
use MCE::Signal;
$SIG{ABRT} = undef;
$SIG{HUP} = undef;
$SIG{CHLD} = undef;
$SIG{CLD} = undef;
$SIG{IOT} = undef;
$SIG{PIPE} = undef;
$SIG{QUIT} = undef;
$SIG{TERM} = undef;
Use MCE::Child afterwards without further undefs. If this works, then I can add a -nosig
option to MCE::Signal
.
use MCE::Child;
I do know that if I localize the right keys in %SIG in the scope where I load MCE::Child that it will preserve the signal handlers instead of letting MCE takeover. I just wanted to be sure that doing so would not break MCE functionality. Based on your first reply it sounds like it will break $child->exit and $child->kill? Who do these break without the signal handlers?
The following works for me using your pattern.
use v5.030;
use POSIX ();
use MCE::Child;
use Time::HiRes qw(sleep time);
MCE::Child->init(
void_context => 1,
posix_exit => 1,
);
sub start_child {
my ($item) = @_;
say "$$: $item";
sleep 20;
}
my @todo = (1..9);
for my $item (@todo) {
MCE::Child->create('start_child', $item);
}
my $arbitrary_time = 2.0;
my $kill_after_duration = 4.0;
my $start = time;
while (MCE::Child->pending) {
sleep $arbitrary_time;
for my $child (MCE::Child->list) {
$child->join, next if $child->is_joinable;
$child->kill('TERM') if time - $start > $kill_after_duration;
$child->kill('KILL') if time - $start > $kill_after_duration * 2;
}
}
yeah, the most infuriating thing was that there was a clear race condition, we could adjust sleep times and it would pass with some, fail with others. This race condition either was not present, or the race was always won in our favor regardless of sleeps when the signal handlers are not in place.
I was not able to find an obvious cause of the race, so I am unable ot give you a proper reproduction case currently.
This is resolved in MCE 1.899 { MCE::Child and MCE::Channel::Mutex/MutexFast }. I used the following snippet on a Linux platform for testing. Try omitting the KILL
signal if no longer necessary for your application. Regardless, I tested with and without the KILL
signal.
use v5.030;
use POSIX ();
use MCE::Child 1.899;
use Time::HiRes qw(sleep time);
MCE::Child->init(
void_context => 1,
posix_exit => 1,
);
sub start_child {
my ($item) = @_;
say "$$: $item";
sleep(rand() * 2);
}
my @todo = (1..2000);
for my $item (@todo) {
MCE::Child->create('start_child', $item);
}
my $arbitrary_time = 0.3;
my $kill_after_duration = 1;
my $start = time;
my $i = 0;
while (MCE::Child->pending) {
sleep $arbitrary_time;
for my $child (MCE::Child->list) {
$child->join, next if $child->is_joinable;
say(++$i, 'TERM'), $child->kill('TERM') if time - $start > $kill_after_duration;
# say(++$i, 'KILL'), $child->kill('KILL') if time - $start > $kill_after_duration * 2;
}
}
The signal handling anomaly has been resolved with more improvements in MCE::Child 1.900. I understand the reason for sending the KILL
signal previously, but not necessary going forward. A child receiving a KILL
signal is not something I can safeguard during IPC transaction.
The child_timeout
option is preferred if that works for your application. Internally, it calls Perl's native alarm function. This version omits the kill statements.
use v5.030;
use POSIX ();
use MCE::Child;
use Time::HiRes qw(sleep time);
my $arbitrary_time = 0.3;
my $abort_after_duration = 1;
MCE::Child->init(
child_timeout => $abort_after_duration,
void_context => 1,
posix_exit => 1,
);
sub start_child {
my ($item) = @_;
say "$$: $item";
sleep(rand() * 2);
}
my @todo = (1..2000);
for my $item (@todo) {
MCE::Child->create('start_child', $item);
}
my $start = time;
while (MCE::Child->pending) {
sleep $arbitrary_time;
$_->join for MCE::Child->list_joinable;
}
I am using MCE::Child for a task at work. In the process several other bits of functionality that involve fork/kill have broken. I have traced these breakages to the custom signal handlers MCE puts in place.
I can fix these issues by adding these post-fork in my custom stuff:
But I am worried this may be a bad idea. I am assuming that if MCE sets the signal handlers that it must have a good reason. Why does MCE set them, what can I expect to break by unsetting them. Is it possible to add a flag or import or similar option to prevent the signal handlers from being set?