cpuset_memory_spread_testset.sh may send SIGUSR1 to cpuset_mem_hog before cpuset_mem_hog has registered its SIGUSR1 handler. The default action for SIGUSR1 is to terminate the process. In that case cpuset_mem_hog is killed, and cpuset_memory_spread_testset.sh may wait forever in read exit_num < $FIFO. Here are the trace logs:
send sig:a : a is hexadecimal for 10, and SIGUSR1 is signal 10.
pid: 17374 : 17374 is hexadecimal for 95092.
pid-95124 is cpuset_memory_spread_testset.sh.
pid-95092 is cpuset_mem_hog.
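The hexadecimal values in the trace can be double-checked from a shell. This is only a small sketch; hex_to_dec is a hypothetical helper name, not something from LTP or the kernel.

```shell
# hex_to_dec is a hypothetical helper for checking the trace values.
# printf interprets a 0x-prefixed argument as a hexadecimal constant.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

hex_to_dec a        # signal number from "send sig:a"
hex_to_dec 17374    # pid from "pid: 17374"
```

The two calls print 10 and 95092, matching SIGUSR1 and the pid of cpuset_mem_hog.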
The "complete_signal groupexit" log line comes from a trace_printk() that I added to complete_signal() in kernel/signal.c:
static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
{
	...
	if (sig_fatal(p, sig) &&
	    !(signal->flags & SIGNAL_GROUP_EXIT) &&
	    !sigismember(&t->real_blocked, sig) &&
	    (sig == SIGKILL || !p->ptrace)) {
		/*
		 * This signal will be fatal to the whole group.
		 */
		if (!sig_kernel_coredump(sig)) {
			/*
			 * Start a group exit and wake everybody up.
			 * This way we don't have other threads
			 * running and doing things after a slower
			 * thread has the fatal signal pending.
			 */
			trace_printk("gh: complete_signal groupexit, pid=%d, %d\n", p->pid, current->pid);
			dump_stack();
			signal->flags = SIGNAL_GROUP_EXIT;
			signal->group_exit_code = sig;
			signal->group_stop_count = 0;
	...
}

#define sig_fatal(t, signr) \
	(!siginmask(signr, SIG_KERNEL_IGNORE_MASK|SIG_KERNEL_STOP_MASK) && \
	 (t)->sighand->action[(signr)-1].sa.sa_handler == SIG_DFL)
Because SIGUSR1 still has its default handler (SIG_DFL) at that point, the sig_fatal() condition is met, so cpuset_mem_hog is marked SIGNAL_GROUP_EXIT. The next time it returns to user space it checks for pending signals and kills itself.
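The effect of an unhandled SIGUSR1 can be seen without the test suite. The sketch below is not LTP code: unhandled_usr1_status is a hypothetical helper, and a plain sleep stands in for a cpuset_mem_hog that has not installed its handler yet.

```shell
# Hypothetical demo: signal a child that has no SIGUSR1 handler and
# print the exit status the parent shell observes.
unhandled_usr1_status() {
    sleep 30 &                  # stand-in for a process with SIG_DFL for SIGUSR1
    child=$!
    kill -USR1 "$child"         # default action: terminate the process
    status=0
    wait "$child" || status=$?  # the shell reports 128 + signal number
    echo "$status"
}

unhandled_usr1_status
```

On a Linux system where SIGUSR1 is 10, as in the trace, this should print 138 (128 + 10): the child died from the signal instead of handling it.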
I had a quick look at the code and indeed there is missing synchronization between the mem_hog and the shell script. Ideally these tests should be rewritten to use checkpoints from the LTP library instead of signals.
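To illustrate the ordering that is missing, here is a minimal sketch of a readiness handshake. It uses a plain FIFO, not the LTP checkpoint API, and every name in it (handshake_demo, the "ready" FIFO, the subshell standing in for cpuset_mem_hog) is hypothetical.

```shell
# Hypothetical handshake: the child installs its handler first, then
# reports readiness; the parent sends SIGUSR1 only after that report.
handshake_demo() {
    dir=$(mktemp -d) || return 1
    fifo="$dir/ready"
    mkfifo "$fifo" || return 1

    (
        trap 'exit 0' USR1          # 1. install the handler
        echo ready > "$fifo"        # 2. only then announce readiness
        while :; do sleep 1; done   # 3. wait for the signal
    ) &
    child=$!

    read -r line < "$fifo"          # blocks until the handler is in place
    kill -USR1 "$child"             # now the signal cannot hit SIG_DFL
    status=0
    wait "$child" || status=$?
    rm -rf "$dir"
    echo "$status"
}

handshake_demo
```

With the handshake in place the child always exits through its handler, so the function prints 0. A checkpoint-based rewrite would enforce the same ordering, with the checkpoint taking the place of the FIFO.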