linux-test-project / ltp

Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)
https://linux-test-project.readthedocs.io/
GNU General Public License v2.0

cpuset_memory_spread_testset.sh: stuck in general_memory_spread_test->read exit_num #1037

Open gouhao opened 1 year ago

gouhao commented 1 year ago

cpuset_memory_spread_testset.sh may send SIGUSR1 to cpuset_mem_hog before cpuset_mem_hog has registered its SIGUSR1 handler. The default action for SIGUSR1 is to terminate the process. In that case cpuset_mem_hog dies without writing anything to the FIFO, and cpuset_memory_spread_testset.sh may wait forever in read exit_num < $FIFO.

Here are the trace logs:

           <...>-95124 [037] ....  1941.367711: sys_enter: NR 129 (17374, a, 2, ffffc725f5b3, 0, 1999999999999999)
           <...>-95124 [037] ....  1941.367711: sys_kill(pid: 17374, sig: a)
           <...>-95124 [037] d...  1941.367714: kmem_cache_alloc: call_site=ffff00000810df44 ptr=000000000f4deb68 bytes_req=160 bytes_alloc=160 gfp_flags=GFP_ATOMIC|__GFP_NOWARN
           <...>-95124 [037] d...  1941.367715: complete_signal: gh: complete_signal groupexit, pid=95092, 95124

In the sys_kill line, sig: a is hexadecimal for 10, which is SIGUSR1, and pid: 17374 is hexadecimal for 95092. PID 95124 is cpuset_memory_spread_testset.sh; PID 95092 is cpuset_mem_hog.

I added the "complete_signal groupexit" log line myself, in complete_signal() in kernel/signal.c:

static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
{
...
        if (sig_fatal(p, sig) &&
            !(signal->flags & SIGNAL_GROUP_EXIT) &&
            !sigismember(&t->real_blocked, sig) &&
            (sig == SIGKILL || !p->ptrace)) {
                /*
                 * This signal will be fatal to the whole group.
                 */
                if (!sig_kernel_coredump(sig)) {
                        /*
                         * Start a group exit and wake everybody up.
                         * This way we don't have other threads
                         * running and doing things after a slower
                         * thread has the fatal signal pending.
                         */
                        trace_printk("gh: complete_signal groupexit, pid=%d, %d\n", p->pid, current->pid);
                        dump_stack();
                        signal->flags = SIGNAL_GROUP_EXIT;
                        signal->group_exit_code = sig;
                        signal->group_stop_count = 0;
...
}

#define sig_fatal(t, signr) \
    (!siginmask(signr, SIG_KERNEL_IGNORE_MASK|SIG_KERNEL_STOP_MASK) && \
     (t)->sighand->action[(signr)-1].sa.sa_handler == SIG_DFL)

Because cpuset_mem_hog has not installed a handler yet, the signal meets the conditions of sig_fatal(), so cpuset_mem_hog is marked SIGNAL_GROUP_EXIT. The next time it returns to user space it checks for pending signals and then kills itself.

metan-ucw commented 1 year ago

I had a quick look at the code, and synchronization is indeed missing between the mem_hog and the shell script. Ideally these tests should be rewritten to use checkpoints from the LTP library instead of signals.