Open ycliang-tw opened 2 years ago
The value is indeed arbitrary and was chosen big enough. Having buf1.tms_stime = 6 just after a fork would mean the system spent 6 seconds doing something after forking a new process, which seems excessive. What happens is that the test library forks a new process; that process sets up signal handlers, sets the process group, etc., calls the test setup() and then jumps into the verify_times() function. There are some syscalls being done between the fork and the verify_times() call, but nothing that should be worth seconds. The fact that it happens only when you increase the number of cores is suspicious as well. All in all, it looks like there may be something wrong in the kernel.
Hi Cyril,
Thank you for the quick feedback!
However, I thought the times reported are in ticks, so it is probably 60 ms? (We have CONFIG_HZ=100)
But either way, I agree 5 is a reasonably large value for the time spent between fork and verify_times() (considering times only measures the interval between fork and verify_times, and the only thing that should contribute to buf1.tms_stime is the times syscall itself). I will try to dig deeper and see what turns up, thanks again!
Ah, right, we have to divide the values with sysconf(_SC_CLK_TCK) to get seconds.
I guess one thing that could be tried is to fork() a fresh process in the test just to run the verify_times() function; that way we make sure that anything that happens in the test library is not accounted for in the test.
We tried the following test program and its stime numbers are almost the same as buf1.tms_stime.
#include <sys/times.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    struct tms a1;
    int pid = fork();

    if (pid == 0) {
        if (times(&a1) == -1)
            puts("Error!");
        printf("times:\n"
               "utime: %li, stime: %li\n"
               "cutime: %li, cstime: %li\n",
               a1.tms_utime, a1.tms_stime,
               a1.tms_cutime, a1.tms_cstime);
    }
}
Sorry, we did not give enough context. We are testing on an FPGA with a default 60 MHz CPU frequency, and we are wondering whether this relatively slow platform could be the reason we get a high stime. Besides, we found that using a different frequency in the DTS gives different results.
cpus {
    #address-cells = <1>;
    #size-cells = <0>;
    timebase-frequency = <60000000>; // modified to 100 000 000

    CPU0: cpu@0 {
        device_type = "cpu";
        reg = <0>;
        status = "okay";
        compatible = "riscv";
        mmu-type = "riscv,sv39";
        clock-frequency = <60000000>; // modified to 100 000 000

        CPU0_intc: interrupt-controller {
            #interrupt-cells = <1>;
            interrupt-controller;
            compatible = "riscv,cpu-intc";
        };
    };
};
We modified times03.c to print the stime whether it PASSes or FAILs. (We use CPU hotplug to control the number of CPUs, and execute times03 10 times to see which values come up.)

| # of cores | buf1.tms_stime (DTS 60 MHz) | buf1.tms_stime (DTS 100 MHz) |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 1, 2 | 1 |
| 3 | 2, 3 | 1 |
| 4 | 3, 4 | 2 |
| 5 | 4 | 2, 3 |
| 6 | 4, 5 | 3 |
| 7 | 5, 6 | 3, 4 |
| 8 | 6, 7 | 3, 4 |
We are not familiar with how the CPU frequency interacts with the kernel's timekeeping subsystem, so we will definitely check that out.
Hi,
We are running LTP on a RISC-V platform and found that we bump into this failure on an 8-core platform. (It didn't happen on platforms with fewer than 4 cores.) Checking the git log of commit ed01f6a05c, the check for > 5 seems to have been determined from data gathered in test runs, so we are wondering if it would be reasonable to change > 5 to > 10?