Open ycliang-tw opened 2 years ago
The value is indeed arbitrary and was chosen big enough. Having buf1.tms_stime = 6 just after a fork would mean the system spent 6 seconds doing something after forking a new process, which seems excessive. What happens is that the test library forks a new process; that process sets up signal handlers, sets the process group, etc., calls the test setup() and then jumps into the verify_times() function. There are some syscalls being done between the fork and the verify_times() call, but nothing that should be worth seconds. The fact that it happens only when you increase the number of cores is suspicious as well. All in all, it looks like there may be something wrong in the kernel.
Hi Cyril,
Thank you for the quick feedback!
However, I thought the times reported are in ticks, so it is probably 60 ms? (We have CONFIG_HZ=100)
But either way, I agree 5 is a reasonably large value for the time spent between fork and verify_times() (considering times only measures the interval between fork and verify_times, and the only thing that should contribute to buf1.tms_stime is the times syscall itself). I will try to dig deeper and see what turns up, thanks again!
Ah, right, we have to divide the values with sysconf(_SC_CLK_TCK) to get seconds.
I guess one thing that could be tried is to fork() a fresh process in the test just to run the verify_times() function; that way we make sure that anything that happens in the test library is not accounted for in the test.
We tried the following test program and its stime numbers are almost the same as buf1.tms_stime.
#include <sys/times.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    struct tms a1;
    int pid = fork();

    if (pid == 0) {
        if (times(&a1) == -1)
            puts("Error!");
        printf("times:\n"
               "utime: %li, stime: %li\n"
               "cutime: %li, cstime: %li\n",
               a1.tms_utime, a1.tms_stime,
               a1.tms_cutime, a1.tms_cstime);
    }
}
Sorry, we did not give enough context. We are testing on an FPGA with a default 60 MHz CPU frequency, and we are wondering whether this relatively slow platform could be the reason we get a high stime. Besides, we found that using a different frequency in the DTS gives different results.
cpus {
    #address-cells = <1>;
    #size-cells = <0>;
    timebase-frequency = <60000000>; // modified to 100 000 000

    CPU0: cpu@0 {
        device_type = "cpu";
        reg = <0>;
        status = "okay";
        compatible = "riscv";
        mmu-type = "riscv,sv39";
        clock-frequency = <60000000>; // modified to 100 000 000

        CPU0_intc: interrupt-controller {
            #interrupt-cells = <1>;
            interrupt-controller;
            compatible = "riscv,cpu-intc";
        };
    };
};
We modified times03.c to print the stime whether it PASSes or FAILs. (We use CPU hotplug to control the number of CPUs, and execute times03 10 times to see which values come up.)

| # of cores | buf1.tms_stime (DTS 60 MHz) | buf1.tms_stime (DTS 100 MHz) |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 1, 2 | 1 |
| 3 | 2, 3 | 1 |
| 4 | 3, 4 | 2 |
| 5 | 4 | 2, 3 |
| 6 | 4, 5 | 3 |
| 7 | 5, 6 | 3, 4 |
| 8 | 6, 7 | 3, 4 |
We are not familiar with how the CPU frequency interacts with the kernel's timekeeping subsystem, so we will definitely check that out.
Hi,
We are running LTP on a RISC-V platform and found that we bump into this failure on an 8-core platform. (It didn't happen on platforms with fewer than 4 cores.) Checking the git log of commit ed01f6a05c, the check for > 5 seems to have been determined from data gathered in test runs, so we are wondering if it would be reasonable to change > 5 to > 10?