Open joelagnel opened 6 years ago
Don't know the reason, but you can check the source code to see whether cfs_rq_partial
matches your kernel source or not. If it does not, then wrong results will be printed.
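To illustrate why a mismatched partial struct prints wrong results, here is a minimal sketch using Python's ctypes. The two layouts are hypothetical stand-ins for a cfs_rq prefix with and without a runnable_weight field. The field names and types are assumptions for illustration, not copied from any kernel version or from runqlen.py.

```python
import ctypes


class LoadWeight(ctypes.Structure):
    # Assumed stand-in for the kernel's struct load_weight.
    _fields_ = [("weight", ctypes.c_ulong), ("inv_weight", ctypes.c_uint32)]


class CfsRqWithoutRunnableWeight(ctypes.Structure):
    # Hypothetical layout of a kernel that has no runnable_weight field.
    _fields_ = [
        ("load", LoadWeight),
        ("nr_running", ctypes.c_uint),
        ("h_nr_running", ctypes.c_uint),
    ]


class CfsRqWithRunnableWeight(ctypes.Structure):
    # Hypothetical layout that the tool would assume if detection went wrong.
    _fields_ = [
        ("load", LoadWeight),
        ("runnable_weight", ctypes.c_ulong),
        ("nr_running", ctypes.c_uint),
        ("h_nr_running", ctypes.c_uint),
    ]


off_without = CfsRqWithoutRunnableWeight.nr_running.offset
off_with = CfsRqWithRunnableWeight.nr_running.offset

# Memory as the "kernel" really lays it out: three runnable tasks.
real = CfsRqWithoutRunnableWeight(LoadWeight(1024, 0), 3, 0)
correct = real.nr_running

# Reinterpret the same bytes through the wrong layout. The zero padding
# stands in for whatever follows the struct prefix in real memory.
raw = bytes(real).ljust(ctypes.sizeof(CfsRqWithRunnableWeight), b"\0")
misread = CfsRqWithRunnableWeight.from_buffer_copy(raw).nr_running

print("offset without runnable_weight:", off_without)
print("offset with runnable_weight:   ", off_with)
print("correct nr_running:", correct, "misread nr_running:", misread)
```

In this toy case the extra field shifts nr_running by the size of an unsigned long, so the wrong layout reads a different location and reports 0 instead of 3. On a real kernel the misread value would be whatever field happens to live at that offset.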
Thanks for the reply. So I'm running kernel 4.14-rc5. Neither cfs_rq nor sched_entity has runnable_weight. However, check_runnable_weight_field() still returns True. Forcing it to return False doesn't change anything either, and I still see that runqlen is 0 as before. It seems the detection mechanism is not working correctly, or it's some other issue.
I tried on 4.14-rc5. I checked that check_runnable_weight_field() does return False, so the implementation here is correct.
To make runqlen() useful, you may need to run it on a busy system:
(1). runqlen() works by sampling, 99 times per second.
(2). runqlen() checks the run queue length of the CURRENT task. If the system is not overloaded, the kernel will be able to schedule the "runqlen.py" process on a cpu without competition and you will see a length of 0.
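The two points above can be sketched as a small simulation. This is not the tool's code: sample_runqlen and nr_running_at are hypothetical names, and the sketch assumes the reported length counts queued tasks excluding the one currently running.

```python
from collections import Counter


def sample_runqlen(nr_running_at, duration_s=1.0, frequency=99):
    """Build a histogram of queue lengths seen at `frequency` samples per second.

    nr_running_at(t) is a hypothetical callback returning the number of
    runnable tasks on the sampled CPU at time t (in seconds).
    """
    hist = Counter()
    for i in range(int(duration_s * frequency)):
        nr_running = nr_running_at(i / frequency)
        # Assumption: the task being sampled is itself runnable, so it is
        # subtracted to leave only the tasks waiting behind it.
        hist[max(nr_running - 1, 0)] += 1
    return hist


# Lightly loaded CPU: only the sampled task is runnable.
idle = sample_runqlen(lambda t: 1)
# Overloaded CPU: two other tasks wait behind the sampled one.
busy = sample_runqlen(lambda t: 3)

print("idle:", dict(idle))
print("busy:", dict(busy))
```

Under these assumptions, a CPU that never has competing tasks at the sampling instants only ever contributes to the 0 bucket, which is why a light workload is expected to show a length of 0.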
Thanks for trying it out. I am not fully sure why I was seeing this. For now I am developing on a 4.15 kernel and it's showing expected results. If I see it on our older product kernels, I will report/fix it. Also, the system was overloaded when I ran into the issue: I was running make in bcc with multiple threads on a 4-core system, and it showed only 0 as the run queue length. I will close this for now and reopen if needed.
Tried again on 4.14-rc5. The same workload, although light, produced non-zero runqlen on 4.15-rc7, but zero runqlen on 4.14. So it does look suspicious.
Can we not use rq->nr_running? It has the following benefits:
It should be simpler to use (unlike cfs_rq), since in every kernel version I checked nr_running comes right after rq->lock:
struct rq {
	/* runqueue lock: */
	raw_spinlock_t lock;
	/*
	 * nr_running and cpu_load should be in the same cacheline because
	 * remote CPUs use both these fields when doing load calculation.
	 */
	unsigned int nr_running;
Now I know what the issue was with my previous suspicious 4.14 experiments. It is due to the randomized task_struct structure in 4.14, which is not reflected in the bpf program. The following hack can solve the issue:
diff --git a/tools/runqlen.py b/tools/runqlen.py
index e8430ca..5559297 100755
--- a/tools/runqlen.py
+++ b/tools/runqlen.py
@@ -79,6 +79,8 @@ frequency = 99
def check_runnable_weight_field():
# Define the bpf program for checking purpose
bpf_check_text = """
+#define randomized_struct_fields_start struct {
+#define randomized_struct_fields_end };
#include <linux/sched.h>
unsigned long dummy(struct sched_entity *entity)
{
@@ -108,6 +110,8 @@ unsigned long dummy(struct sched_entity *entity)
dup(old_stderr)
close(old_stderr)
+ print(success_compile)
+
# remove the temporary file and return
unlink(tmp_file.name)
return success_compile
@@ -116,6 +120,8 @@ unsigned long dummy(struct sched_entity *entity)
# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
+#define randomized_struct_fields_start struct {
+#define randomized_struct_fields_end };
#include <linux/sched.h>
// Declare enough of cfs_rq to find nr_running, since we can't #import the
Now the result becomes consistent with 4.15.
Regarding whether we should change the examination point for nr_running: I assume you are referring to kernel/sched/sched.h:
/* CFS-related fields in a runqueue */
struct cfs_rq {
	......
#ifdef CONFIG_FAIR_GROUP_SCHED
	struct rq *rq; /* cpu runqueue to which this cfs_rq is attached */
	......
}
Let me discuss with some scheduler experts to see which is the better sampling place. Thanks for bringing up the suggestions!
rq->nr_running is more what you want, since cgroups make cfs_rq->nr_running a lot more interesting.
Something seems a bit off to me with the runqlen tool. I tested this on a 44-core x86 machine, and the runqlen is always 0.
I tried it on a 4-core machine as well, with make -j8 running on bcc, and I get a similar result. This seems odd to me: higher runqlen values don't appear. Any thoughts on why this is so?