iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

se.cfs_rq may return group cfs_rq, not the top cfs_rq #3093

Open chengmingzhou opened 4 years ago

chengmingzhou commented 4 years ago

https://github.com/iovisor/bcc/blob/e42ac4176998a6dcf0dbf3b6befeaad0a69cb98a/tools/runqlen.py#L149

When task groups are used, the se.cfs_rq returned is the group cfs_rq, not the CPU's top cfs_rq, so runqlen reports only the number of tasks from that cgroup on the CPU. Is there a way to get the runqlen of the top cfs_rq?

Thanks.

yonghong-song commented 4 years ago

Not 100% sure. Does task->se.my_q or task->se.parent->cfs_rq work in this task_group case? This could become complicated for nested cgroups. I will try to ask for some expert advice.

rikvanriel commented 4 years ago

You would have to look at the root cfs_rq for the CPU the task is on.

However, that will get you a list of runnable cgroup entities, not a list of runnable tasks on the CPU, so it may still not be what you want.

chengmingzhou commented 4 years ago

@yonghong-song Thanks for your advice. I think looping over task->se.parent until it is NULL should reach the top entity, and then that se.my_q should give me the top cfs_rq. The h_nr_running of this top cfs_rq should be the number of CFS tasks on that CPU?
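
The parent walk described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the structs are simplified stand-ins that keep just the fields mentioned in this thread, and top_cfs_rq is a hypothetical helper name, not a kernel or bcc function. It also assumes the kernel's usual meaning of the two queue pointers, where cfs_rq is the queue an entity is queued on and my_q is the queue a group entity owns, so the sketch reads cfs_rq (not my_q) of the topmost entity.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields discussed in this thread are modelled). */
struct cfs_rq {
    unsigned int h_nr_running; /* tasks in this queue and all child groups */
};

struct sched_entity {
    struct sched_entity *parent; /* NULL for a top-level entity */
    struct cfs_rq *cfs_rq;       /* queue this entity is queued on */
    struct cfs_rq *my_q;         /* queue owned by a group entity, NULL for a task */
};

/* Hypothetical helper: follow the parent links up to the topmost entity
 * and return the queue that entity is queued on. */
static struct cfs_rq *top_cfs_rq(struct sched_entity *se)
{
    assert(se != NULL);
    while (se->parent != NULL)
        se = se->parent;
    return se->cfs_rq;
}
```

Inside a BPF program the same walk would need a fixed upper bound on the number of iterations, since the nesting depth of cgroups is not known in advance.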

chengmingzhou commented 4 years ago

@rikvanriel Yes, it's still hard to get a list of runnable tasks on the CPU. For now it may be enough for me to get the top cfs_rq's h_nr_running. Thanks.

rikvanriel commented 4 years ago

The load balancing code has the same problem you have.

It got fixed there by adding a list of runnable tasks in the CPU runqueue.

You can traverse the cpu_rq->cfs_tasks list to get all the CFS tasks on a CPU.

Of course, this does not get you any info on real time tasks, etc...
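
A rough userspace sketch of such a list traversal is shown here. It assumes a simplified model: a minimal circular list_head, an rq that holds only the cfs_tasks list head, and tasks linked through a group_node member of their sched_entity. The helper names count_cfs_tasks and task_of_node are hypothetical, and a real kernel-side traversal would also have to deal with runqueue locking, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins (assumption: only the linkage fields are modelled). */
struct sched_entity { struct list_head group_node; };
struct task_struct { int pid; struct sched_entity se; };
struct rq { struct list_head cfs_tasks; };

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Map a list node back to the task that embeds it. */
static struct task_struct *task_of_node(struct list_head *node)
{
    return container_of(node, struct task_struct, se.group_node);
}

/* Hypothetical helper: count the CFS tasks linked on one CPU's runqueue. */
static unsigned int count_cfs_tasks(const struct rq *rq)
{
    unsigned int n = 0;
    const struct list_head *pos;

    assert(rq != NULL);
    for (pos = rq->cfs_tasks.next; pos != &rq->cfs_tasks; pos = pos->next)
        n++;
    return n;
}
```

The count covers only tasks on the CFS list, so real-time tasks are still invisible to it, as noted above.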

xiejingf commented 3 years ago

@chengmingzhou Can we use the rq of the current CPU? rq->nr_running should be what we want.

The rq can be obtained from the cfs_rq, with C code like below:

 #ifdef CONFIG_FAIR_GROUP_SCHED

 /* cpu runqueue to which this cfs_rq is attached */
 static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
 {
     return cfs_rq->rq;
 }

 #else
 static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
 {
     return container_of(cfs_rq, struct rq, cfs);
 }
 #endif
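
To illustrate how the group-scheduling variant above would be used, here is a small userspace sketch. It assumes simplified stand-in structs in which every cfs_rq carries a back-pointer to its CPU runqueue, and cpu_nr_running is a hypothetical helper name. The point is only that any cfs_rq, group-level or top-level, leads to the same rq, whose nr_running is not limited to one cgroup.

```c
#include <assert.h>
#include <stddef.h>

struct rq;

/* Simplified stand-ins (assumption: group-scheduling layout, where each
 * cfs_rq stores a pointer to the CPU runqueue it is attached to). */
struct cfs_rq {
    unsigned int h_nr_running; /* CFS tasks in this queue and below */
    struct rq *rq;             /* CPU runqueue this cfs_rq belongs to */
};

struct rq {
    unsigned int nr_running;   /* runnable tasks on this CPU */
    struct cfs_rq cfs;         /* top-level CFS queue */
};

/* Same shape as the group-scheduling branch of rq_of() quoted above. */
static struct rq *rq_of(struct cfs_rq *cfs_rq)
{
    return cfs_rq->rq;
}

/* Hypothetical helper: CPU-wide runnable count reached from any cfs_rq. */
static unsigned int cpu_nr_running(struct cfs_rq *cfs_rq)
{
    assert(cfs_rq != NULL);
    return rq_of(cfs_rq)->nr_running;
}
```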

xiejingf commented 3 years ago

BTW, the current bcc/tools/runqlen.py does not count the RT tasks

yonghong-song commented 3 years ago

> BTW, the current bcc/tools/runqlen.py does not count the RT tasks

Do you think it is important to support RT tasks? Typical systems won't have them right?

xiejingf commented 3 years ago

> > BTW, the current bcc/tools/runqlen.py does not count the RT tasks
>
> Do you think it is important to support RT tasks? Typical systems won't have them right?

Indeed, most of the time tasks belong to CFS, but if we want the accurate run queue length, we should count all existing tasks from the different sched classes, right?

xiejingf commented 3 years ago

/proc/sched_debug lists all the runnable tasks for each CPU, but that does not seem related to this bcc method.

yonghong-song commented 3 years ago

Agree. It would be good to count all existing tasks from different sched classes. Any suggestions on how to improve the tool for that? bcc is using a simpler mechanism, as you can see. /proc/sched_debug is more complicated: overall, per-CPU, per-CPU cgroup, etc. bcc uses a sampling mechanism that tries to capture only the important tasks per CPU, and it does not dig into cgroups at all. This ensures we have low overhead and focus on important tasks. I agree with you that we may miss some important tasks in, say, SCHED_FIFO, which we might want to find a cheap way to get.

ericjoy1 commented 3 years ago

@xiejingf I think the code you pasted above is the correct way to get nr_running of the CPU runqueue. But the problem is that the definitions of cfs_rq and rq keep changing, so some checks are needed to get this in a compatible way.