Open htejun opened 3 weeks ago
hi @htejun, I'm interested in working on this. I'm trying to get familiar with the repo by helping with good "first" issues.
I'm assuming there is no existing API that provides the scheduling delay metric for a task, so it needs to be implemented on a per-scheduler basis? It'll be the accumulated delay between `enqueued` and `running`, if I understand correctly.
Is there an example implementation in other schedulers that I can refer to? If not, are we interested in adding such a metric for other schedulers as well (or will we just rely on the kernel's overall scheduling delay metric)?
The information is useful for all schedulers but overall system metrics are easily observable with e.g. `bpftrace` or `bcc` tools. How the metrics should be aggregated would depend on the specific scheduler - e.g. `scx_layered` needs to collect the metrics per layer. `scx_bpfland` would probably want to aggregate depending on whether the task is classified interactive or not and so on. One alternative approach could be coming up with a shared way of "tagging" tasks so that a generic BPF tool can aggregate the numbers according to the tags.
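To make the "tagging" idea concrete, here is a minimal userspace sketch in plain C of how delay samples could be bucketed by tag. It is not taken from any scheduler: the `delay_tag` values, `struct delay_stats`, `record_delay()` and `avg_delay_ns()` are all hypothetical names chosen for illustration, and a real BPF implementation would presumably keep this state in a map rather than in a static array.

```c
#include <stdint.h>

/*
 * Hypothetical tags. A scheduler would define its own set, e.g. one tag
 * per layer, or interactive vs. non-interactive.
 */
enum delay_tag {
	TAG_INTERACTIVE,
	TAG_BATCH,
	NR_DELAY_TAGS,
};

/* Aggregated scheduling delay for all tasks sharing one tag. */
struct delay_stats {
	uint64_t total_ns;	/* sum of all recorded delays */
	uint64_t nr_samples;	/* number of recorded delays */
	uint64_t max_ns;	/* largest single delay seen */
};

static struct delay_stats tag_stats[NR_DELAY_TAGS];

/* Fold one delay sample into the bucket for @tag. */
static void record_delay(enum delay_tag tag, uint64_t delay_ns)
{
	struct delay_stats *s;

	if ((unsigned int)tag >= NR_DELAY_TAGS)
		return;	/* ignore unknown tags */

	s = &tag_stats[tag];
	s->total_ns += delay_ns;
	s->nr_samples++;
	if (delay_ns > s->max_ns)
		s->max_ns = delay_ns;
}

/* Mean delay for @tag, or 0 if the tag is unknown or has no samples. */
static uint64_t avg_delay_ns(enum delay_tag tag)
{
	const struct delay_stats *s;

	if ((unsigned int)tag >= NR_DELAY_TAGS)
		return 0;

	s = &tag_stats[tag];
	return s->nr_samples ? s->total_ns / s->nr_samples : 0;
}
```

With this shape, the aggregation code does not need to know what a tag means; only the scheduler that assigns tags to tasks does.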
We can start with a baked-in implementation in each scheduler. I'd measure the durations whenever the task is runnable but not running - i.e. the `ops.runnable()` to `ops.running()` transition durations and the `ops.stopping()` to the subsequent `ops.running()` transition durations.
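The accounting described above could look roughly like the following. This is a plain C sketch of the bookkeeping only, not actual BPF scheduler code: `struct task_delay_ctx` and the `delay_on_*()` helpers are hypothetical names, timestamps are passed in by the caller, and it assumes the `ops.stopping()` callback can tell whether the task is still runnable when it stops.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task state; a scheduler would keep this in its task context. */
struct task_delay_ctx {
	uint64_t wait_start_ns;		/* when the current wait began */
	uint64_t total_delay_ns;	/* accumulated runnable-but-not-running time */
	uint64_t nr_waits;		/* number of completed wait intervals */
	bool waiting;			/* true while runnable but not running */
};

/* Would be called from ops.runnable(): the task starts waiting for a CPU. */
static void delay_on_runnable(struct task_delay_ctx *t, uint64_t now_ns)
{
	t->wait_start_ns = now_ns;
	t->waiting = true;
}

/* Would be called from ops.running(): close the open wait interval, if any. */
static void delay_on_running(struct task_delay_ctx *t, uint64_t now_ns)
{
	if (!t->waiting)
		return;

	/* Skip the sample if the clock appears to have gone backwards. */
	if (now_ns >= t->wait_start_ns) {
		t->total_delay_ns += now_ns - t->wait_start_ns;
		t->nr_waits++;
	}
	t->waiting = false;
}

/*
 * Would be called from ops.stopping(): if the task is still runnable
 * (e.g. it was preempted), a new wait interval starts immediately.
 */
static void delay_on_stopping(struct task_delay_ctx *t, uint64_t now_ns,
			      bool still_runnable)
{
	if (still_runnable) {
		t->wait_start_ns = now_ns;
		t->waiting = true;
	}
}
```

The `waiting` flag keeps time spent sleeping from being counted as scheduling delay: a task that stops without remaining runnable records nothing until the next `ops.runnable()`.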
As scheduler behavior metrics can vary widely across different layers, system-level metrics aren't that useful in understanding how the scheduler is behaving. Add more per-layer metrics, including a per-layer scheduling delay metric.