sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

sched/c/scx_nest: task priority is not respected properly #419

Open honxia02 opened 2 months ago

honxia02 commented 2 months ago

Code in scx_nest clearly tries to take p->scx.weight into account, so priorities (nice values) should work.

However, when two tasks with nice values of +5 and -5 run on the same rq, they get roughly a 50%/50% split of run time, whereas under CFS those nice values give roughly 10% and 90%.

Some initial investigation led me to this code block in nest_enqueue():

        /*
         * Limit the amount of budget that an idling task can accumulate
         * to one slice.
         */
        if (vtime_before(vtime, vtime_now - slice_ns))
                vtime = vtime_now - slice_ns;

The line right below it

        scx_bpf_dispatch_vtime(p, FALLBACK_DSQ_ID, slice_ns, vtime,
                               enq_flags);

will then bring the nice -5 task's vtime roughly level with that of the nice +5 task, undoing the higher-priority task's slower vtime accumulation and breaking priorities.

Removing the vtime clamp here and simply using plain scx_bpf_dispatch() without a vtime gives the expected run-time distribution of 10% and 90%.

honxia02 commented 2 months ago

After more experiments, I found that my comments about scx_bpf_dispatch(vtime) were wrong, so please disregard them. I am leaving the issue open because the bug itself still exists.