Open arighi opened 1 month ago
I think I found a much easier reproducer, see 7f9b009c9c772e04c9da614fc6056dc9a6c47f0d.
It seems that in 6.12, ops.update_idle() is occasionally not being called. scx_rustland_core depends on ops.update_idle() to trigger the wakeup of the user-space scheduler to handle pending tasks, so skipping it leads to poor performance. This issue is likely related to changes of pick_next_task() / put_prev_task() in the kernel.
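To illustrate why a skipped idle notification hurts, here is a toy model in plain C. It is only a sketch of the dependency described above, not kernel or scx_rustland_core code: every name in it (toy_sched, toy_update_idle, toy_run_user_sched, toy_simulate) is hypothetical, and it assumes one task becomes runnable per tick and that the user-space scheduler runs only when the idle notification has requested a wakeup.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are sched_ext or scx_rustland_core APIs. */
struct toy_sched {
    int pending;            /* tasks waiting for the user-space scheduler */
    int dispatched;         /* tasks the user-space scheduler has handled */
    bool wakeup_requested;  /* set by the idle notification */
};

/* Stand-in for the idle notification: request a wakeup if work is pending. */
static void toy_update_idle(struct toy_sched *s)
{
    if (s->pending > 0)
        s->wakeup_requested = true;
}

/* Stand-in for the user-space scheduler: drains the queue only when woken. */
static void toy_run_user_sched(struct toy_sched *s)
{
    if (!s->wakeup_requested)
        return;
    s->dispatched += s->pending;
    s->pending = 0;
    s->wakeup_requested = false;
}

/*
 * Simulate `ticks` steps. Every `skip_every`-th idle notification is
 * dropped (0 means none are dropped). Returns the number of ticks that
 * ended with tasks still waiting.
 */
static int toy_simulate(int ticks, int skip_every)
{
    struct toy_sched s = {0};
    int stalled_ticks = 0;

    assert(ticks >= 0 && skip_every >= 0);

    for (int t = 1; t <= ticks; t++) {
        s.pending++;  /* assumption: one task becomes runnable per tick */

        if (skip_every == 0 || t % skip_every != 0)
            toy_update_idle(&s);

        toy_run_user_sched(&s);

        if (s.pending > 0)
            stalled_ticks++;
    }
    return stalled_ticks;
}
```

In this model, toy_simulate(10, 0) leaves no tick with waiting tasks, while toy_simulate(10, 2) leaves tasks waiting on every second tick. The real scheduler is far more complex, but the shape of the problem is the same: if the notification is missed, nothing else wakes the user-space scheduler.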
I don't have a fix yet. I'm just sharing the reproducer for now, and I'll investigate further on the kernel side.
FYI, https://lore.kernel.org/lkml/20241013173928.20738-1-andrea.righi@linux.dev/T/#u seems to fix this regression.
This commit in the kernel introduces a pretty bad performance regression in all the scx_rustland_core schedulers:
7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
The system becomes completely unresponsive when it's saturated, and this is very easy to reproduce (e.g., by starting a parallel kernel build with scx_rustland active).
I think the reason is one (or both) of these behavior changes:
But I haven't figured out exactly why. I've been playing a bit with SCX_ENQ_LAST, unsuccessfully, so I'm just opening the issue for now. Any pointers on how to attack this?