sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

scx_rusty: Make load balancing decision executed reliably #611

Open htejun opened 2 months ago

htejun commented 2 months ago

Right now, after the userland load balancer makes migration decisions (move task X to domain N), each decision is recorded in the lb_data map, which maps a pid_t to a destination domain number. Then, on the enqueue path, rusty_enqueue() checks whether the task has a matching entry in lb_data and, if so, executes the requested migration. This means that the application of LB decisions isn't reliable: it depends on the task being migrated running in the following period. Otherwise, the decision is ignored.
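To make the failure mode concrete, here is a minimal userspace sketch of the lazy scheme described above. It is not the scx_rusty code: `LazyLb`, `request_migration`, `enqueue`, and `end_period` are hypothetical names, the map is a plain `HashMap` rather than a BPF map, and it assumes that an entry is consumed on enqueue and that leftover entries are discarded when the period ends.

```rust
use std::collections::HashMap;

type Pid = i32;
type DomId = u32;

/// Userspace stand-in for the scheduler state (hypothetical, not the real structures).
#[derive(Default)]
struct LazyLb {
    /// Pending decisions: pid -> destination domain (the role lb_data plays).
    lb_data: HashMap<Pid, DomId>,
    /// Domain each task currently belongs to.
    task_dom: HashMap<Pid, DomId>,
}

impl LazyLb {
    /// Load balancer side: record "move pid to dom" without applying it.
    fn request_migration(&mut self, pid: Pid, dom: DomId) {
        self.lb_data.insert(pid, dom);
    }

    /// Enqueue side: apply a pending decision, if any, and return the
    /// domain the task ends up in (unknown tasks default to domain 0).
    fn enqueue(&mut self, pid: Pid) -> DomId {
        if let Some(dst) = self.lb_data.remove(&pid) {
            self.task_dom.insert(pid, dst);
        }
        *self.task_dom.entry(pid).or_insert(0)
    }

    /// End of a balancing period (assumed behavior): decisions for tasks
    /// that never enqueued are dropped. Returns how many were lost.
    fn end_period(&mut self) -> usize {
        let dropped = self.lb_data.len();
        self.lb_data.clear();
        dropped
    }
}
```

If migrations are requested for two tasks and only one of them enqueues before `end_period()`, the other decision is counted as dropped and that task stays in its old domain, which is the unreliability this issue is about.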

While this works okay in practice, as the LB just keeps retrying until the domains are balanced, it makes the behavior less predictable. It'd be great to have load balancing decisions executed reliably. Maybe test_run can be used to execute migrations immediately; see set_power_profile() in scx_lavd for an example.

vax-r commented 2 months ago

Is this still available? If so, I would love to help.

vax-r commented 3 weeks ago

The draft PR is ready, but I've run into a problem that I might need some help with. The machine I'm running on has an AMD Ryzen 7 5700X3D 8-Core Processor, and under rusty it has only 1 NUMA node and 1 domain, so the load balancing step stops before it can actually do anything. That means I can't test whether my change is an improvement or not. Would someone be so kind as to test the PR for me?

If that works, I'll send a draft PR first and see what the testing results are; otherwise I'll try to think of other ways to test it.

htejun commented 3 weeks ago

You can use the -C option to define arbitrary LB domains, which should be sufficient for testing.
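As a rough aid for that kind of test setup, the sketch below splits a set of logical CPUs into contiguous groups and prints one hex cpumask per group. Everything here is an assumption for illustration: `split_cpumasks` is a hypothetical helper, it assumes -C accepts one hex cpumask per domain (check `scx_rusty --help` for the actual value format), and it ignores real topology such as SMT siblings or cache sharing.

```rust
/// Split CPUs 0..nr_cpus into `nr_doms` contiguous groups and return one
/// hex cpumask string per group. Hypothetical helper; supports up to 64 CPUs.
fn split_cpumasks(nr_cpus: u32, nr_doms: u32) -> Vec<String> {
    assert!(nr_doms > 0 && nr_doms <= nr_cpus && nr_cpus <= 64);
    let base = nr_cpus / nr_doms;
    let extra = nr_cpus % nr_doms;
    let mut masks = Vec::new();
    let mut next = 0u32; // first CPU id of the next group
    for dom in 0..nr_doms {
        // The first `extra` groups get one additional CPU.
        let len = base + if dom < extra { 1 } else { 0 };
        // Set bits next..next+len; a full 64-bit group needs special casing
        // because shifting a u64 by 64 would overflow.
        let mask: u64 = if len == 64 {
            u64::MAX
        } else {
            ((1u64 << len) - 1) << next
        };
        masks.push(format!("{:#x}", mask));
        next += len;
    }
    masks
}
```

For example, `split_cpumasks(16, 2)` yields `0xff` and `0xff00`, which could then be passed as two separate -C arguments if the option does take masks in that form.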

vax-r commented 3 weeks ago

> You can use the -C option to define arbitrary LB domains, which should be sufficient for testing.

Thank you! I can test it now, and I've found some problems. I'll sort them out and send a PR later.