sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

scx_qmap NOHZ tick-stop error and unrelated question #237

Open somewhatfrog opened 4 months ago

somewhatfrog commented 4 months ago

System specs: up-to-date Arch with CachyOS repos and the CachyOS kernel; CPU: 5800X3D; GPU: 3060 Ti (NVIDIA proprietary driver); RAM: 64 GB ECC.

From time to time dmesg shows this: NOHZ tick-stop error: local softirq work is pending, handler #40!!!, though I haven't noticed any negative impact from it.

Unrelated, but I have a question: this scheduler's note says:

This scheduler is primarily for demonstration and testing of sched_ext features and unlikely to be useful for actual workloads.

But on the 5800X3D this scheduler gives me the best 0.1% and 1% minimum FPS compared to EEVDF or CFS (which are otherwise the second-best), while the average is more or less at the same level in Proton and native games (tested with Elden Ring and Project Zomboid using the MangoHud benchmark over 5 minutes in a controlled environment). Meanwhile, LAVD causes stutters and almost halves the average framerate, which is, I guess, expected because it is not a single-CCX CPU.

So why is scx_qmap considered "unlikely to be useful for actual workloads"? In my experience it is the best scheduler I have used with my CPU so far, and I have been daily-driving it for the past week.

htejun commented 4 months ago

I think I saw the nohz message several times too. Will look into it later.

As for scx_qmap, the design and implementation are primarily focused on testing and demonstrating various features of sched_ext. It implements coarse multi-queue FIFO scheduling in a rather inefficient way. Now, for some workloads, multi-queue FIFO works pretty well, so there can be workloads that scx_qmap handles okay. However, it'd be easy to push it over the edge: launching some thrashing threads at the same queue level can severely degrade interactivity, and each queue has limited depth, so it'd be relatively easy to overflow them, which would make the scheduler behave as a global FIFO. Also, it doesn't have any topology awareness, so CPU-bandwidth-sensitive workloads would likely suffer on CPUs with more complex topology, and so on and so forth.
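
To illustrate the idea, here is a toy userspace sketch of coarse multi-queue FIFO scheduling with bounded queues. This is only an illustration of the concept described above, not scx_qmap's actual BPF code; the queue count, depth, and dispatch order are made up for the example.

```c
/*
 * Toy sketch: coarse multi-queue FIFO with bounded per-level queues.
 * When a level's queue is full, tasks spill into a single global FIFO,
 * so under load the whole thing degrades to plain global FIFO ordering.
 * (Illustrative only -- not the scx_qmap implementation.)
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_QUEUES   5   /* coarse priority levels (hypothetical) */
#define QUEUE_DEPTH 8   /* each queue has limited depth (hypothetical) */

struct fifo {
	int tasks[QUEUE_DEPTH];
	int head, tail, cnt;
};

static struct fifo queues[NR_QUEUES];   /* per-level FIFOs */
static struct fifo global_fifo;         /* overflow / fallback FIFO */

static bool fifo_push(struct fifo *q, int task)
{
	if (q->cnt >= QUEUE_DEPTH)
		return false;
	q->tasks[q->tail] = task;
	q->tail = (q->tail + 1) % QUEUE_DEPTH;
	q->cnt++;
	return true;
}

static bool fifo_pop(struct fifo *q, int *task)
{
	if (!q->cnt)
		return false;
	*task = q->tasks[q->head];
	q->head = (q->head + 1) % QUEUE_DEPTH;
	q->cnt--;
	return true;
}

/* Enqueue into the queue for a coarse level; on overflow, fall back to
 * the global FIFO (and silently drop if that is full too, in this toy). */
static void enqueue(int task, int level)
{
	if (!fifo_push(&queues[level], task))
		fifo_push(&global_fifo, task);
}

/* Dispatch: drain the per-level queues in index order, then the global FIFO. */
static bool dispatch(int *task)
{
	for (int i = 0; i < NR_QUEUES; i++)
		if (fifo_pop(&queues[i], task))
			return true;
	return fifo_pop(&global_fifo, task);
}

int main(void)
{
	/* A burst of 12 tasks at level 2: 8 fit in the level-2 queue,
	 * the rest overflow into the global FIFO. */
	for (int t = 1; t <= 12; t++)
		enqueue(t, 2);

	int task;
	while (dispatch(&task))
		printf("dispatched task %d\n", task);
	return 0;
}
```

The point of the sketch is the failure mode: once a level's queue is full, everything spills into one global FIFO, so under load the per-level separation (and with it any interactivity benefit) disappears.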

It could well be that there are games which really like multi-queue FIFO scheduling. If so, what we would want to do is either understand why that is and incorporate it into more practical schedulers (e.g. rusty and/or lavd), or implement a dedicated scheduler. For now, I think it may be most productive to concentrate on lavd. It's still a very early implementation and there will be some growing pains, but it has a lot of potential and a dedicated developer who deeply cares about gaming performance.