sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

[scx_lavd] Getting large pauses and stutters in-game #234

Closed: ChrisLane closed this issue 5 months ago

ChrisLane commented 5 months ago

I read about scx_lavd on Phoronix and decided to give it a try and run some basic benchmarks.

I have observed that I am able to quite reliably reproduce stutters and pauses, sometimes several seconds long, while running around my ship in Helldivers 2.

Here's the output from sudo scx_lavd while I was running the game, with several stutters during the period (not sure of the exact timestamps): https://gist.github.com/ChrisLane/1945f66b5d8a2f36a530ca9ac8abcfa1

Please let me know if I can do anything to debug or provide more useful info for your debugging purposes.

System:
  Distro: Arch Linux
  Kernel: 6.8.7-1-cachyos
  GPU: AMD Radeon RX 6800 XT (RADV NAVI21)
  CPU: AMD Ryzen 9 5900X 12-Core Processor
  RAM: 36GB

Quackdoc commented 5 months ago

Did you verify the stutters don't happen when the scheduler isn't loaded? I've been having very similar issues with all of the CachyOS kernels: https://github.com/CachyOS/linux-cachyos/issues/244. I'm using the cachyos-eevdf kernel, for instance, and have very similar issues.

ChrisLane commented 5 months ago

I didn't have any issues with the cachyos kernel when I took a baseline, though it was a short test.

Given how frequently the issue was occurring with scx_lavd (often), I would have expected to notice it if it was also happening on the default CachyOS kernel and scheduler. I can try playing for longer on the default CachyOS kernel and scheduler to fully rule that out, though.


Cifer9516 commented 5 months ago

Same thing here: heavy-load scenarios time out the scheduler and it just unloads, after which the program continues running without a problem. This happened in a vkd3d game and during compilation.

skygrango commented 5 months ago

I tested compiling a Rust program while playing a game at the same time, and got an scx_lavd error.

log file : scx_lavd_dump-2024-04-20-1.txt

Edit: updated the log file.

multics69 commented 5 months ago

Thank you for reporting the issues. This will be really helpful for improving and tuning LAVD. The preemption logic is under development and will be improved in the next cycle.

@All -- When collecting the log, if you can run sudo scx_lavd -s $(nproc), that would be super helpful.

@ChrisLane -- Thank you for opening the issue and sharing the log!

@Cifer9516 -- If you don't mind, could you share the log and the game scenario that you played?

@skygrango -- If you don't mind, could you share the game scenario that you played? It seems you ran an old version of scx_lavd. If possible, could you try the latest version (git HEAD)?

bunkbail commented 5 months ago

Here's a dump log of sudo scx_lavd -s $(nproc) on the latest scx_lavd (commit 9a9b4d). I'm on Debian, with a kernel I compiled myself. What I did: I launched scx_lavd, ran the game "Infinite Wealth", ran around for a bit while the game was freezing all over, and tabbed out until I saw lavd crash. The game runs fine after lavd crashed. dump.txt

bunkbail commented 5 months ago

Ran another test (commit 9a9b4d) and attached the full log here; apologies if it's a bit hard to read, since I use oh-my-zsh styling in my terminal.

dump.txt

Galcian79 commented 5 months ago

I think I'm facing the same issue here: stuttering in-game and then LAVD crashes. journalctl.log

multics69 commented 5 months ago

Thanks @bunkbail and @Galcian79 for sharing the logs.

htejun commented 5 months ago
 *R LikeADragon8.ex[3041] -3ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a)
      cpus=00000fff

  R LikeADragon8.ex[2869] -40782ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002
      cpus=00000fff

So, this suggests that 2869 has been sitting on the local DSQ for over 40s, which should only be possible if either the previous task keeps running or tasks keep getting queued ahead of it. The latter happens only with explicit ENQ_HEAD or ENQ_PREEMPT, neither of which is currently used by lavd, so the situation could be that 3041 is running constantly.

Now, a gaming thread being runnable constantly wouldn't be too surprising (maybe it's busy-polling for some event?), but what should have happened is that it runs down its slice and then takes turns with other runnable tasks, which doesn't seem to have happened. Hmm... so, one peculiarity of lavd is that it assigns the slice from ops.running(). I can imagine a scenario like the following:

  1. The offending thread becomes runnable.
  2. It starts running, ops.running() sets its slice according to calc_time_slice().
  3. It's running within its slice, but it interacts with, e.g., the graphics driver and wakes up an RT thread.
  4. The RT thread preempts the running task and ops.stopping() is invoked. Because the task hasn't exhausted its slice yet, it gets pushed to the top of the local DSQ so that it can continue its slice once the RT thread is done.
  5. RT thread yields the CPU back. The task which got preempted is picked and starts running again. Go back to (2).

So, if I'm reading the code right, if the task keeps waking up the RT thread before its slice expires, the task can get its time slice refreshed indefinitely. If this is the case, the remedy may be:

a. Distinguish the first ops.running() after ops.runnable() and replenish the slice only on that first transition.
b. Implement ops.cpu_release() to detect preemption by RT / DL threads and explicitly handle the condition.
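For illustration, a minimal sketch of (a), assuming a per-task context struct; get_task_ctx(), calc_time_slice(), and the need_slice flag are illustrative names rather than lavd's actual code:

```c
/* Sketch of remedy (a); the per-task flag and helpers are illustrative. */
void BPF_STRUCT_OPS(lavd_runnable, struct task_struct *p, u64 enq_flags)
{
	struct task_ctx *taskc = get_task_ctx(p);

	if (taskc)
		taskc->need_slice = true;	/* a new runnable section begins */
}

void BPF_STRUCT_OPS(lavd_running, struct task_struct *p)
{
	struct task_ctx *taskc = get_task_ctx(p);

	/* Replenish only on the first ops.running() after ops.runnable();
	 * a task preempted mid-slice resumes with whatever slice is left. */
	if (taskc && taskc->need_slice) {
		p->scx.slice = calc_time_slice(p);
		taskc->need_slice = false;
	}
}
```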

I of course could be completely wrong, so please take it with a pinch of salt. From the sched_ext core side, I've been thinking about adding the ability for BPF scheduler implementations to add scheduler-specific information to the debug dump. Hopefully, that should make debugging from user reports easier.

arighi commented 5 months ago

> So, if I'm reading the code right, if the task keeps waking up the RT thread before its slice expires, the task can get its time slice refreshed indefinitely. If this is the case, the remedy may be:
>
> a. Distinguish the first ops.running() after ops.runnable() and replenish the slice only on that first transition.
> b. Implement ops.cpu_release() to detect preemption by RT / DL threads and explicitly handle the condition.

I was wondering if it could be helpful to have a way to easily migrate all tasks to the SCHED_EXT class, even RT/DL tasks (maybe excluding kthreads).

At some point I had a similar issue with rustland and RT audio tasks, where audio performed better (fewer audio crackles) after moving the RT tasks to the SCHED_EXT class. I did that from user-space (via schedtool), but having a way directly in sched-ext to "disable" the other sched classes (even if it sounds a bit too extreme) could be useful, especially for debugging, since we could reliably take RT tasks out of the equation. Opinions?
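For what it's worth, something similar can already be approximated from user space today, e.g. with chrt instead of schedtool; assuming the loaded scx scheduler is managing SCHED_OTHER tasks, demoting an RT task hands it to sched_ext (the pid here is a placeholder):

```
# Demote an RT task (pid 1234) to SCHED_OTHER so the scx scheduler picks it up.
sudo chrt --other --pid 0 1234
```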

skygrango commented 5 months ago

> @skygrango -- If you don't mind, could you share the game scenario that you played? It seems you ran an old version of scx_lavd. If possible, could you try the latest version (git HEAD)?

After I compiled the latest git version, scx_lavd no longer crashes.

But when I run a compilation with 100% CPU usage in the background, the game FPS is still very unstable and low.

htejun commented 5 months ago

> I was wondering if it could be helpful to have a way to easily migrate all tasks to the SCHED_EXT class, even RT/DL tasks (maybe excluding kthreads).

Yeah, maybe. It's not technically difficult to implement but it's also kinda nice to have the guarantee that RT tasks will always run no matter what happens to scx schedulers. Lemme think over it.

Thanks.

Galcian79 commented 5 months ago

>> @skygrango -- If you don't mind, could you share the game scenario that you played? It seems you ran an old version of scx_lavd. If possible, could you try the latest version (git HEAD)?
>
> After I compiled the latest git version, scx_lavd no longer crashes.
>
> But when I run a compilation with 100% CPU usage in the background, the game FPS is still very unstable and low.

I synced to git too. The issue is still there, I think, but far less noticeable. It doesn't crash anymore.

Galcian79 commented 5 months ago

Wouldn't it be useful to have something like scx_lavd --version?

sirlucjan commented 5 months ago

> Wouldn't it be useful to have something like scx_lavd --version?

If you use the “git” version, you can always check what you are using. For example, on Arch Linux:

lucjan at cachyos ~ 22:08:34    
❯ pacman -Qs scx-scheds-git                       
local/scx-scheds-git 0.1.8.r59.g973aded-1
    sched_ext schedulers

This very easily shows which commit I built from, so you know exactly what version you have on your machine (in this case, as of the moment I post this comment, the latest one).

Your idea is not bad, although I see one drawback. Suppose we assign version “0.1.0”: the question then is how it would get bumped. On what release? Every commit/pull request? Let me say in advance that this is not a criticism, just an invitation to develop the idea.

Galcian79 commented 5 months ago

>> Wouldn't it be useful to have something like scx_lavd --version?
>
> If you use the “git” version, you can always check what you are using. For example, on Arch Linux:
>
> lucjan at cachyos ~ 22:08:34    
> ❯ pacman -Qs scx-scheds-git                       
> local/scx-scheds-git 0.1.8.r59.g973aded-1
>     sched_ext schedulers
>
> This very easily shows which commit I built from, so you know exactly what version you have on your machine (in this case, as of the moment I post this comment, the latest one).
>
> Your idea is not bad, although I see one drawback. Suppose we assign version “0.1.0”: the question then is how it would get bumped. On what release? Every commit/pull request? Let me say in advance that this is not a criticism, just an invitation to develop the idea.

I see now. So all the schedulers are tied to the same version.

sirlucjan commented 5 months ago

I mean, your idea is not bad; you would just have to work out how to update the version. But checking which commit you are based on is also a good solution, because then you know globally what version you are using.

Galcian79 commented 5 months ago

> I mean, your idea is not bad; you would just have to work out how to update the version. But checking which commit you are based on is also a good solution, because then you know globally what version you are using.

I mean, usually you would want something like x.x.x-dev, which is what binaries compiled from git typically report. But of course, for the -dev part you would always refer back to the git commit.

bunkbail commented 5 months ago

Edited my previous comments, adding the commit hash of each build.

Here's a new dump log of the latest build (commit 973aded). Now it took longer to crash lavd (around 5 minutes, compared to the previous version where it took just around 1 minute). The game was still lagging and freezing throughout.

dump.txt

multics69 commented 5 months ago

@htejun @arighi After looking at the logs, the watchdog timeout happened only when a task was in a local DSQ. It seems your hypothesis is correct -- an RT task keeps preempting an scx task, so the scx task's time slice is replenished on every preemption, letting it run indefinitely and causing the watchdog timeout error.

Here are my two suggestions:

1) I think it is worth implementing your option (a). If possible, it would be better to replenish the remaining time slice (not the full 20 msec).
2) In the watchdog logs, it would be worth printing out the task's time slice. This would give us more hints about where the problem comes from.

What do you think?

arighi commented 5 months ago

@multics69 I need to take a closer look at the code (maybe @htejun or @Byte-Lab already have a better answer to this), but I think it should already be possible to replenish only the remaining time slice: implement ops.cpu_release(), check whether a higher-priority task takes the CPU, and restore the remaining time slice in ops.cpu_acquire()...?
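For reference, a minimal sketch of the ops.cpu_release() half, borrowing the re-enqueue pattern used elsewhere in scx (illustrative only; restoring the exact remaining slice would still need per-task state as discussed above):

```c
/* When an RT/DL task takes this CPU, bounce the tasks sitting on its
 * local DSQ back through ops.enqueue() so they can be placed elsewhere
 * instead of waiting out the preemption. */
void BPF_STRUCT_OPS(lavd_cpu_release, s32 cpu, struct scx_cpu_release_args *args)
{
	scx_bpf_reenqueue_local();
}
```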

Maybe another approach to attacking the problem could be to have multiple SCHED_EXT classes (following what Joel suggested here: https://lore.kernel.org/all/CAEXW_YR02g=DetfwM98ZoveWEbGbGGfb1KAikcBeC=Pkvqf4OA@mail.gmail.com/). Having multiple SCHED_EXT classes would give us more options for deciding which classes of tasks a sched-ext scheduler can grab, and eventually provide cmdline options to grab all tasks, exclude RT tasks, etc.

But I'm just brainstorming here... this is something to think about more.

multics69 commented 5 months ago

@arighi -- Yes, restoring the remaining time slice is possible using the cpu_release()/cpu_acquire() callbacks. However, if this could be a problem in all schedulers, I think it would be better implemented at the framework level. Again, @htejun and @Byte-Lab are the right people to answer this. hehe :-)

Supporting multiple sched_ext classes sounds like an interesting direction. I will think about it more.

htejun commented 5 months ago

I don't think lavd needs to implement ops.cpu_release/acquire() to solve this problem. The problem is specific to lavd in that currently no other scheduler overrides p->scx.slice from ops.running(), and lavd is currently not considering the fact that a task may transition multiple times between ops.running() and ops.stopping() between a pair of ops.runnable() and ops.quiescent(). Given those, the possible solutions that come to mind are setting the slice from ops.dispatch() using the scx iterator, or gating slice replenishment behind a per-task refresh_slice flag.

Other than the immediate lavd misbehavior, there are three issues raised in this thread:

  1. Taking over all tasks including RT / DL. There's more to think about here, e.g. some RT kthreads are used for the kernel's basic operation, and stalling them can lead to system-level failures. For example, if the kernel hang-check timer kthread isn't scheduled in time, it will trigger, which on some systems is configured to panic and reset the machine; so this weakens one of the important properties that SCX tries to guarantee. Right now, I'm leaning more towards leaving them alone. I think, as a group, SCX scheduler writers need to learn that there are other sched classes above us and deal with it. All the necessary interfaces are already there, but we could definitely benefit from better documentation.
  2. The ordering between kernel's fair class and SCX. This seems to be a feature which has been requested by multiple parties and there are no safety concerns. I'll think more on it and likely add this in some form.
  3. Adding more information to debug dumps. Yeah, for sure. However, I'm not sure a remaining-slice printout would have helped much here. It would have looked like the task just has a normal slice duration, since the problem is the slice repeatedly being replenished (if the current hypothesis is correct). What may have been more useful is the total runtime since the last runnable transition. Also, I'm planning to add scheduler-specific dump information so that schedulers can dump whatever they want to dump.

htejun commented 5 months ago

Hmmm.... @arighi did mention excluding RT kthreads. That probably doesn't break any kernel mechanisms. I'll think more on it.

multics69 commented 5 months ago

Thank you @htejun for the detailed explanation.

Setting the time slice at ops.running() seems to be the source of the problem as you explained.

Regarding the scx iterator, I don't think the iterator helps much here, because the global DSQ is already a vtime-based DSQ. And in lavd, the vtime of the DSQ is the task's deadline, so scx_bpf_consume() should always pick the task with the earliest deadline. Or should we set the task's time slice at ops.dispatch() using the scx iterator?

Regarding the refresh_slice, I thought a transition like ops.runnable() -> ops.running() -> ops.stopping() -> ops.running() ... was possible even without RT task preemption. I thought it could happen when a task's time slice is exhausted but the task is still in a runnable state. Am I correct?

htejun commented 5 months ago

> Regarding the scx iterator, I don't think the iterator helps much here, because the global DSQ is already a vtime-based DSQ. And in lavd, the vtime of the DSQ is the task's deadline, so scx_bpf_consume() should always pick the task with the earliest deadline. Or should we set the task's time slice at ops.dispatch() using the scx iterator?

Yes, that's what I meant. ops.dispatch() can set the slice using the iterator as it will have access to what will get consumed.
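A rough sketch of that direction, assuming the DSQ iterator may be used this way; LAVD_GLOBAL_DSQ and calc_time_slice() are illustrative names, not lavd's actual code:

```c
/* Rough sketch: set the slice of the task at the head of the
 * vtime-ordered DSQ (the one scx_bpf_consume() will pick next)
 * right before consuming it. */
void BPF_STRUCT_OPS(lavd_dispatch, s32 cpu, struct task_struct *prev)
{
	struct task_struct *p;

	bpf_for_each(scx_dsq, p, LAVD_GLOBAL_DSQ, 0) {
		p->scx.slice = calc_time_slice(p);
		break;	/* only the head task gets consumed next */
	}

	scx_bpf_consume(LAVD_GLOBAL_DSQ);
}
```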

> Regarding the refresh_slice, I thought a transition like ops.runnable() -> ops.running() -> ops.stopping() -> ops.running() ... was possible even without RT task preemption. I thought it could happen when a task's time slice is exhausted but the task is still in a runnable state. Am I correct?

You're right. I guess that's why you put the slice setting in ops.running() in the first place. Would reasserting refresh_slice from ops.stopping() when the task has exhausted its slice work?
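Something like this, perhaps (sketch only; refresh_slice is a hypothetical per-task flag, paired with the ops.running() gating sketched earlier in the thread):

```c
void BPF_STRUCT_OPS(lavd_stopping, struct task_struct *p, bool runnable)
{
	struct task_ctx *taskc = get_task_ctx(p);

	/* The slice was fully consumed: let the next ops.running() hand
	 * out a fresh slice even within the same runnable section. */
	if (taskc && p->scx.slice == 0)
		taskc->refresh_slice = true;
}
```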

htejun commented 5 months ago

Or rather, how about ops.runnable() setting the slice to 0 and ops.running() refreshing the slice iff it's 0?

htejun commented 5 months ago

Oops, that won't work right now. The kernel will trigger a zero-slice warning.

bunkbail commented 5 months ago

Another update from me: I've built lavd from the latest commit (d9ea53c) and also cherry-picked two patches, from PR #247 and PR #250. Now lavd no longer crashes, and the game no longer stutters/freezes like before. In my limited testing, everything is smooth as butter. Awesome work @multics69 and @htejun!

multics69 commented 5 months ago

Thank you @bunkbail for testing the PRs!

DasLeo commented 5 months ago

@bunkbail which kernel/distro did you test on? I just want to verify whether I'm having the same issue in my environment and rule out other kernel-config/gamemode-related issues.

multics69 commented 5 months ago

@DasLeo Did you manually apply PR #247 and #250? They are not merged yet.

bunkbail commented 5 months ago

> @bunkbail which kernel/distro did you test on? I just want to verify whether I'm having the same issue in my environment and rule out other kernel-config/gamemode-related issues.

I build my own kernel, but it's basically linux-cachyos from the AUR, replicated and adapted for Devuan Ceres (Debian sid + OpenRC), since that's what I use.

Galcian79 commented 5 months ago

Ran a preliminary test and it seems fixed.

DasLeo commented 5 months ago

Sorry for being late to the party, arriving just after the patches were merged. 🙂 I wanted to first make sure everything was working on my current kernel before starting to test sched_ext again.

Tested 0.1.9 on Apex Legends, where I saw the most stutters, and it's performing way better now. However, I still get constant micro-stutters that I don't see without an external scheduler attached, i.e. with plain BORE+EEVDF in the Cachy kernel.

multics69 commented 5 months ago

Thank you @DasLeo for testing scx_lavd! I am glad to hear that the patches somewhat mitigate the problem.

For Apex Legends, could you please run sudo scx_lavd -s $(nproc) and share the log along with your system info (CPU model, etc.)?

I am working on improving the preemption in scx_lavd. Your logs will help a lot in improving the preemption logic.

Galcian79 commented 5 months ago

Apex Legends is F2P, I could give it a try too.

DasLeo commented 5 months ago

> Apex Legends is F2P, I could give it a try too.

If you test it, make sure to enable the DXVK_HUD shader-compiler messages, as shaders also produce lots of stutters while compilation isn't finished yet. Also unlock the FPS with the %command% +fps_max unlimited launch argument.

Working in the Firing Range is already enough; there's no need to play online multiplayer.
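Concretely, the Steam launch options could look something like this (DXVK_HUD=compiler is one way to surface shader-compiler activity; adjust the HUD elements to taste):

```
DXVK_HUD=compiler %command% +fps_max unlimited
```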

Galcian79 commented 5 months ago

>> Apex Legends is F2P, I could give it a try too.
>
> If you test it, make sure to enable the DXVK_HUD shader-compiler messages, as shaders also produce lots of stutters while compilation isn't finished yet.

Even with GPL?

DasLeo commented 5 months ago

>>> Apex Legends is F2P, I could give it a try too.
>
>> If you test it, make sure to enable the DXVK_HUD shader-compiler messages, as shaders also produce lots of stutters while compilation isn't finished yet.
>
> Even with GPL?

@Galcian79 if you mean this, it actually works as stated in the docs. Shader compilation starts as soon as the mode (map) is selected in the menu, but it takes a while and keeps running during the game phase. Apex shaders can reach at least 2GB in my $HOME/.cache/nvidia/GLCache/. I also had to use __GL_SHADER_DISK_CACHE_SKIP_CLEANUP=1 to prevent the NVIDIA driver from cleaning them up after 256MB or so.

Once the caching is done, the game runs butter-smooth, but with a slight difference when enabling scx_lavd.

LethalManBoob commented 5 months ago

I will add here that the CS2 main menu runs at like 10 FPS with lavd, but runs fine without it.

Galcian79 commented 5 months ago

I started the game with the FPS unlocked and all graphics settings maxed. I played on the training ground while waiting for DXVK to compile the shaders; I didn't notice any stutter. Then I started lavd and took logs while playing a 10-minute match. I didn't notice any stutter either. (BTW, we won :1st_place_medal:) inxi.log

1172470_stamps.log

LethalManBoob commented 5 months ago

> I started the game with the FPS unlocked and all graphics settings maxed. I played on the training ground while waiting for DXVK to compile the shaders; I didn't notice any stutter. Then I started lavd and took logs while playing a 10-minute match. I didn't notice any stutter either. (BTW, we won 🥇) inxi.log 1172470_stamps.log

I don't know what your specs are, but I run NVIDIA on Arch.

Galcian79 commented 5 months ago

>> I started the game with the FPS unlocked and all graphics settings maxed. I played on the training ground while waiting for DXVK to compile the shaders; I didn't notice any stutter. Then I started lavd and took logs while playing a 10-minute match. I didn't notice any stutter either. (BTW, we won 🥇) inxi.log 1172470_stamps.log
>
> I don't know what your specs are, but I run NVIDIA on Arch.

That's why I included my inxi.

LethalManBoob commented 5 months ago

>>> I started the game with the FPS unlocked and all graphics settings maxed. I played on the training ground while waiting for DXVK to compile the shaders; I didn't notice any stutter. Then I started lavd and took logs while playing a 10-minute match. I didn't notice any stutter either. (BTW, we won 🥇) inxi.log 1172470_stamps.log
>
>> I don't know what your specs are, but I run NVIDIA on Arch.
>
> That's why I included my inxi.

It doesn't show your specs...

Galcian79 commented 5 months ago

>>>> I started the game with the FPS unlocked and all graphics settings maxed. I played on the training ground while waiting for DXVK to compile the shaders; I didn't notice any stutter. Then I started lavd and took logs while playing a 10-minute match. I didn't notice any stutter either. (BTW, we won 🥇) inxi.log 1172470_stamps.log
>
>>> I don't know what your specs are, but I run NVIDIA on Arch.
>
>> That's why I included my inxi.
>
> It doesn't show your specs...

Which specs are we talking about?

LethalManBoob commented 5 months ago

[image] Didn't know these two links were separate.

Galcian79 commented 5 months ago

> [image] Didn't know these two links were separate.

Yeah, my fault, I was editing on my phone and didn't realize they were on the same line.

Galcian79 commented 5 months ago

Fixed my OP.