sched-ext / scx

sched_ext schedulers and tools
https://bit.ly/scx_slack
GNU General Public License v2.0

`scx_lavd` seems to choke under either building this package with -j16, or building an LTO package that uses all 16 cores #223

Closed (kode54 closed 1 month ago)

kode54 commented 2 months ago

As is the Arch default now, everything builds with LTO. Building the Wayfire compositor triggers heavy LTO activity that fills all MAKEOPTS jobs (-j16, matching my core count) and bogs down various things, including Discord video calls. If I'm using Wayfire, the compositor itself bogs down too, even under less stressful 16-core builds.

multics69 commented 2 months ago

Thank you for reporting the problem. I am working on a better preemption scheme for scx_lavd. Hopefully, a new PR (#224) that was just sent partly addresses your problem. More optimization will come. Stay tuned.

kode54 commented 2 months ago

Something wonky is going on with GitHub's commit history for that PR: I can't apply the five patches in order with patch -Np1, git apply, or git am. I have to check out the branch directly, git diff it against main, then apply the resulting patch to main in my build process. Will report on how it works out.
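
A minimal sketch of that workaround, assuming the PR commits are already checked out on a local branch. The names `pr-branch`, `lavd.patch`, and `squash_branch_to_patch` are placeholders for illustration, not names from the PR:

```shell
# Sketch: flatten a branch into a single patch against main.
# Assumes a local branch named "main" and a local branch holding the PR commits.
squash_branch_to_patch() {
    branch="$1"   # local branch with the PR commits (placeholder name)
    patch="$2"    # output patch file
    # One diff from main's tree to the branch's tree, ignoring per-commit history.
    git diff main "$branch" > "$patch"
}

# Usage (placeholder names):
#   squash_branch_to_patch pr-branch lavd.patch
#   git checkout main && git apply lavd.patch
```

Because the patch is a single tree-to-tree diff, it sidesteps whatever is wrong with the individual commits, at the cost of losing their messages and authorship.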

multics69 commented 2 months ago

Thanks a lot!

kode54 commented 2 months ago

Okay, I tried the branch. I built scx-scheds-git again with the patch applied while running the already-built scx_lavd with default settings. If I don't run the whole makepkg -f process with nice -n 19, but instead let it run at the default priority, the system becomes unusable for the duration of the build. If I nice -n 19 makepkg -f, then it runs comfortably for the duration.

Also, attempting to use schedtool -D instead has no effect on the execution priority, and also results in an unusable system.
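
For anyone trying to reproduce the comparison, the niced run can be sketched as a small wrapper. This assumes a POSIX shell, and `run_niced` is a hypothetical helper name, not part of makepkg or scx:

```shell
# Sketch: run any command at the lowest CPU priority (niceness 19).
run_niced() {
    nice -n 19 "$@"
}

# Usage, matching the command above:
#   run_niced makepkg -f
# Running "nice" with no arguments prints the niceness the child inherited:
#   run_niced nice
```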

multics69 commented 2 months ago

If you don't mind, could you run scx_lavd with the -s $(nproc) option for just 4-5 minutes while your system is unusable, and share the log? The log will contain statistics on how scx_lavd is behaving.
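
One way to capture that log, as a rough sketch: the helper below sends both stdout and stderr to a file, since which stream the statistics go to is not stated here. `capture_log` and `lavd.log` are hypothetical names, and the scheduler will likely need to run as root:

```shell
# Sketch: run a command and save everything it prints to a log file,
# while still showing the output on the terminal.
capture_log() {
    log="$1"   # path of the log file to write
    shift
    # Merge stderr into stdout so neither stream is lost in the pipe.
    "$@" 2>&1 | tee "$log"
}

# Usage (assumed invocation, stop it with Ctrl-C after a few minutes):
#   capture_log lavd.log sudo scx_lavd -s "$(nproc)"
```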

kode54 commented 2 months ago

Here's a log of it running for the duration of a build. I also had to do it a second time, because I didn't use &| for the pipe, and lost the log.

lavd.log.gz

multics69 commented 2 months ago

Thanks a lot!

multics69 commented 1 month ago

@kode54 Could you please test if the problem still exists? PR #274 should fix the problem.

kode54 commented 1 month ago

I'll have to build a 6.9-rc7 kernel to test it, since 6.8.9 is unsupported.

kode54 commented 1 month ago

It stops some of the stuttering, but it doesn't stop lag building up behind OBS and my camera input. The lag even shows in OBS itself, not just the client pulling from the virtual camera.

multics69 commented 1 month ago

@kode54 Thanks for building the kernel and testing lavd. BTW, what is OBS? I will try to reproduce it on my end.

multics69 commented 1 month ago

@kode54 If you don't mind, could you please share the LAVD logs?

kode54 commented 1 month ago

Here's a log that stopped when I stopped the service shortly after the lag. OBS is OBS Studio, or Open Broadcaster Software. https://obs-studio.org/

scx_lavd.log.txt

Edit: FYI, I'm using a Logitech C620 camera, with Motion JPEG format at 1080p30 for the capture.

multics69 commented 1 month ago

@kode54 -- Hmm... I tried to reproduce the problem, but it seems okay on my end.

The environment that I tested is as follows:

I ran the following workloads with scx_lavd:

Any hints so I can reproduce the problem? Is there any possibility that OBS is buggy?

kode54 commented 1 month ago

Whatever. I won't be using sched-ext any more anyway, unless Arch somehow adds it to their stock kernel and enables a scheduler by default out from under me.