hyprwm / Hyprland

Hyprland is an independent, highly customizable, dynamic tiling Wayland compositor that doesn't sacrifice on its looks.
https://hyprland.org
BSD 3-Clause "New" or "Revised" License
19.83k stars, 842 forks

Triple buffering for low end hardware #3373

Open ErrorNoInternet opened 1 year ago

ErrorNoInternet commented 1 year ago

Description

Both KDE and GNOME are working on this in the hope of improving performance on low-end hardware (and sometimes not just low-end hardware). I think lots of people would benefit from this, since many people are already using the mostly finished GNOME patch (from the AUR, for example). I am also not aware of any Wayland compositor that fully supports triple buffering at the moment (or at least of how to enable it).

vaxerski commented 1 year ago

I fail to see the point of triple buffering. Can anyone explain why the f?

ErrorNoInternet commented 1 year ago

Smoother eye candy 😎😎👍

Lower-end GPUs (Intel HD?) can't keep up when you're doing "intensive stuff" with double buffering, e.g. OBS with a transparent and blurred Electron app, or even a Zoom meeting. One partial fix is to lock the GPU at higher frequencies 24/7 so that it doesn't "over-idle" and drop frames, but that doesn't fully fix the performance issue.

It has never happened to me on Xorg, but as soon as I tried Wayland (Hyprland, Sway, KDE Wayland), I started seeing dropped frames.

vaxerski commented 1 year ago

I get the effect but not the why?

ErrorNoInternet commented 1 year ago

So that people with low end iGPUs (even the i7-4790k falls into this category) can use Hyprland with the same performance as Xorg :)

vaxerski commented 1 year ago

I don't think I am being clear enough here:

how does triple buffering achieve such an effect? I don't see any reason as to why that should smoothen anything.

ErrorNoInternet commented 1 year ago

For example:

Double buffering

Front buffer: being displayed
Back buffer: being rendered

If the back buffer takes longer than usual to render, you'll notice it (as a stutter) since it'll show the front buffer for a longer period of time until the back buffer is finally ready and swapped.

Triple buffering

Front buffer: being displayed
Middle buffer: on standby
Back buffer: being rendered

If the back buffer takes longer than usual, you'll already have a frame in the middle buffer, so you can just swap that in first (the middle buffer is now empty). By the time it's swapped, you will have already finished drawing, so you'd move that frame into the middle buffer and start working on the back buffer again. After a few frames you'd catch up (unless you're having VERY severe performance issues).

This increases latency, but it allows the GPU to work ahead and gives you a "backup frame" that prevents you from seeing stutters when the GPU suddenly slows down during a frame (e.g. un-idling).

(forgive me if my logic is wrong, it is currently 3 am 💀)
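A rough way to check this reasoning is a toy frame-pacing simulation. It is a sketch under simplifying assumptions, not how Hyprland or wlroots actually schedule frames: the GPU renders frames back to back whenever a buffer is free, a finished frame is shown at the next vblank, and the names `flip_vblanks` and `missed_vblanks` are made up for illustration. With 10 ms frames at 60 Hz and a single 25 ms spike, two buffers miss one vblank while three buffers absorb the spike:

```python
from typing import List


def flip_vblanks(render_us: List[int], refresh_hz: int, n_buffers: int) -> List[int]:
    """For each frame, return the index of the vblank at which it reaches the screen.

    Toy model (an assumption, not Hyprland/wlroots scheduling):
    - vblank k happens at k * period; at t = 0 the screen shows an initial
      frame that occupies one of the n_buffers buffers;
    - the GPU renders one frame at a time and starts the next one as soon as
      the previous one is done *and* a buffer is free;
    - frame i reuses the buffer of frame i - n_buffers, which is released
      when frame i - n_buffers + 1 is flipped onto the screen;
    - a finished frame is flipped at the first vblank at or after it
      finishes, with at most one flip per vblank.
    """
    period = round(1_000_000 / refresh_hz)  # microseconds between vblanks
    finish: List[int] = []
    flip: List[int] = []
    for i, cost in enumerate(render_us):
        start = finish[i - 1] if i > 0 else 0
        oldest = i - n_buffers + 1  # the frame whose flip frees our buffer
        if oldest >= 0:
            start = max(start, flip[oldest] * period)
        done = start + cost
        finish.append(done)
        vblank = -(-done // period)  # ceiling division: next vblank
        if i > 0:
            vblank = max(vblank, flip[i - 1] + 1)  # one flip per vblank
        flip.append(vblank)
    return flip


def missed_vblanks(flips: List[int]) -> int:
    """Vblanks between the first and last flip that showed no new frame (stutter)."""
    return (flips[-1] - flips[0] + 1) - len(flips)


# 60 Hz, 10 ms frames with one 25 ms spike (e.g. the GPU "un-idling").
frames = [10_000] * 5 + [25_000] + [10_000] * 5
print(missed_vblanks(flip_vblanks(frames, 60, n_buffers=2)))  # 1
print(missed_vblanks(flip_vblanks(frames, 60, n_buffers=3)))  # 0
```

The price is the one mentioned above: in this model the spare buffer lets finished frames wait in a queue before they are shown, which is where the extra latency comes from.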

vaxerski commented 1 year ago

uh-huh. Interesting. I am unsure whether this would require a big hack to work with wlroots. I'll see.

h0tc0d3 commented 1 year ago

I doubt triple buffering has any practical value. These are all technologies that predate vsync and adaptive sync. Triple buffering has an input lag that will ruin the gaming experience. Hyprland doesn't use the GPU enough to cause problems. GNOME and KDE, which you mentioned, are heavier projects that put a higher load on the CPU and GPU. i7-4790k should handle hyprland fine.

I don't understand how you can compare the performance of Xorg and Hyprland, a Wayland compositor that does not run under Xorg. Can you provide a flamegraph that clearly shows the “better performance” of Xorg?

Wayland is faster than Xorg; the only difference is that Wayland prevents frame tearing and includes vsync and adaptive sync. You can reduce latency by disabling VRR.

I also don’t see the point in making changes to support older hardware. Often this requires creating dirty hacks in the code, which complicate future support, prevent compilers from optimizing the code efficiently, and can spoil compatibility with newer hardware. For this reason, all adequate projects, including the Linux kernel, get rid of old legacy code.

ErrorNoInternet commented 1 year ago

Triple buffering has an input lag that will ruin the gaming experience.

Not everyone plays games, and triple buffering would be optional, not forced on everyone. Some people (me included) can live with input lag.

i7-4790k should handle hyprland fine.

The i7-4790k uses the Intel HD 4600 iGPU, which I also have (my CPU is the i7-4710HQ), but I see performance issues. So I guess you would see them on the i7-4790k too? The iGPU should be split from the CPU, so CPU computational power shouldn't matter.

Wayland is faster than Xorg; the only difference is that Wayland prevents frame tearing and includes vsync and adaptive sync. You can reduce latency by disabling VRR.

*depending on the Wayland compositor. I have tried numerous Wayland compositors, and I have the same performance issues in all of them, but when I use KDE X11 or GNOME X11, it runs well.

I also don’t see the point in making changes to support older hardware. Often this requires creating dirty hacks in the code, which complicate future support, prevent compilers from optimizing the code efficiently, and can spoil compatibility with newer hardware. For this reason, all adequate projects, including the Linux kernel, get rid of old legacy code.

In my opinion, triple buffering isn't really a hack for legacy hardware. It's literally just making use of a third buffer so that your GPU doesn't stutter like hell. Running Hyprland on a Raspberry Pi 4 shows the same issues, and the Raspberry Pi 4 came out in 2019. Low-end hardware isn't always the same as legacy hardware. The Linux kernel doesn't throw away Raspberry Pi code just because it can't compress data at 200 MB/s or something.

If I have the same level of graphical effects in Hyprland and on KDE/GNOME, and KDE/GNOME doesn't stutter, shouldn't Hyprland also not stutter? If it's not related to triple buffering, what is it then? I'm genuinely curious.

vaxerski commented 1 year ago

I kinda don't understand how triple buffering is supposed to fix anything.

On vblank, we swap buffers and begin rendering a new frame.

If the frame takes > (1000 / refresh rate)ms, we will drop a frame, causing stutter.

E.g. 60Hz screen, vblank every 16.6ms, if frames take 23ms to render, we get dropped frames.

If a frame takes 23ms to render, how does rendering another frame fix the issue?

There is no such thing as "longer than usual" here, lag often happens when we have large animations which require a lot of pixels to be pushed through the pipeline.
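The frame-budget arithmetic above, as a small sketch. It assumes a simplified double-buffered vsync model in which rendering starts at a vblank and a late frame simply waits for the next one; the function names are illustrative only:

```python
import math


def frame_budget_ms(refresh_hz: float) -> float:
    """Time between vblanks, i.e. the render budget for one frame."""
    return 1000.0 / refresh_hz


def double_buffered_fps(render_ms: float, refresh_hz: float) -> float:
    """Presented frame rate with vsync and double buffering (simplified model):
    each frame occupies a whole number of vblank intervals."""
    vblanks_per_frame = math.ceil(render_ms / frame_budget_ms(refresh_hz))
    return refresh_hz / vblanks_per_frame


print(round(frame_budget_ms(60), 2))  # 16.67
print(double_buffered_fps(23, 60))    # 30.0
print(double_buffered_fps(15, 60))    # 60.0
```

Under this model a 23 ms frame does not just drop the occasional frame: every frame spans two vblanks, so a 60 Hz screen shows 30 fps.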

h0tc0d3 commented 1 year ago

I have tried numerous Wayland compositors, and I have the same performance issues in all of them, but when I go on KDE X11 or GNOME X11, it runs well.

This is called the placebo effect. Such statements without verified performance tests are nothing more than self-hypnosis. Therefore, be kind enough to provide a flamegraph and gpu profiling results in this case. Wayland's overhead is significantly lower than X11. The problem can only be with frame rate lock i.e. vsync locks the frame rate to prevent tearing. Wayland uses gles 2 for rendering, most likely there is a problem with your Intel - Mesa drivers, and some extensions are not working correctly.

vaxerski commented 1 year ago

Wayland uses gles 2 for rendering, most likely there is a problem with your Intel - Mesa drivers, and some extensions are not working correctly.

correction: wayland is a protocol. Wayland doesn't render anything.

Hyprland uses gles3.2 for rendering (gles2 if legacy_renderer is set)

when it comes to performance issues, all I can think is that xorg would somehow make your gpu clock higher, thus increasing its performance at the cost obviously of power-efficiency.

ErrorNoInternet commented 1 year ago

I kinda don't understand how triple buffering is supposed to fix anything.

On vblank, we swap buffers and begin rendering a new frame.

If the frame takes > (1000 / refresh rate)ms, we will drop a frame, causing stutter.

E.g. 60Hz screen, vblank every 16.6ms, if frames take 23ms to render, we get dropped frames.

If a frame takes 23ms to render, how does rendering another frame fix the issue?

There is no such thing as "longer than usual" here, lag often happens when we have large animations which require a lot of pixels to be pushed through the pipeline.

That's only if we have 2 buffers. If we have 3 buffers, and the second frame doesn't take more than 23 ms, we're good. If the third frame takes longer, we show the second frame first. I am not a graphics expert at all but I think this is how it works.

ErrorNoInternet commented 1 year ago

Hyprland uses gles3.2 for rendering (gles2 if legacy_renderer is set)

That could be why, but needs further testing.

when it comes to performance issues, all I can think is that xorg would somehow make your gpu clock higher, thus increasing its performance at the cost obviously of power-efficiency.

I have already locked my iGPU to its maximum frequency by using sudo intel_gpu_frequency -m though, and I still see noticeable stutters.

ErrorNoInternet commented 1 year ago

I have tried numerous Wayland compositors, and I have the same performance issues in all of them, but when I go on KDE X11 or GNOME X11, it runs well.

This is called the placebo effect. Such statements without verified performance tests are nothing more than self-hypnosis. Therefore, be kind enough to provide a flamegraph and gpu profiling results in this case. Wayland's overhead is significantly lower than X11. The problem can only be with frame rate lock i.e. vsync locks the frame rate to prevent tearing. Wayland uses gles 2 for rendering, most likely there is a problem with your Intel - Mesa drivers, and some extensions are not working correctly.

https://bugs.kde.org/show_bug.cgi?id=452119

h0tc0d3 commented 1 year ago

https://bugs.kde.org/show_bug.cgi?id=452119

I don't see flamegraphs or GPU profiler results there. I only see reports from glxgears, which, as I remember, works via X11 and therefore runs under XWayland on Wayland. So XWayland must be the latest version and support the tearing_control_v1 protocol extension if you want to get the best frame rate and lower latency.

It is not correct to point to third-party projects. By the same logic, you could blame the phase of the moon for the exchange rate of world currencies. These are different projects, with different code and dependencies.

Once again I ask you to provide flamegraph and GPU profiler results, because it looks more like an Intel/Mesa problem. Also your link states that this is a driver problem.

ErrorNoInternet commented 1 year ago

It is not correct to point to third-party projects. By the same logic, you could blame the phase of the moon for the exchange rate of world currencies. These are different projects, with different code and dependencies.

Nah, I was trying to show you that it wasn't the placebo effect; it is very noticeable.

Once again I ask you to provide flamegraph and GPU profiler results, because it looks more like an Intel/Mesa problem.

Alright then, I'll try (I have never done any profiling or performance measurements before).

Also your link states that this is a driver problem.

Where exactly though?

h0tc0d3 commented 1 year ago

@ErrorNoInternet try disabling animations and blur in the Hyprland settings. These seem to be the most resource-intensive operations.

vaxerski commented 1 year ago

That's only if we have 2 buffers. If we have 3 buffers, and the second frame doesn't take more than 23 ms, we're good. If the third frame takes longer, we show the second frame first. I am not a graphics expert at all but I think this is how it works.

but if we have 3 then the third will not be drawn either if the second takes >16ms...?

Idk seems dodgy at best. Anyways, wlroots forces double-buffering from what I can tell.
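The disagreement here is about sustained overload rather than a one-off spike. The sketch below assumes every frame takes 23 ms on a 60 Hz screen and compares when frames reach the screen with double buffering versus a third buffer that lets the GPU start the next frame immediately. Both the model and `flips_sustained` are illustrative assumptions, not wlroots behaviour:

```python
import math
from typing import List


def flips_sustained(render_ms: float, refresh_hz: float, n_frames: int, triple: bool) -> List[int]:
    """Vblank index at which each of n_frames is shown when *every* frame takes render_ms.

    Simplified model (an assumption): vsync on, vblank k at k * period,
    and render_ms longer than the period.
    - Double buffering: the next frame starts only after the previous one is
      flipped, so each frame costs a whole number of vblanks.
    - Triple buffering: the spare buffer lets the GPU start the next frame as
      soon as the previous one finishes, so frame i is done at
      (i + 1) * render_ms and is shown at the first vblank after that.
    """
    period = 1000.0 / refresh_hz
    assert render_ms > period, "this model only covers the overloaded case"
    if triple:
        return [math.ceil((i + 1) * render_ms / period) for i in range(n_frames)]
    step = math.ceil(render_ms / period)  # vblanks consumed per frame
    return [(i + 1) * step for i in range(n_frames)]


double_flips = flips_sustained(23, 60, 10, triple=False)
triple_flips = flips_sustained(23, 60, 10, triple=True)
print(double_flips)  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(triple_flips)  # [2, 3, 5, 6, 7, 9, 10, 12, 13, 14]
```

In this model the third buffer raises throughput from one frame every two vblanks (30 fps) to roughly the GPU's own rate of about 1000 / 23 ≈ 43 fps, but it cannot make a 23 ms frame fit a 16.7 ms budget: vblanks 4, 8 and 11 still repeat the previous frame.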

CactiChameleon9 commented 1 year ago

That's only if we have 2 buffers. If we have 3 buffers, and the second frame doesn't take more than 23 ms, we're good. If the third frame takes longer, we show the second frame first. I am not a graphics expert at all but I think this is how it works.

but if we have 3 then the third will not be drawn either if the second takes >16ms...?

Not an expert (in fact this is only a non-technical guess)... is triple buffering useful when the GPU clock speeds take too long to ramp up (I was reading here that the CPU and the GPU each need to do work first), so that triple buffering improves the parallelisation of that process and somehow works around the low clock speeds?

ErrorNoInternet commented 1 year ago

Not an expert (in fact this is only a non-technical guess)... is triple buffering useful when the GPU clock speeds take too long to ramp up (I was reading here that the CPU and the GPU each need to do work first), so that triple buffering improves the parallelisation of that process and somehow works around the low clock speeds?

Yeah, I think that's what it's supposed to solve.

That blur + screen recording (with OBS) thing might actually be because my iGPU really can't keep up*, since I've already locked my iGPU at its highest frequencies.

The reason I still saw stutters sometimes, even when I wasn't doing anything intensive, was that my iGPU goes down to 400 MHz when idling (not doing anything for a few seconds, just like https://github.com/hyprwm/Hyprland/issues/2484). Turning off VFR really locks it at its highest frequencies, and those stutters seem to have disappeared (I still get stutters with blur + screen recording though, and intel_gpu_top hovers at around 90%-100%, so yes, the GPU can't keep up*).

And also, I did some more testing on my friend's i7 10th-gen (laptop) CPU (I nixos-generated an image and flashed it to my USB so he could boot from it), and Hyprland is WAYYY more intensive than KDE (judging from intel_gpu_top). KDE Wayland hovers at around 20-30% usage when I'm switching focus + moving windows around (with the wobbly windows effect), and switching focus between different windows alone in Hyprland (with default settings) bumps it all the way to 60-80%. But of course this iGPU is stronger so it doesn't stutter at all.

However, I get terrible battery life if I turn off VFR and lock the iGPU at its highest frequencies. I had btop open in kitty, and with VFR off + highest GPU frequencies, I get ~1.5 hours of battery life. Going to default clock speeds and VFR on, I get ~4 hours of battery life.

And from my "analysis" before:

gives you a "backup frame" that prevents you from seeing stutters when the GPU suddenly slows down during a frame (e.g. un-idling)

I guess that makes sense: it gives the GPU some extra time (since there are 2 frames waiting to be rendered) while it's bumping clock speeds ("un-idling").

*Edit: screen recording stuttering might be related to https://github.com/hyprwm/Hyprland/issues/2515??? I haven't tested the patch though.

vaxerski commented 1 year ago

well in that case the question is whether we want to force higher gpu clocks via calculating a useless frame, or do we prefer to find another way to tell the gpu to maybe not chill for a sec

ErrorNoInternet commented 1 year ago

well in that case the question is whether we want to force higher gpu clocks via calculating a useless frame, or do we prefer to find another way to tell the gpu to maybe not chill for a sec

i mean you kinda have to predict the future (user interaction/new draw call) to tell the GPU to get ready earlier, or you could add latency so that you could give the GPU more time to prepare (which would equate to triple buffering)

Edit: and the GPU likes to chill 24/7, not just after a few seconds of inactivity, which is why there are stutters everywhere and not just after inactivity (at least in my case).

Edit of edit: dma fence deadline awareness so it doesn't chill 24/7?

Edit 2: and if we force the gpu to only idle after X seconds of inactivity, then there wouldn't be any battery savings at all if something on your display is refreshing every X-1 seconds (like btop refreshing every second)
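Edit 2 can be illustrated with a hypothetical idle policy: after each redraw the GPU stays at high clocks for `idle_after` seconds and only then drops to idle. The policy and the `idle_seconds` helper are assumptions for illustration, not a real driver or Hyprland setting. With btop redrawing once per second, a 2 s threshold never lets the GPU idle, while a 0.5 s threshold leaves it idle half the time:

```python
from typing import List


def idle_seconds(redraw_times: List[float], idle_after: float, duration: float) -> float:
    """Seconds spent at idle clocks under a hypothetical policy.

    Assumption: after every redraw the GPU stays at high clocks for
    `idle_after` seconds, then drops to idle until the next redraw.
    """
    events = sorted(redraw_times) + [duration]
    total = 0.0
    for prev, nxt in zip(events, events[1:]):
        total += max(0.0, (nxt - prev) - idle_after)  # idle part of each gap
    return total


btop = [float(t) for t in range(60)]  # one redraw per second for a minute
print(idle_seconds(btop, idle_after=2.0, duration=60.0))  # 0.0
print(idle_seconds(btop, idle_after=0.5, duration=60.0))  # 30.0
```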

vaxerski commented 1 year ago

triple buffering to me seems out of the question, wlroots doesn't allow triple buffering.

We could render a redundant frame or something, but it would still not be true triple buffering.

i mean you kinda have to predict the future

if (animation) dont_sleep_pls() would already be better than nothing

ErrorNoInternet commented 1 year ago

triple buffering to me seems out of the question, wlroots doesn't allow triple buffering.

maybe I should make an issue on the wlroots gitlab then? 👀

seems like #2484, #2305, #2501 are all possibly related

if (animation) dont_sleep_pls() would already be better than nothing

true, ig we could turn off VFR temporarily for dont_sleep_pls()? wouldn't solve stutters at the start of the animation or when you move your mouse, but when the animation slows down (possibly near the end) then it would probably help

h0tc0d3 commented 1 year ago

maybe I should make an issue on the wlroots gitlab then? 👀

No. The projects are practically not connected in any way, Hyprland is not sway. As far as I know, wlroots is used in Hyprland only for xwayland.

https://github.com/hyprwm/Hyprland/blob/main/meson.build

wlroots = subproject('wlroots', default_options: ['examples=false', 'renderers=gles2'])
have_xwlr = wlroots.get_variable('features').get('xwayland')
xcb_dep = dependency('xcb', required: get_option('xwayland'))

vaxerski commented 1 year ago

No. The projects are practically not connected in any way, Hyprland is not sway. As far as I know, wlroots is used in Hyprland only for xwayland.

Please refrain from commenting if you don't know what you are talking about.

maybe I should make an issue on the wlroots gitlab then? 👀

Most likely. Check if there isn't one already, though.

Maybe there is already a way to do it with wlroots, but it eludes me (I'd say it's unlikely, though, since for example the damage_ring is for double buffering).

ErrorNoInternet commented 11 months ago

Simon replied with:

In theory nothing in wlroots prevents a compositor from performing triple buffering. Maybe need to bump the swapchain capacity, but apart from this, can't think of anything else.
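To make "bump the swapchain capacity" concrete, here is a toy swapchain in which each buffer is free, queued, or on screen. It is a hypothetical model, not the wlroots swapchain API, and `ToySwapchain` and `can_render_ahead` are illustrative names. With two buffers, once one frame is on screen and the next is queued, there is no buffer left to start a third frame; with three, the renderer can work one frame ahead:

```python
from collections import deque
from typing import Deque, List, Optional


class ToySwapchain:
    """Hypothetical fixed-capacity swapchain (not the wlroots API).

    Every buffer is free, queued for presentation, or on screen (front).
    """

    def __init__(self, capacity: int) -> None:
        self.free: List[int] = list(range(capacity))
        self.queued: Deque[int] = deque()
        self.front: Optional[int] = None

    def acquire(self) -> Optional[int]:
        """Take a free buffer to render into, or None if every buffer is busy."""
        return self.free.pop(0) if self.free else None

    def submit(self, buf: int) -> None:
        """Queue a finished frame for the next vblank."""
        self.queued.append(buf)

    def vblank(self) -> None:
        """Flip the oldest queued frame on screen and release the old front buffer."""
        if not self.queued:
            return  # nothing new: the current frame is shown again
        if self.front is not None:
            self.free.append(self.front)
        self.front = self.queued.popleft()


def can_render_ahead(capacity: int) -> bool:
    """With one frame on screen and one queued, is there a buffer for a third?"""
    sc = ToySwapchain(capacity)
    first = sc.acquire()
    assert first is not None
    sc.submit(first)
    sc.vblank()  # frame 0 is now on screen
    second = sc.acquire()
    assert second is not None
    sc.submit(second)  # frame 1 waits for the next vblank
    return sc.acquire() is not None  # can frame 2 start now?


print(can_render_ahead(2))  # False
print(can_render_ahead(3))  # True
```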

romanstingler commented 8 months ago

wlroots has damage-tracking code that accounts for triple buffering: https://github.com/swaywm/wlroots/blob/0855cdacb2eeeff35849e2e9c4db0aa996d78d10/include/wlr/types/wlr_output_damage.h#L16C1-L21C41

I think a good read on this topic for moderately technical readers is https://discourse.ubuntu.com/t/why-ubuntu-22-04-is-so-fast-and-how-to-make-it-faster/30369

ErrorNoInternet commented 3 months ago

Blog about the new triple buffering implementation in Plasma 6.1: https://zamundaaa.github.io/wayland/2024/06/25/fixing-kwin-perf-on-old-hardware.html

Has a few interesting details about latency.

I have tested 6.1 as well and there seems to be zero dropped frames. Even while frantically moving my mouse over Steam tooltips, which makes Hyprland almost grind to a halt.

I also did a few tests with dma fence deadline awareness when 6.0 came out, which seemed to make the KWin Wayland session a tiny bit smoother (no more stuttery mouse after a few seconds of inactivity, no more laggy window-open animations), but it still stuttered when doing something intensive like screen-recording 1080p video playback. With triple buffering, the stutters are now completely gone.

vaxerski commented 3 months ago

again, triple buffering is a lazy hack that avoids the problem and instead wastes performance.

I have a different approach planned for once the aquamarine migration is done.