wlr_output's design doesn't work well with repaint scheduling

swaywm / wlroots

A modular Wayland compositor library

https://gitlab.freedesktop.org/wlroots/wlroots/

MIT License

2.15k stars 343 forks source link

wlr_output's design doesn't work well with repaint scheduling #1925

Open emersion opened 4 years ago

emersion commented 4 years ago

wlr_output has a baked-in assumption that the compositor will render right away when the frame event is triggered.

If the compositor doesn't do this, then wlr_output assumes the compositor stopped its rendering loop. Some wlroots utilities (wlr_output_damage, wlr_screencopy_v1) need to resume the rendering loop, and will call wlr_output_schedule_frame to do so. This function will re-submit the previous frame (perform a page-flip on DRM) to request a new frame event. We don't resume the rendering loop immediately to make sure we're correctly aligned with vsync.

All of this will prevent the compositor from submitting a new frame some time after the frame event.

To implement repaint scheduling in Sway, wlr_output.block_idle_frame has been introduced. It's basically a way for the compositor to say "i'll submit a frame soon, so please ignore any wlr_output_schedule_frame call".

I'm not a fan of the current situation because the wlr_output interface has become too complicated and opinionated. Removing wlr_output.block_idle_frame is not trivial, see https://github.com/swaywm/sway/pull/4588#discussion_r347147171.

I'll try to think of a better solution, with repaint scheduling + variable refresh rate in mind.

wlroots has migrated to gitlab.freedesktop.org. This issue has been moved to:

https://gitlab.freedesktop.org/wlroots/wlroots/-/issues/1925

ascent12 commented 4 years ago

I wonder if it's worth going with the wl_surface model for this. We've already gone towards that with most things in wlr_output, but it could be frame events too. Instead of the backend dictating the user's rendering loop, the user would have a more active role in driving it. I honestly think we should drop all of this needs_frame, frame_pending, idle_frame crap.

Although one foreseeable issue is restarting the rendering loop on VT-switching, but that has always been awkward.

emersion commented 4 years ago

One issue is that it's not possible for DRM to implement a wl_surface model, because once you've submitted a frame you need to wait vsync before submitting another one. That's why frame_pending exists: to prevent users from pushing multiple frames per vblank so that all backends work the same way.

We could work-around this by making the DRM backend wait on wlr_output_commit and submit the frame right before vblank, however I'm not sure how this would work with variable refresh rate and how to make sure we don't miss a frame.

needs_frame allows the output to request a frame event if there's no damage (so the compositor would skip rendering normally). This is useful in a handful of situations, from VT switching to screen capture. I don't have an idea to remove it (yet).

idle_frame is tied to wlr_output_schedule_frame, so it would go away with this function.

ascent12 commented 4 years ago

We could work-around this by making the DRM backend wait on wlr_output_commit and submit the frame right before vblank, however I'm not sure how this would work with variable refresh rate and how to make sure we don't miss a frame

Yeah, that's a path full of a bunch of heuristics, but would allow us to emulate the mailbox model. When I heard the XDC talk mentioning this, I've been thinking that maybe it's not the worst idea.

But Keith Packard has also mentioned that he wants to try and push for some kernel API changes regarding this too, which would make doing that unnecessary. We'll have to see if anything ends up happening.

emersion commented 4 years ago

Note: https://github.com/swaywm/sway/pull/4772 now sets frame_pending instead of using block_idle_frame.

emersion commented 4 years ago

Update: with https://github.com/swaywm/wlroots/pull/2046 wlr_output_impl.schedule_frame is gone. This means wlroots won't perform page-flips behind the compositor's back anymore.

beviu commented 4 years ago

Could wlroots provide an object (say a wlr_frame_scheduler) that listens for a new frame_request signal emitted by wlr_output_damage and wlr_screencopy_v1, and that emits a frame signal with a wlr_outputs when it thinks is a good time to render a frame for that output?

The algorithm would be configurable:

It could be extended in the future to add a mode like max_render_time in Sway for the compositor and clients where it intentionally waits a fixed time before rendering.
It could also have an automatic mode that measures the rendering time of the compositor and clients and automatically and adjusts the max_render_time (I don't know if this actually works). Of course, the documentation would clearly say that this mode is not perfect.
It could be configured by clients with an extension to the Wayland protocol (see last before paragraph).

I am aware such an algorithm would be opinionated, and therefore that it wouldn't work for every compositor, but the big advantage is that it would provide a default algorithm that all compositors can easily use with very few lines of code and that would help decrease the input latency on all of the Linux desktop that uses wlroots, instead of just Sway for now, especially if a mode is added later that keeps track of the maximum render times automatically and doesn't need any configuration from the end user (I don't know if this actually works).

Additionally, if this was put in wlroots, it could mean that a Wayland protocol extension that works on all wlroots compositors could be created where clients tell the compositor how much time they need to render their content, like the max_render_time option for views in Sway. Indeed, the clients probably know better than the user and the compositor how much time they need to render something, especially when it varies over time. For example, a terminal application might have an algorithm where depending on how many cells need to be redrawn, it will allocate different amounts of time to the compositor to still have time to render, but render at the very last time.

If the compositor wants its own implementation of frame scheduling, it could listen itself for the frame_request signals and do the frame rendering itself.

YaLTeR commented 4 years ago

Sorry, I'm a bit out of the loop so I can't comment on the main part of the comment, but:

clients tell the compositor how much time they need to render their content, like the max_render_time option for views in Sway

I don't think this is a good idea since it has a lag of at least 1 frame; I think a better idea would be to signal the clients the commit deadline together with frame callbacks without delaying the frame callbacks, so then the client itself can decide whether it wants to take all the available time or wait on its own to start rendering closer to the commit deadline.

emersion commented 4 years ago

I think an optional helper would be a good idea, but I'd prefer to have it working in a compositor before deciding whether it should be part of wlroots or not.

The helper should listen to presentation events instead of frame events. The backend doesn't send frame events when a page-flip happens, the backend sends a frame event to signal it's a good time to render (and it's wired up to wl_surface.frame in the Wayland backend for instance).

ishitatsuyuki commented 3 years ago

I'm still new to this, but I'm kinda skeptical of the idea to let the client decide when they start rendering. Here are the reasons:

Determining how much work is needed for the next frame is hard, as the content rendered by the application mostly depends on user input or other unpredictable events. To decide on a render start time in such cases, the application would need to constantly looking at the events (as opposed to batch-processing them at the beginning of rendering loop), which doesn't sound like a good choice mainly due to power consumption/efficiency.
All clients should start rendering at the same time for a perfectly synchronized frame. A perfect synchronization means:
- when not resizing, the cursor position should match what the application had received when it started rendering
- when resizing, the cursor position should match the edge of the window being resized, and the application should also render at the size requested by the compositor (also see smooth-resize test) That said I guess this one would be hard to observe though, so it can be traded off for other benefits.

By the way, an application that is known to do work during the idle period between frames is Chrome (https://v8.dev/blog/free-garbage-collection), but it calculates the amount of free time after a frame (which can be calculated by subtracting frame time from the monitor refresh period), in contrast to the time before a frame (which is hard to determine as described in point 1 above). So for that case it will work without the knowledge of render deadline.

YaLTeR commented 3 years ago

Determining how much work is needed for the next frame is hard, as the content rendered by the application mostly depends on user input or other unpredictable events. To decide on a render start time in such cases, the application would need to constantly looking at the events (as opposed to batch-processing them at the beginning of rendering loop), which doesn't sound like a good choice mainly due to power consumption/efficiency.

It's hard, but an application surely knows more about what it's about to render than a compositor.

I don't expect many applications bother though, so it's probably the best idea to have that option (as in, not block that option from developing later on) and try to adjust it automatically in the compositor for the rest of applications.

All clients should start rendering at the same time for a perfectly synchronized frame. A perfect synchronization means:

when not resizing, the cursor position should match what the application had received when it started rendering

when resizing, the cursor position should match the edge of the window being resized, and the application should also render at the size requested by the compositor (also see smooth-resize test) That said I guess this one would be hard to observe though, so it can be traded off for other benefits.

I don't see how either of these points depends on all clients starting rendering at the same time? I mean, if you set max_render_time for a particular client today then it will start rendering at a different time from other clients, does that cause any issues?

ishitatsuyuki commented 3 years ago

I don't see how either of these points depends on all clients starting rendering at the same time? I mean, if you set max_render_time for a particular client today then it will start rendering at a different time from other clients, does that cause any issues?

The idea is to have a consistent snapshot view of things like cursor positions among all clients as well as the compositor. So I want to avoid cases like app A thinks your cursor position is that of -10ms from now, app B thinks it's that of -5ms from now, and the compositor has something like -1ms from now. That said, it rarely matters since it's only the focused window that receives events.

The examples I mentioned in previous comment is primarily about syncing compositor with the client (so it's not actually client-client sync). We do want to avoid cases where the cursor is running ahead or behind the edge position of the app that is currently resized.