libretro / RetroArch

Cross-platform, sophisticated frontend for the libretro API. Licensed GPLv3.
http://www.libretro.com

[Video/Wayland] Explicitly waiting for eglSwapBuffers() to complete. #14409

Open vanfanel opened 2 years ago

vanfanel commented 2 years ago

Description

Currently, RetroArch will show noticeable input lag when using the OpenGL backend on the Wayland context (the max_swapchain setting is not available on Wayland + OpenGL).

Buffer swapping is done here for Wayland: https://github.com/libretro/RetroArch/blob/911308327dc1f06531575f9f606b21a0a25ac38a/gfx/drivers_context/wayland_ctx.c#L518 ...but it lacks a means to block until eglSwapBuffers() completes. eglSwapBuffers() is by default a synchronous/blocking function: in theory it doesn't return until the requested buffer swap is done. This behavior can be changed with eglSwapInterval(0), which makes subsequent eglSwapBuffers() calls return immediately.

However, given how things work internally on Wayland, a blocking eglSwapBuffers() returning means "you can send a new frame", not "the issued buffer swap is complete and the new contents are on screen". That's why waiting for the buffer swap event "manually" after eglSwapBuffers() can be a good idea.

In the KMS/DRM backend of SDL2, I use events to be notified on the buffer swap completion: https://github.com/libsdl-org/SDL/blob/5b2884cb0203cc63bf9753f8b55ea4c6c6f19cfb/src/video/kmsdrm/SDL_kmsdrmvideo.c#L391

So, any idea on what would be the equivalent in Wayland for explicitly blocking until the requested buffer swap is completed?

Expected behavior

RetroArch on Wayland using OpenGL should have less input lag.

Actual behavior

RetroArch on Wayland using OpenGL has noticeable input lag.

Steps to reproduce the bug

  1. Run RetroArch on Wayland with the OpenGL driver.
  2. You get input lag.

Bisect Results

Has always happened.

Version/Commit

Every version of RetroArch has this problem: as of today, there's no mechanism implemented to explicitly wait for vsync after the egl_swap_buffers() call here: https://github.com/libretro/RetroArch/blob/911308327dc1f06531575f9f606b21a0a25ac38a/gfx/drivers_context/wayland_ctx.c#L514

Environment information

@Themaister Can you please give me some input here? Are there Wayland-specific mechanisms to force wait for completion after an eglSwapBuffers() call?

vanfanel commented 2 years ago

This piece of code DOES wait for the actual buffer swap event:


   struct wl_callback *callback;
   int frame_done;

   frame_done = 0;

   callback = wl_surface_frame(wl->surface);
   if (callback == NULL)
       return;

   // Issue buffer swap.
   egl_swap_buffers(&wl->egl);

   // The callback will set frame_done to true when receiving event.
   wl_callback_add_listener(callback, &frame_listener, &frame_done);

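   // Stay in the loop until the issued buffer swap is actually done.
   // wl_display_dispatch() returns -1 on error, otherwise the number of
   // dispatched events, so keep dispatching as long as it does not fail.
   while (!frame_done && wl_display_dispatch(wl->input.dpy) != -1) {}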

...and this is the callback function implementation, which should go before trying to pass it to wl_callback_add_listener(), obviously.

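// Frame callback: fires once, when the compositor sends the "done" event.
// The last argument carries a millisecond timestamp, unused here.
static void
frame_callback(void *data, struct wl_callback *callback, uint32_t time)
{
        int *done = data;

        *done = 1;
        wl_callback_destroy(callback);
}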

static const struct wl_callback_listener frame_listener = {
        frame_callback
};

Whether this improves input latency on Wayland + OpenGL is something I cannot be totally sure about.
It should work with VSYNC=OFF, since this needs a non-blocking eglSwapBuffers(): we are explicitly waiting for the buffer swap completion event ourselves.
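
The wait loop in this comment can be sketched without a compositor. The snippet below is a minimal illustration, not RetroArch code: `dispatch_fn`, `wait_until_done`, `toy_display` and `toy_dispatch` are hypothetical names, and the toy dispatcher only imitates the return convention of wl_display_dispatch() (number of dispatched events on success, -1 on failure).

```c
#include <stdbool.h>

/* Hypothetical stand-in for wl_display_dispatch(): returns the number of
 * events handled, or -1 on error. */
typedef int (*dispatch_fn)(void *ctx);

/* Keep dispatching until *done becomes nonzero.
 * Returns true if the flag was set, false if the dispatcher failed first. */
static bool wait_until_done(dispatch_fn dispatch, void *ctx, const int *done)
{
   while (!*done)
   {
      if (dispatch(ctx) == -1)
         return false;
   }
   return true;
}

/* Toy display for illustration only: every call handles one event, and the
 * third call sets the flag, the way a frame callback eventually would. */
struct toy_display
{
   int calls;
   int done;
};

static int toy_dispatch(void *ctx)
{
   struct toy_display *d = (struct toy_display*)ctx;

   if (++d->calls == 3)
      d->done = 1;
   return 1; /* one event dispatched */
}
```

Calling `wait_until_done(toy_dispatch, &d, &d.done)` on a zero-initialized `toy_display` returns after three dispatch calls. The point of the sketch is the exit condition: a successful dispatch usually returns a positive count, so the loop should stop on -1 rather than on any nonzero value.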

nfp0 commented 1 year ago

I was going to check the branch you asked me to test on the forums, but it is gone now.

vanfanel commented 1 year ago

@nfp0 Yes, I did my own tests and there was no improvement to be seen. Let's simply wait for Vulkan to be fixed on Wayland.

nfp0 commented 1 year ago

Sigh... yeah. I'm anxiously waiting for this: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12086

But it's taking so long. 😩

vanfanel commented 1 year ago

@nfp0 Same here... Do you think that PR is forgotten or something? Wayland is supposed to be the future, having massive input lag with it is not a good sign.

nfp0 commented 1 year ago

True. Maybe we should ping the PR asking for how progress is going, if any?

I opened this issue a few months ago: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6249

This prompted some discussion on the issue. @Themaister chimed in too back then. It sounded to me that some of the Mesa devs either thought this was a RetroArch issue or that it was not important enough for it to be a priority.

vanfanel commented 1 year ago

@nfp0 I ping-ed the issue. Let's hope they merge it. If 2 buffers could be forced in the driver somehow...

nfp0 commented 1 year ago

@vanfanel Thanks. Let's hope they wake up.

> If 2 buffers could be forced in the driver somehow...

Well, I've forced it to 2 on RetroArch and that indeed gave me the lowest possible input-lag. Here's the change I did: https://github.com/libretro/RetroArch/pull/13823 With that I arrived at the same input lag values as KMS and AMDVLK.

I would use it like that, but I remember I had an issue in fullscreen mode. But that might've been an unrelated issue. Give it a try and see if you find any issues with it. I'll have to give it a try again too.

vanfanel commented 1 year ago

@nfp0 Thanks for the patch! Indeed, it works for forcing Vulkan to give us the specified number of buffers: with that, RetroArch on Wayland AT LONG LAST says that it's allocating 2 buffers if max_swapchain is 2. However, did you measure input lag? I don't have instruments at hand now (i.e. fast camera + LED-powered pad) but I feel like lag is still WAY higher on Wayland :( Super Mario World is perfect for feeling it.

nfp0 commented 1 year ago

@vanfanel Nice! I did measure it, yes. I arrived at the same input lag as KMS and AMDVLK from my measurement posts on the forums, which is the minimum possible latency on my setup (with frame delay disabled).

My setup was a Manjaro KDE Wayland system with RetroArch running fullscreen. I filmed my finger pressing the button on my wired keyboard at 480 fps with my OnePlus 7 Pro, which is more than enough to count the time between press and reaction on the monitor. Not the most scientific, I know, but it's good enough to count how many frames are being buffered on the PC. I did the tests with the bsnes-mercury core running the Horiz/Vert Stripes test on the 240p Test Suite, because that specific test has next-frame reaction to the input and that makes counting frames easier.
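
As a rough illustration of the counting method described above, the conversion from counted camera frames to milliseconds is simple arithmetic. This is a sketch with hypothetical helper names (`camera_frame_ms`, `camera_frames_to_ms`); only the 480 fps figure comes from the post.

```c
/* Duration of a single high-speed camera frame, in milliseconds. */
static double camera_frame_ms(double camera_fps)
{
   return 1000.0 / camera_fps;
}

/* Latency represented by a number of counted camera frames, in milliseconds. */
static double camera_frames_to_ms(unsigned frames, double camera_fps)
{
   return (double)frames * camera_frame_ms(camera_fps);
}
```

At 480 fps each camera frame covers about 2.08 ms, so the timing resolution is roughly one eighth of a 60 Hz refresh. As a made-up example, counting 24 camera frames between press and reaction would correspond to 50 ms.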

nfp0 commented 1 year ago

> Super Mario World is perfect for feeling it.

I wouldn't go by our human feelings for this because we're already working with extremely low levels of input lag. I'm usually very picky about input lag and feel it in places where most people don't notice it at all, but I gotta admit I can't really perceive the difference between 2 and 3 swapchain images. For perspective, New Super Mario Bros U and Smash Ultimate on the Switch have 5 or 6 frames of input lag.

But of course, perception is not important here. Lower is always better, and allows us to react faster.

vanfanel commented 1 year ago

@nfp0 Did you measure swapchain=2 vs swapchain=3? I mean, with your equipment.

nfp0 commented 1 year ago

No, I did not measure swapchain=3. But I can do that to make sure.

vanfanel commented 1 year ago

> No, I did not measure swapchain=3. But I can do that to make sure.

Yes please, measure swapchain=3 and swapchain=2 and tell me what difference you see.

nfp0 commented 1 year ago

Sure thing. I'll try to get to it in the next few days.

nfp0 commented 1 year ago

@vanfanel I'm back with some numbers. Numbers that confirm swapchain=2 indeed reduces input lag by one frame. 🙂

I did 10 random measurements for each scenario and averaged the value. Here are the values I've arrived at:

That's a 15ms difference (almost 1 frame at 60fps, as expected) between 2 and 3 swapchain images. 3 and 4 swapchain images have the same input lag, with the difference within the margin of error, but I remember reading something on @Themaister's blog about Mesa reporting 4 images and then only using 3. If you want to validate my measurements I can upload the slow-motion videos to YouTube or something.
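
The "almost 1 frame" remark can be checked with a small calculation. The helpers below are a sketch with hypothetical names (`frame_period_ms`, `ms_to_frames`); the 15 ms and 60 fps figures are the ones quoted above.

```c
/* Duration of one display refresh, in milliseconds. */
static double frame_period_ms(double refresh_hz)
{
   return 1000.0 / refresh_hz;
}

/* Express a latency in milliseconds as a number of display refreshes. */
static double ms_to_frames(double latency_ms, double refresh_hz)
{
   return latency_ms / frame_period_ms(refresh_hz);
}
```

One refresh at 60 Hz lasts about 16.67 ms, so `ms_to_frames(15.0, 60.0)` evaluates to 0.9, which is consistent with one swapchain image adding roughly one frame of latency.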

Another piece of good news is that, from my limited testing, I found no bugs or issues while forcing the 2 swapchain images. Last time I tried this I remember having some trouble with RetroArch being frozen when I opened a game and only coming back to normal if I alt-tabbed out and then back in to RetroArch. But I don't seem to have that issue anymore. Maybe it was just a KWin bug.

I wish we could add a feature to force the number of swapchain images, even if just as a command-line parameter. But for now, until I find any issues, I'll use it patched to force 2 swapchain images.

vanfanel commented 1 year ago

@nfp0 Great! Thanks for these numbers and experiments! Some questions arise in my mind:
-Did you do these tests with "Threaded video" disabled? (Enabling it increases the output lag by a LOT!)
-Can you please do the same tests on the TTY? (=No wayland)
-Can you please do the same tests in OpenGL on Wayland? (OpenGL on Wayland won't let you choose the number of buffers, sadly).

nfp0 commented 1 year ago

No problem! 🙂

> -Did you do these tests with "Threaded video" disabled? (Enabling it increases the output lag by a LOT!)

Threaded video is disabled. I've never used it to be honest.

> -Can you please do the same tests on the TTY? (=No wayland)

RetroArch already uses only 2 swapchain images on KMS on the TTY, as can be seen on the console output. I've tested it back when I posted my results on the forums and already achieved the lowest possible theoretical input lag (same as Windows exclusive fullscreen). Since it already uses 2 images, my patch doesn't change anything there.

Out of curiosity, has anyone ever claimed an input lag lower than 50ms on RetroArch on any system ever? (Without using frame delay and run-ahead, of course).

> -Can you please do the same tests in OpenGL on Wayland? (OpenGL on Wayland won't let you choose the number of buffers, sadly).

I don't plan on using OpenGL, but I can give it a try yeah. I assume you want Hard GPU Sync on? And do you want normal GL or GLCore?

vanfanel commented 1 year ago

> Out of curiosity, has anyone ever claimed an input lag lower than 50ms on RetroArch on any system ever? (Without using frame delay and run-ahead, of course).

Not that I know. I don't use these, either. No need if having 2 buffers and NO threaded video.

> I don't plan on using OpenGL, but I can give it a try yeah. I assume you want Hard GPU Sync on? And do you want normal GL or GLCore?

Can I have both, please? :)

nfp0 commented 1 year ago

> Not that I know. I don't use these, either. No need if having 2 buffers and NO threaded video.

Well, there are always benefits in using them. Frame delay can shave up to almost 16ms if your PC is fast enough.

> Can I have both, please? :)

Wait, I've remembered now that my patch only affects Vulkan, as it's an edit of gfx/common/vulkan_common.c, so there will be absolutely no difference on OpenGL. Do you still want me to test it?

vanfanel commented 1 year ago

@nfp0 Yes, please, I would like to see the lag you get for OpenGL on Wayland. I got around 65ms, which is a bit high, but then again it seems impossible to effectively control the number of buffers EGL uses (and blocking after eglSwapBuffers() shows no difference as I said). So I would like to see what you get.

nfp0 commented 1 year ago

Aight! I'll get back to you.

nfp0 commented 1 year ago

@vanfanel Sorry for the delay! I've been a bit busy, and these tests need to be done with daylight because of the slow-motion camera shutter speed and then I gotta count all the frames manually.

Ok, so, I got these numbers with the same methodology as before (10 samples):

gl: 60ms
glcore: 69ms

Keep in mind there's a 16ms margin of error, depending on when my finger presses the button in relation to the Vsync interval, so the 9ms difference between them might, or might not, be real. Regardless, it seems they're slower than 2 swapchain images from Vulkan, but about the same as 3 swapchain images.
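
The reasoning about averaged samples and the margin of error can be written down as a short sketch. The function names (`mean_ms`, `exceeds_margin`) are hypothetical, and no individual sample values appear in the thread, so any array passed to `mean_ms` is the caller's own data.

```c
#include <stddef.h>
#include <stdbool.h>

/* Arithmetic mean of latency samples in milliseconds (0.0 for no samples). */
static double mean_ms(const double *samples, size_t count)
{
   double sum = 0.0;
   size_t i;

   if (count == 0)
      return 0.0;
   for (i = 0; i < count; i++)
      sum += samples[i];
   return sum / (double)count;
}

/* True when two averages differ by more than the stated margin of error. */
static bool exceeds_margin(double a_ms, double b_ms, double margin_ms)
{
   double diff = a_ms > b_ms ? a_ms - b_ms : b_ms - a_ms;

   return diff > margin_ms;
}
```

With the averages reported above, `exceeds_margin(60.0, 69.0, 16.0)` is false: the 9 ms gap between gl and glcore sits inside the 16 ms margin, so it cannot be called real from these numbers alone.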

So long story short, let's use Vulkan if possible. 🙂

EDIT: Mind you this was on RetroArch 1.11.1

vanfanel commented 1 year ago

@nfp0 Thanks, really, thanks. These results confirm my feelings: Vulkan + 2 swapchain images is the way to go, period. Too bad the EGL API doesn't offer a way to specify the number of buffers. But well, we have Vulkan, so it's OK.

nfp0 commented 1 year ago

No problem! Mind you this was on a vanilla build of RetroArch. I did not apply your eglSwapBuffers() change.

Yeah, Vulkan is the way to go for low latency. Now I just wish Mesa would hurry up, because for now, only a hacked RetroArch uses 2 swapchain images. Think we can convince libretro to have a way to force the swapchain? Even if just a command line option to keep it away from the uninformed user?

vanfanel commented 1 year ago

@nfp0 Well, making an option or commandline would be redundant I think: things work, we should simply get rid of asking Vulkan about the number of buffers. However, your patch did precisely that and it was rejected. So I don't know what to say. It's a pretty absurd situation if you ask me.

nfp0 commented 1 year ago

Because the bug is on Mesa's side. Forcing it on RetroArch works for now on our machines, but as @Themaister said, it's an invalid usage of Vulkan, which means it might not work on other setups or it might break at any update in the future. That's why I agree that it must not be available as a normal option, but I believe it would be very useful as a "force" option to work around the problem.
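
To make the "invalid usage" point concrete: in Vulkan, the image count requested at swapchain creation must not be lower than the minimum the surface reports, and a reported maximum of 0 means there is no upper limit. The sketch below assumes that this is the rule being bypassed by the patch; the thread itself does not spell it out. `surface_caps` is a hypothetical stand-in for the `minImageCount` and `maxImageCount` fields of `VkSurfaceCapabilitiesKHR`, and `clamp_swapchain_images` is an illustrative helper, not RetroArch's function.

```c
#include <stdint.h>

/* Hypothetical mirror of the two VkSurfaceCapabilitiesKHR fields that
 * matter here. max_image_count == 0 means "no upper limit". */
struct surface_caps
{
   uint32_t min_image_count;
   uint32_t max_image_count;
};

/* Clamp the user's desired image count to what the surface reports. */
static uint32_t clamp_swapchain_images(uint32_t desired,
      const struct surface_caps *caps)
{
   uint32_t count = desired;

   if (count < caps->min_image_count)
      count = caps->min_image_count;
   if (caps->max_image_count != 0 && count > caps->max_image_count)
      count = caps->max_image_count;
   return count;
}
```

Under this rule, a user asking for 2 images on a surface that reports a minimum of 3 ends up with 3, which would explain why max_swapchain=2 has no effect unless the clamp is skipped. Skipping it is what makes the forced value work today while remaining outside what the specification guarantees.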

We don't know how many years Mesa will take to solve this, and meanwhile distros are starting to default to Wayland so we'll have more and more users with subpar input lag on RetroArch.

vanfanel commented 1 year ago

> We don't know how many years Mesa will take to solve this, and meanwhile distros are starting to default to Wayland so we'll have more and more users with subpar input lag on RetroArch.

Problem is, most users don't care. They never played the games on real hw so they don't know how they should feel/respond. So we are a bit out of luck.

nfp0 commented 1 year ago

That's true unfortunately. And to top it off, the default swapchain nr is 3.

But anyway, do you know who we should talk to, to ask if they're ok with having this setting as a workaround?

vanfanel commented 1 year ago

@nfp0 I don't know people in the RetroArch organization, so I don't know who we should talk to.

Our best bet is getting @Themaister to fix it on Mesa's side. Maybe he can read us?

nfp0 commented 1 year ago

I hope so. I'll see if I can raise some attention to this again on Discord.

nfp0 commented 1 year ago

@vanfanel Could this be what we were waiting for? Themaister has been working on this but I can't tell for sure if this is going to help with the swapchain issue or not. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19279

vanfanel commented 1 year ago

@nfp0 Could be, yes! But I am not sure... I don't know Mesa so well under the hood. Let's wait and see what comes out of that, @Themaister is a genius: he invented the best gaming API ever and now he's on to fix the input lag problems in Vulkan on Wayland!

nfp0 commented 1 year ago

He sure is!

> the best gaming API ever

Which one are you referring to?

> Vulkan on Wayland

Actually, this problem affects both X and Wayland.

vanfanel commented 1 year ago

@nfp0 I was referring to the LibRetro API, which is TheMaister's creation.

nfp0 commented 1 year ago

> LibRetro API

Oh wow! I had no idea that was also his creation. Amazing! 😮

gouchi commented 1 month ago

@vanfanel Is it still an issue or can we close this issue? Thank you.

vanfanel commented 1 month ago

Vulkan works as expected by now, and I added a way to block until frame swapping is done in OpenGL. For that I needed the Max Swapchain Images option to be displayed on Wayland, but that never happened.

gouchi commented 1 month ago

@vanfanel I see that VK_KHR_present_wait has been implemented.

vanfanel commented 1 month ago

> @vanfanel I see that VK_KHR_present_wait has been implemented.

Does RetroArch use that? I recall it was doing some workaround...

gouchi commented 1 month ago

> Does RetroArch use that? I recall it was doing some workaround...

I don't know.