Closed krakow10 closed 2 months ago
Haven't tried it but note that issue may possibly be still in egl-wayland (need >=1.1.14 to use explicit sync), egl-wayland-1.1.14 is particularly broken, and 1.1.15 still has issues that crash applications (one major issue was fixed, but no new release yet). Alternatively, it could be wlroots' rather new explicit sync implementation that needs work (Edit: one could try with plasma6/kwin instead to see if it happens on both).
Note that NVIDIA's official drivers (current latest 560.35.03) still ship egl-wayland-1.1.13.1, and distros providing a higher version are essentially opting into an early experiment (in Gentoo we're currently leaving it as an opt-in; users don't get 1.1.15 by default). Arch seems to have downgraded back to 1.1.13.1 too. Have you gotten that update yet, or is it somehow failing even with 1.1.13.1? (If the latter, it may indicate it's trying to use explicit sync when it can't.)
That's not to say the bundled glfw doesn't also have something that needs fixing, if it's misusing something and causing unexpected issues.
Explicit sync is something that happens in lower layers than application code. If it's crashing then the bug is probably in either mesa or the NVIDIA driver or the compositor. An OpenGL application like kitty just calls eglSwapBuffers(); there is no concept of implicit/explicit synchronization in this API. So I don't see that anything can be done in kitty about this. If anybody knows any different, feel free to enlighten me.
Haven't tried it but note that issue may possibly be still in egl-wayland (need >=1.1.14 to use explicit sync), egl-wayland-1.1.14 is particularly broken, and 1.1.15 still has issues that crash applications (one major issue was fixed, but no new release yet).
The egl-wayland point is pertinent: this was tested with egl-wayland 1.1.15. I found that screen capture doesn't work on any other version, but I may not have tested correctly. I may try 1.1.13.1 when I have time.
Edit: The release notes for egl-wayland 1.1.13.1 state that it disables explicit sync, so it is not a valid explicit sync testing target. "...users may want to fall back to 1.1.13 to avoid using explicit sync, but cannot on 560. This release provides an option to fall back to."
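For anyone checking which egl-wayland builds are even candidates for explicit sync, here is a minimal Python sketch of the version comparison. The >=1.1.14 threshold is the one quoted earlier in this thread; the helper names `parse_version` and `may_use_explicit_sync` are made up for illustration and are not part of any package.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '1.1.13.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Threshold taken from this thread: egl-wayland needs >= 1.1.14 for explicit sync.
EXPLICIT_SYNC_MIN = parse_version("1.1.14")


def may_use_explicit_sync(egl_wayland_version: str) -> bool:
    """True if this egl-wayland version is new enough to attempt explicit sync."""
    return parse_version(egl_wayland_version) >= EXPLICIT_SYNC_MIN


for version in ("1.1.13", "1.1.13.1", "1.1.14", "1.1.15"):
    print(version, may_use_explicit_sync(version))
```

Tuple comparison handles the four-component 1.1.13.1 correctly: it sorts after 1.1.13 but before 1.1.14, which matches the release note saying it is a fallback without explicit sync.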
You must not call wl_surface_attach on a surface that has an egl context.
On Mon, Aug 26, 2024 at 01:52:39AM -0700, mahkoh wrote:
You must not call wl_surface_attach on a surface that has an egl context.
Is this documented somewhere? Where in the Wayland spec does it say that's not allowed? All Wayland compositors have, to date, been fine with this. And I can find no mention of such a restriction in the docs of wl_surface_attach.
Note, I don't know if the fix actually works, but it's the only instance in the codebase I can find of a buffer being attached to a surface that already has an OpenGL context. @krakow10 please test.
Is this documented somewhere? Where in the Wayland spec does it say that's not allowed? All Wayland compositors have, to date, been fine with this.
Insofar as Vulkan says:
The native window referred to by pCreateInfo->surface must not become associated with a non-Vulkan graphics API surface before all associated Vulkan swapchains have been destroyed.
And similarly for OpenGL. What exactly this means is not necessarily documented.
Assuming the AUR package installed correctly (installed as kitty-git-1:0.36.1.r16.g9843b5c21-1) I am getting the same error as before.
Run with --debug-rendering; that will tell us the sequence of events. And you can use the kitty nightly build, no need to rely on AUR. https://sw.kovidgoyal.net/kitty/build/
Here is the output I get.
[quat@quat-desktop ~]$ kitty --config NONE --debug-rendering
[0.065] Compositor missing capabilities: blur
[0.080] Creating window 1 at size: 941x1056 and scale 1
[0.175] GL version string: '3.1.0 NVIDIA 560.35.03' Detected version: 3.1
[0.193] Compositor top-level capabilities: maximize=1 minimize=0 window_menu=1 fullscreen=0
[0.193] XDG top-level configure event for window 1: size: 0x0 states:
[0.193] XDG decoration configure event received for window 1: has_server_side_decorations: 1
[0.193] XDG surface configure event received and acknowledged for window 1
[0.193] Waiting for swap to commit Wayland surface for window: 1
[0.193] Final window 1 content size: 941x1056 resized: 0
[0.193] Setting window 1 "visible area" geometry in configure event: x=0 y=0 941x1056 viewport: 941x1056
[0.193] Attached temp buffer during window 1 creation of size: 941x1056 and rgba(0, 0, 0, 255)
[0.193] Waiting for compositor to send fractional scale for window 1
wp_linux_drm_syncobj_surface_v1#40: error 4: Buffer attached but no acquire point set
[0.216] The output buffer does not support sRGB color encoding, colors will be incorrect.
[0.216] OS Window created
[0.237] Child launched
[0.238] [glfw error 65544]: Wayland: fatal display error: Protocol error
[0.245] Window 1 swapped committing surface
Looks fine to me. We can see that the compositor is complaining after the temp buffer is attached. With my change that occurs before any opengl context is created. So as far as I can see kitty is not doing anything wrong here. If I had to guess, sway now requires more drama while attaching a buffer to a surface. Maybe a call to some function in the syncobj protocol. If so, only they can tell us what that is. Report the issue there.
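To make the "complaining after the temp buffer is attached" reading easier to reproduce, here is a rough Python sketch that walks a --debug-rendering log and collects the timestamped events seen before the first untimestamped line (the compositor's protocol error). The sample lines are copied from the log above; the function name and the assumption that the error line is the only untimestamped one are mine, and interleaving of the two output streams is not guaranteed to reflect exact ordering.

```python
import re

# Matches lines such as "[0.193] Attached temp buffer ..." from --debug-rendering.
TIMESTAMPED = re.compile(r"^\[(\d+\.\d+)\]\s+(.*)$")


def events_before_error(lines):
    """Return (timestamped messages seen so far, first untimestamped line).

    Returns (messages, None) if every non-empty line carries a timestamp.
    """
    events = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = TIMESTAMPED.match(line)
        if match is None:
            return events, line
        events.append(match.group(2))
    return events, None


LOG_EXCERPT = """\
[0.193] Attached temp buffer during window 1 creation of size: 941x1056 and rgba(0, 0, 0, 255)
[0.193] Waiting for compositor to send fractional scale for window 1
wp_linux_drm_syncobj_surface_v1#40: error 4: Buffer attached but no acquire point set
[0.216] The output buffer does not support sRGB color encoding, colors will be incorrect.
"""

events, error = events_before_error(LOG_EXCERPT.splitlines())
attached = [event for event in events if event.startswith("Attached temp buffer")]
print("error line:", error)
print("temp buffer attached before error:", bool(attached))
```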
Run with WAYLAND_DEBUG=1 to get the relevant logs.
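If it helps, a small Python sketch for capturing both logs into one file. It only sets WAYLAND_DEBUG=1 in the environment and reuses the kitty flags from the command shown above; the function names and the default log file name are illustrative choices, and it assumes kitty is on PATH.

```python
import os
import subprocess


def debug_launch_spec(base_env=None):
    """Build the argv and environment for a debug run of kitty."""
    env = dict(os.environ if base_env is None else base_env)
    env["WAYLAND_DEBUG"] = "1"  # libwayland then traces protocol messages to stderr
    argv = ["kitty", "--config", "NONE", "--debug-rendering"]
    return argv, env


def capture(log_path="wayland-debug.log"):
    """Run kitty and write stdout and stderr, merged, into log_path."""
    argv, env = debug_launch_spec()
    with open(log_path, "wb") as log:
        result = subprocess.run(argv, env=env, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

Calling `capture()` from a terminal inside the affected sway session should leave a single file with the Wayland trace and the rendering debug lines interleaved.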
Or actually, there may be one more place I overlooked. Hang tight.
Here is the cool Wayland log interleaved with the --debug-rendering output: wayland-debug.log
The logs show the syncobj_surface being created before attaching a single pixel buffer.
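A toy Python model of that ordering, purely to illustrate why it ends in "error 4". This is not sway's or wlroots' code: the class, its method names, and the choice to raise the error at commit time are simplifying assumptions; only the error text comes from the log above.

```python
class ProtocolError(Exception):
    """Stands in for a fatal Wayland protocol error."""


class ToySurface:
    """Hypothetical, heavily simplified model of a surface with explicit sync."""

    def __init__(self):
        self.has_syncobj_surface = False
        self.pending_buffer = None
        self.acquire_point_set = False

    def create_syncobj_surface(self):
        # From here on the surface is assumed to be under explicit sync rules.
        self.has_syncobj_surface = True

    def attach(self, buffer):
        self.pending_buffer = buffer

    def set_acquire_point(self):
        self.acquire_point_set = True

    def commit(self):
        # Assumed rule: with a syncobj surface, a buffer needs an acquire point.
        if (self.has_syncobj_surface
                and self.pending_buffer is not None
                and not self.acquire_point_set):
            raise ProtocolError("error 4: Buffer attached but no acquire point set")
        self.pending_buffer = None
        self.acquire_point_set = False


# Same attach, two orderings: without and with a syncobj surface already present.
plain = ToySurface()
plain.attach("single pixel buffer")
plain.commit()  # accepted: no explicit sync object exists yet

synced = ToySurface()
synced.create_syncobj_surface()
synced.attach("single pixel buffer")
try:
    synced.commit()
    outcome = "accepted"
except ProtocolError as exc:
    outcome = str(exc)
print(outcome)
```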
Yes, as I said, I missed one place where it can happen. Namely when the OS window is created hidden and then later shown.
And this should take care of it: https://github.com/kovidgoyal/kitty/commit/68401eddbab98f95b15ba54a817ad7f723e0d7d0
It launches now, thanks for the fix!
Has this been pushed to the Arch repository? I just updated my system and I am seeing the same issue on my side.
Describe the bug
Kitty fails to launch when the sway compositor is using explicit synchronization.
To Reproduce
Steps to reproduce the behavior:
Screenshots