bootc closed this issue 11 months ago
That's definitely not how it's supposed to look. Looks fine on my Linux system with both X11 and Wayland (GNOME), see screenshot (under X11).
I'm afraid I am not going to install a Debian VM; if you can come up with a simpler reproducer I will take a look. But I suspect this is something system- or distro-package-specific. You can try using the official kitty binaries instead of the Debian package.
I'm not insisting that this is a kitty bug, but I wanted to add some data points:
That would be an OpenGL driver issue in the VM in that case.
I've rolled back to Kitty 0.26.5-5 from Debian stable, and this also resolves the issue. Upgrading to 0.31.0-3 again brings the issue back.
I'll try to build my own Kitty and bisect the problem, but I think it's clear that the bug is most likely in Kitty rather than the OpenGL driver or the rest of the stack. Please reopen the issue.
I need a non-VM way to reproduce the issue. If you can find one I will look into it. As it is, it's not at all clear to me that it's a bug in kitty.
So I've narrowed this to something between 0.27.1 and 0.28.0. I can reproduce the problems with the binary release builds attached to the GitHub releases. I'll try to git bisect now.
git bisect has narrowed it down to 8ece8957741b91c022f76842ccbcfb0f7913945a. Reverting this commit on top of 0.28.1 produces a working build.
This commit doesn't revert cleanly on newer versions, but I can apply the following patch on 0.31.0 to make things work again:
diff --git a/kitty/cell_vertex.glsl b/kitty/cell_vertex.glsl
index 71086e255787..842c3c744841 100644
--- a/kitty/cell_vertex.glsl
+++ b/kitty/cell_vertex.glsl
@@ -68,7 +68,7 @@ vec3 color_to_vec(uint c) {
r = (c >> 16) & BYTE_MASK;
g = (c >> 8) & BYTE_MASK;
b = c & BYTE_MASK;
- return vec3(gamma_lut[r], gamma_lut[g], gamma_lut[b]);
+ return vec3(float(r) / 255.0, float(g) / 255.0, float(b) / 255.0);
}
uint resolve_color(uint c, uint defval) {
Yeah, like I said, a driver bug. For whatever reason, the GPU driver in your VM is not setting the gamma_lut uniform correctly, since it is set correctly with every non-VM GPU driver...
The gamma_lut is a constant table defined in srgb_gamma.h; it is sent to the GPU in init_cell_program() in shaders.c.
I am guessing that on your GPU it is either random or zeroed, leading to incorrect color values.
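For readers unfamiliar with how such a table reaches the shader, here is a minimal sketch of the idea. This is not the actual kitty code (the real upload happens in init_cell_program() in shaders.c); the names upload_gamma_lut and srgb_lut are illustrative, and it assumes a GL 3.3 core context with a loader (e.g. glad) already initialized.

```c
/* Sketch only: a constant 256-entry sRGB->linear table is uploaded once as a
 * uniform float array; the shader then indexes it per color channel. */
#define LUT_SIZE 256
extern const GLfloat srgb_lut[LUT_SIZE];   /* generated at build time, as with srgb_gamma.h */

static void upload_gamma_lut(GLuint program) {
    GLint loc = glGetUniformLocation(program, "gamma_lut");
    if (loc < 0) return;                   /* uniform missing or optimized out */
    glUseProgram(program);
    glUniform1fv(loc, LUT_SIZE, srgb_lut); /* if the driver mishandles this upload,
                                              the shader reads garbage values */
}
```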
You can try running with --debug-rendering; if the OpenGL driver reports an error it will be printed to standard out, but I highly doubt it will.
I'm not seeing anything that looks like an error:
$ kitty --debug-rendering
Creating window at size: 2984x1615 and scale 2
GL version string: '4.0 (Core Profile) Mesa 23.2.1-1' Detected version: 4.0
CSD: old.size: 0x0 new.size: 2984x1615 needs_update: 1 size_changed: 1 buffer_destroyed: 0
Created decoration buffers at scale: 2 vertical_height: 1651 horizontal_width: 3008
top-level configure event: size: 0x0 states:
CSD: old.size: 2984x1615 new.size: 2984x1615 needs_update: 0 size_changed: 0 buffer_destroyed: 0
final window content size: 2984x1615 resized: 0
Setting window geometry in configure event: x=0 y=-24 2984x1639
CSD: old.size: 2984x1615 new.size: 2984x1615 needs_update: 1 size_changed: 0 buffer_destroyed: 1
Created decoration buffers at scale: 2 vertical_height: 1651 horizontal_width: 3008
CSD: old.size: 2984x1615 new.size: 2984x1615 needs_update: 0 size_changed: 0 buffer_destroyed: 0
Waiting for swap to commit: swap has happened
Calling wl_pointer_set_cursor in setCursorImage with surface: 0xaaaaf1c542a0
top-level configure event: size: 3008x1663 states: TOPLEVEL_STATE_MAXIMIZED
Resizing framebuffer to: 3008x1639 at scale: 2
CSD: old.size: 2984x1615 new.size: 3008x1639 needs_update: 1 size_changed: 1 buffer_destroyed: 1
Created decoration buffers at scale: 2 vertical_height: 1675 horizontal_width: 3032
final window content size: 3008x1639 resized: 1
Setting window geometry in configure event: x=0 y=-24 3008x1663
top-level configure event: size: 3008x1663 states: TOPLEVEL_STATE_MAXIMIZED TOPLEVEL_STATE_ACTIVATED
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 1 size_changed: 0 buffer_destroyed: 0
final window content size: 3008x1639 resized: 0
Setting window geometry in configure event: x=0 y=-24 3008x1663
Scale changed to 2 in surface enter event
Resizing framebuffer to: 3008x1639 at scale: 2
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 0 size_changed: 0 buffer_destroyed: 0
Waiting for swap to commit: swap has happened
Calling wl_pointer_set_cursor in setCursorImage with surface: 0xaaaaf1c542a0
Calling wl_pointer_set_cursor in setCursorImage with surface: 0xaaaaf1c542a0
prompt_marking: x=0 y=0 op='k;start_kitty'
prompt_marking: x=0 y=0 op='A'
prompt_marking: x=0 y=0 op='k;end_kitty'
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 1 size_changed: 0 buffer_destroyed: 1
Created decoration buffers at scale: 2 vertical_height: 1675 horizontal_width: 3032
prompt_marking: x=13 y=0 op='k;start_suffix_kitty'
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 0 size_changed: 0 buffer_destroyed: 0
prompt_marking: x=13 y=0 op='k;end_suffix_kitty'
prompt_marking: x=0 y=0 op='k;start_kitty'
prompt_marking: x=0 y=0 op='A'
prompt_marking: x=0 y=0 op='k;end_kitty'
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 1 size_changed: 0 buffer_destroyed: 1
Created decoration buffers at scale: 2 vertical_height: 1675 horizontal_width: 3032
prompt_marking: x=13 y=0 op='k;start_suffix_kitty'
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 0 size_changed: 0 buffer_destroyed: 0
prompt_marking: x=13 y=0 op='k;end_suffix_kitty'
Calling wl_pointer_set_cursor in setCursorImage with surface: 0xaaaaf1c542a0
So if it is indeed the GL driver, where do I go to debug this issue? Is there any way I can reproduce this outside Kitty, would you say? The weird thing is the colours look fine everywhere else: on the desktop, in browsers, using glxgears, and so on. It would be nice if I could find a reproducer outside Kitty so whoever else I go to doesn't just point back at Kitty.
Also, I know nothing about OpenGL, but I don't think the LUT can be random or just zero. The colours are consistently dark, even between reboots. If it was all zeros I'd expect all black, and if it was random or weird memory contents I'd expect very strange graphics; this looks fine but dark. It's almost as if the gamma translation (that's what this is, right?) is being applied twice, thus making everything consistently darker than it should be. White is white, black is black, but things in between are wrong.
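A quick numeric check of that "applied twice" intuition, using the standard sRGB decode formula (this is only an illustration, not code from kitty):

```c
/* Decoding sRGB->linear twice leaves 0 and 1 fixed but darkens every midtone,
 * which matches "white is white, black is black, in-between is too dark". */
#include <math.h>
#include <stdio.h>

static double srgb_to_linear(double a) {   /* standard sRGB transfer function */
    return a <= 0.04045 ? a / 12.92 : pow((a + 0.055) / 1.055, 2.4);
}

int main(void) {
    const double samples[] = {0.0, 0.25, 0.5, 0.75, 1.0};
    for (int i = 0; i < 5; i++) {
        double once  = srgb_to_linear(samples[i]);
        double twice = srgb_to_linear(once);
        printf("%.2f -> decoded once %.3f -> decoded twice %.3f\n",
               samples[i], once, twice);
    }
    return 0;  /* e.g. 0.50 -> 0.214 -> 0.038: much darker midtones, endpoints unchanged */
}
```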
You would need to write an OpenGL application that uses a shader similar to cell_vertex.glsl with the gamma_lut uniform.
But there should be no need; this is pretty clearly a bug in the driver. There have been other such bugs in VM GPU drivers, for example: https://github.com/kovidgoyal/kitty/issues/5395
You could gather an API trace as in that issue and use that in your upstream bug report.
Random doesn't mean actually random, it means uninitialized memory, which is usually whatever was left behind by something else that used it, and that can often be a set of fixed values.
And again, if the gamma correction was being applied twice, or any such bug, it would be applied twice everywhere, not just in the VM. There is no codepath in kitty code that says "if running on XYZ GPU driver do ABC instead of DEF".
But this is very easy to test: look at gen/srgb_lut.py to see how the lookup table is generated, implement that function in the shader, and use it instead of the lookup table.
But that's exactly the point, isn't it? When I patched Kitty to not use the lookup table and instead use a linear conversion, it was working fine. It's when the gamma "correction" is applied that things are off.
Also, if the gamma table comes from the generated header file, where does uninitialised memory come in? And what's the chance that said uninitialised memory contains a nice table of 256 float values that just happen to produce this result, and do so across reboots, Kitty restarts, and different versions of Mesa and DRI (I tried both older and newer)? The behaviour is extremely consistent.
Finally, if I apply this patch to gen/srgb_lut.py and re-generate the table, things also look perfect:
diff --git a/gen/srgb_lut.py b/gen/srgb_lut.py
index 54050bb8484b..82cb0e0e6dc9 100755
--- a/gen/srgb_lut.py
+++ b/gen/srgb_lut.py
@@ -13,6 +13,7 @@
def to_linear(a: float) -> float:
+ return a
if a <= 0.04045:
return a / 12.92
else:
So something is applying the gamma translation twice somewhere, and presumably something (maybe Kitty, maybe in the OpenGL stack somewhere) should not be.
I've also just noticed this warning when launching Kitty, could this have something to do with it?
[333 16:51:43.009992] WARNING: Your system's OpenGL implementation does not have glCopyImageSubData, falling back to a slower implementation
On Wed, Nov 29, 2023 at 08:50:50AM -0800, Chris Boot wrote:
But that's exactly the point, isn't it? When I patched Kitty to not use the lookup table and instead use a linear conversion, it was working fine. It's when the gamma "correction" is applied that things are off.
Yes, which would be because the gamma correction table is incorrect.
Also if the gamma table comes from the generated header file, where does uninitialised memory come in?
On the GPU.
And what's the chance that said uninitialised memory contains a nice table of 256 float values that just happen to produce this result, and do so across reboots, Kitty restarts, and different versions of Mesa and DRI (I tried both older and newer)? The behaviour is extremely consistent.
Function call A uses the memory, sets it to some values, then frees it. Immediately after, function call B allocates the memory without initializing it. B will always see the output of A, consistently, every time.
Finally, if I apply this patch to gen/srgb_lut.py and re-generate the table, things also look perfect:
diff --git a/gen/srgb_lut.py b/gen/srgb_lut.py
index 54050bb8484b..82cb0e0e6dc9 100755
--- a/gen/srgb_lut.py
+++ b/gen/srgb_lut.py
@@ -13,6 +13,7 @@
def to_linear(a: float) -> float:
+ return a
if a <= 0.04045:
return a / 12.92
else:
So something is applying the gamma translation twice somewhere, and presumably something (maybe Kitty, maybe in the OpenGL stack somewhere) should not be.
Umm, sure, figure out what it is and send a patch, one that does not change rendering outside the VM. As far as I can see nothing is.
On Wed, Nov 29, 2023 at 08:53:29AM -0800, Chris Boot wrote:
I've also just noticed this warning when launching Kitty, could this have something to do with it?
[333 16:51:43.009992] WARNING: Your system's OpenGL implementation does not have glCopyImageSubData, falling back to a slower implementation
No, that is involved in updating the sprite map; sprites don't affect background colors.
Another idea: it might be that the OpenGL driver in your VM is not respecting GL_FRAMEBUFFER_SRGB; that would be why using non-linearized colors gives you the correct output. It may be that the VM GPU driver is using a non-sRGB output buffer. See https://www.khronos.org/opengl/wiki/Framebuffer
See if https://github.com/kovidgoyal/kitty/commit/97f5cad3352ee38588b8c5e81988e239bba58a64 fixes it. You can also query the sRGB status of the output buffer explicitly to check, with GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING.
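For reference, a rough sketch of what that query looks like in plain OpenGL (assuming a core-profile context and a GL loader already initialized; this is not the kitty code path):

```c
/* Ask the driver how the default framebuffer's color attachment is encoded.
 * GL_SRGB means linear shader output is re-encoded on write when
 * GL_FRAMEBUFFER_SRGB is enabled; GL_LINEAR means values are written as-is,
 * so linearized colors would come out too dark. */
static int back_buffer_is_srgb(void) {
    GLint encoding = GL_LINEAR;
    glGetFramebufferAttachmentParameteriv(
        GL_FRAMEBUFFER, GL_BACK_LEFT,
        GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING, &encoding);
    return encoding == GL_SRGB;
}
```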
Thanks for persisting with this! I had some trouble building Kitty from master; I'm on arm64 Linux and the is_arm platform check didn't trigger, so I had to apply this patch:
diff --git a/setup.py b/setup.py
index e873cf9697d8..64bc028748bd 100755
--- a/setup.py
+++ b/setup.py
@@ -49,7 +49,7 @@
is_netbsd = 'netbsd' in _plat
is_dragonflybsd = 'dragonfly' in _plat
is_bsd = is_freebsd or is_netbsd or is_dragonflybsd or is_openbsd
-is_arm = platform.processor() == 'arm' or platform.machine() == 'arm64'
+is_arm = platform.processor() == 'arm' or platform.machine() in ['arm64', 'aarch64']
Env = glfw.Env
env = Env()
PKGCONFIG = os.environ.get('PKGCONFIG_EXE', 'pkg-config')
I then found that normal "release" builds of Kitty just segfault. I built a debug build instead and that worked, but the issue persists. I saw that the latest commit (ad4e9bb42c15c308b9c56968dd2e79a430738526) should also test that the encoding is sRGB and log an error if it isn't - and that warning is not printed. So it does look like the driver should know what to do with it, but doesn't.
The crash in the release build seems to happen in the launcher itself, and gdb backtraces suggest the stack is corrupt. Building with 788295e534785fe7c1a83ea05a9aaeb0950ab2a7 reverted produces a working build (but the colours are still wrong).
So I think, to summarise, Kitty is creating a window/surface/whatever you call it that is set up for SRGB colorspace, and the driver or something upstream from it isn't actually taking that into account properly, and lying to Kitty about it when it enquires.
On Fri, Dec 01, 2023 at 02:56:45AM -0800, Chris Boot wrote:
Thanks for persisting with this!
You are welcome, I dislike things I don't understand :)
I had some trouble building Kitty from master; I'm on arm64 Linux and the is_arm platform check didn't trigger, so I had to apply this patch:
These issues should now be fixed.
So I think, to summarise, Kitty is creating a window/surface/whatever you call it that is set up for SRGB colorspace, and the driver or something upstream from it isn't actually taking that into account properly, and lying to Kitty about it when it enquires.
Hmm, pity.
I have created the following issue report to follow this up in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057195
Also I thought you might be interested: I installed a macOS VM and ran Kitty inside that. Kitty fails to start:
bootc@Chriss-Virtual-Machine ~ % /Applications/kitty.app/Contents/MacOS/kitty
[335 14:22:04.426971] [glfw error 65545]: NSGL: Failed to find a suitable pixel format
[335 14:22:04.432092] [glfw error 65545]: NSGL: Failed to find a suitable pixel format
[335 14:22:04.432239] Failed to create GLFW temp window! This usually happens because of old/broken OpenGL drivers. kitty requires working OpenGL 3.3 drivers.
I can also replicate the colour problem with a Fedora 39 VM.
On Fri, Dec 01, 2023 at 06:24:40AM -0800, Chris Boot wrote:
I have created the following issue report to follow this up in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057195
Also I thought you might be interested: I installed a macOS VM and ran Kitty inside that. Kitty fails to start:
***@***.*** ~ % /Applications/kitty.app/Contents/MacOS/kitty
[335 14:22:04.426971] [glfw error 65545]: NSGL: Failed to find a suitable pixel format
[335 14:22:04.432092] [glfw error 65545]: NSGL: Failed to find a suitable pixel format
[335 14:22:04.432239] Failed to create GLFW temp window! This usually happens because of old/broken OpenGL drivers. kitty requires working OpenGL 3.3 drivers.
That will probably be because of the glfw srgb request; does commenting it out allow it to start? And what VM software are we talking about? What's the host OS, the hypervisor, and the guest OS?
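For context, in stock GLFW an sRGB framebuffer request is made with the GLFW_SRGB_CAPABLE window hint; kitty bundles a patched GLFW, so the exact call site differs, but the shape of the request is roughly this sketch (call after glfwInit()):

```c
#include <GLFW/glfw3.h>

/* Sketch only: ask the windowing system for an sRGB-capable default framebuffer.
 * On macOS/NSGL, if no matching pixel format exists, window creation may fail,
 * which would be consistent with the "Failed to find a suitable pixel format"
 * errors above. */
static GLFWwindow *make_test_window(void) {
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GLFW_TRUE); /* needed for core contexts on macOS */
    glfwWindowHint(GLFW_SRGB_CAPABLE, GLFW_TRUE);          /* the "srgb request" */
    return glfwCreateWindow(800, 600, "srgb test", NULL, NULL);
}
```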
Oh, I can replicate the colour problem with glxgears -srgb! So definitely not Kitty.
The host is macOS Sonoma 14.2 Beta (23C5055b). The hypervisor is Parallels Desktop 19.1.1 (54734).
The guest OS has so far been Debian testing/trixie, but I replicated the issue in Fedora 38 and 39.
I'm not sure I have the energy to build Kitty in the macOS VM, I just did that as a quick experiment and have removed it now.
On Fri, Dec 01, 2023 at 06:51:12AM -0800, Chris Boot wrote:
Oh, I can replicate the colour problem with glxgears -srgb! So definitely not Kitty.
Well, that's good, you can probably use that to report the issue to the developers of Parallels Desktop.
I'm not sure I have the energy to build Kitty in the macOS VM, I just did that as a quick experiment and have removed it now.
OK, thanks, anyway.
Just as another data point, I tried this with VMware Fusion and the colours are fine. The graphics stack is very different with that, of course, so I don't know how much of a test that actually was...
This comment is for future visitors who have the darker-colors issue on Linux ARM:
I have a MacBook Pro with Apple Silicon and run Linux ARM on Parallels. I had the very same issue: much darker colors, and I could not set a background picture. I use Kitty on my M1 MBP itself without problem. For testing, I made a Linux x86 VM with QEMU on the same Apple Silicon machine, and kitty runs there without problem. Apparently there is a problem on the Parallels side! Hope it helps.
Describe the bug
I upgraded from Debian 12 (bookworm) to testing (trixie), and now colours are rendering darker than they should be.
To Reproduce
Steps to reproduce the behavior:
Screenshots
The expected colours, in macOS:
What I see now in trixie:
I've included a Chromium window with the color4 blue colour swatch visible to compare with what it should look like; as you can see it's not an overall colour rendering problem in the whole VM, only kitty appears to be affected.
Environment details
Additional context
The problem is identical when using kitty --config NONE.