bootc closed this issue 11 months ago
That's definitely not how it's supposed to look. Looks fine on my Linux system with both X11 and Wayland (GNOME), see screenshot (under X11).
I'm afraid I am not going to install a Debian VM; if you can come up with a simpler reproducer I will take a look. But I suspect this is something system- or distro-package-specific. You can try using the official kitty binaries instead of the Debian package.
I'm not insisting that this is a kitty bug, but I wanted to add some data points:
That would be an OpenGL driver issue in the VM in that case.
I've rolled back to Kitty 0.26.5-5 from Debian stable, and this also resolves the issue. Upgrading to 0.31.0-3 again brings the issue back.
I'll try to build my own Kitty and bisect the problem, but I think it's clear that the bug is most likely in Kitty rather than the OpenGL driver or the rest of the stack. Please reopen the issue.
I need a non-VM way to reproduce the issue. If you can find one I will look into it. As it is, it's not at all clear to me that it's a bug in kitty.
So I've narrowed this to something between 0.27.1 and 0.28.0. I can reproduce the problems with the binary release builds attached to the GitHub releases. I'll try to git bisect now.
git bisect has narrowed it down to 8ece8957741b91c022f76842ccbcfb0f7913945a. Reverting this commit on top of 0.28.1 produces a working build.
This commit doesn't revert cleanly on newer versions, but I can apply the following patch on 0.31.0 to make things work again:
diff --git a/kitty/cell_vertex.glsl b/kitty/cell_vertex.glsl
index 71086e255787..842c3c744841 100644
--- a/kitty/cell_vertex.glsl
+++ b/kitty/cell_vertex.glsl
@@ -68,7 +68,7 @@ vec3 color_to_vec(uint c) {
r = (c >> 16) & BYTE_MASK;
g = (c >> 8) & BYTE_MASK;
b = c & BYTE_MASK;
- return vec3(gamma_lut[r], gamma_lut[g], gamma_lut[b]);
+ return vec3(float(r) / 255.0, float(g) / 255.0, float(b) / 255.0);
}
uint resolve_color(uint c, uint defval) {
Yeah, like I said, a driver bug. For whatever reason, the GPU driver in your VM is not setting the gamma_lut uniform correctly, since it is set correctly with every non-VM GPU driver...
The gamma_lut is a constant table defined in srgb_gamma.h; it is sent to the GPU in init_cell_program() in shaders.c.
I am guessing that on your GPU it is either random or zeroed, leading to incorrect color values.
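For readers unfamiliar with how such a table reaches the shader, here is a minimal sketch of the idea. This is not the actual kitty code (the real upload happens in init_cell_program() in shaders.c); the names upload_gamma_lut and srgb_lut are illustrative, and it assumes a GL 3.3 core context with a loader (e.g. glad) already initialized.

```c
/* Sketch only: a constant 256-entry sRGB->linear table is uploaded once as a
 * uniform float array; the shader then indexes it per color channel. */
#define LUT_SIZE 256
extern const GLfloat srgb_lut[LUT_SIZE];   /* generated at build time, as with srgb_gamma.h */

static void upload_gamma_lut(GLuint program) {
    GLint loc = glGetUniformLocation(program, "gamma_lut");
    if (loc < 0) return;                   /* uniform missing or optimized out */
    glUseProgram(program);
    glUniform1fv(loc, LUT_SIZE, srgb_lut); /* if the driver mishandles this upload,
                                              the shader reads garbage values */
}
```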
You can try running with --debug-rendering; if the OpenGL driver reports an error it will be printed to standard out, but I highly doubt it will.
I'm not seeing anything that looks like an error:
$ kitty --debug-rendering
Creating window at size: 2984x1615 and scale 2
GL version string: '4.0 (Core Profile) Mesa 23.2.1-1' Detected version: 4.0
CSD: old.size: 0x0 new.size: 2984x1615 needs_update: 1 size_changed: 1 buffer_destroyed: 0
Created decoration buffers at scale: 2 vertical_height: 1651 horizontal_width: 3008
top-level configure event: size: 0x0 states:
CSD: old.size: 2984x1615 new.size: 2984x1615 needs_update: 0 size_changed: 0 buffer_destroyed: 0
final window content size: 2984x1615 resized: 0
Setting window geometry in configure event: x=0 y=-24 2984x1639
CSD: old.size: 2984x1615 new.size: 2984x1615 needs_update: 1 size_changed: 0 buffer_destroyed: 1
Created decoration buffers at scale: 2 vertical_height: 1651 horizontal_width: 3008
CSD: old.size: 2984x1615 new.size: 2984x1615 needs_update: 0 size_changed: 0 buffer_destroyed: 0
Waiting for swap to commit: swap has happened
Calling wl_pointer_set_cursor in setCursorImage with surface: 0xaaaaf1c542a0
top-level configure event: size: 3008x1663 states: TOPLEVEL_STATE_MAXIMIZED
Resizing framebuffer to: 3008x1639 at scale: 2
CSD: old.size: 2984x1615 new.size: 3008x1639 needs_update: 1 size_changed: 1 buffer_destroyed: 1
Created decoration buffers at scale: 2 vertical_height: 1675 horizontal_width: 3032
final window content size: 3008x1639 resized: 1
Setting window geometry in configure event: x=0 y=-24 3008x1663
top-level configure event: size: 3008x1663 states: TOPLEVEL_STATE_MAXIMIZED TOPLEVEL_STATE_ACTIVATED
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 1 size_changed: 0 buffer_destroyed: 0
final window content size: 3008x1639 resized: 0
Setting window geometry in configure event: x=0 y=-24 3008x1663
Scale changed to 2 in surface enter event
Resizing framebuffer to: 3008x1639 at scale: 2
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 0 size_changed: 0 buffer_destroyed: 0
Waiting for swap to commit: swap has happened
Calling wl_pointer_set_cursor in setCursorImage with surface: 0xaaaaf1c542a0
Calling wl_pointer_set_cursor in setCursorImage with surface: 0xaaaaf1c542a0
prompt_marking: x=0 y=0 op='k;start_kitty'
prompt_marking: x=0 y=0 op='A'
prompt_marking: x=0 y=0 op='k;end_kitty'
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 1 size_changed: 0 buffer_destroyed: 1
Created decoration buffers at scale: 2 vertical_height: 1675 horizontal_width: 3032
prompt_marking: x=13 y=0 op='k;start_suffix_kitty'
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 0 size_changed: 0 buffer_destroyed: 0
prompt_marking: x=13 y=0 op='k;end_suffix_kitty'
prompt_marking: x=0 y=0 op='k;start_kitty'
prompt_marking: x=0 y=0 op='A'
prompt_marking: x=0 y=0 op='k;end_kitty'
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 1 size_changed: 0 buffer_destroyed: 1
Created decoration buffers at scale: 2 vertical_height: 1675 horizontal_width: 3032
prompt_marking: x=13 y=0 op='k;start_suffix_kitty'
CSD: old.size: 3008x1639 new.size: 3008x1639 needs_update: 0 size_changed: 0 buffer_destroyed: 0
prompt_marking: x=13 y=0 op='k;end_suffix_kitty'
Calling wl_pointer_set_cursor in setCursorImage with surface: 0xaaaaf1c542a0
So if it is indeed the GL driver, where do I go to debug this issue? Is there any way I can reproduce this outside Kitty, would you say? The weird thing is the colours look fine everywhere else: on the desktop, in browsers, using glxgears, and so on. It would be nice if I could find a reproducer outside Kitty so whoever else I go to doesn't just point back at Kitty.
Also, I know nothing about OpenGL, but I don't think the LUT can be random or just zero. The colours are consistently dark, even between reboots. If it was all zeros I'd expect all black, and if it was random or weird memory contents I'd expect very strange graphics; this looks fine but dark. It's almost as if the gamma translation (that's what this is, right?) is being applied twice, thus making everything consistently darker than it should be. White is white, black is black, but things in between are wrong.
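A quick numeric check of that "applied twice" intuition, using the standard sRGB decode formula (this is only an illustration, not code from kitty):

```c
/* Decoding sRGB->linear twice leaves 0 and 1 fixed but darkens every midtone,
 * which matches "white is white, black is black, in-between is too dark". */
#include <math.h>
#include <stdio.h>

static double srgb_to_linear(double a) {   /* standard sRGB transfer function */
    return a <= 0.04045 ? a / 12.92 : pow((a + 0.055) / 1.055, 2.4);
}

int main(void) {
    const double samples[] = {0.0, 0.25, 0.5, 0.75, 1.0};
    for (int i = 0; i < 5; i++) {
        double once  = srgb_to_linear(samples[i]);
        double twice = srgb_to_linear(once);
        printf("%.2f -> decoded once %.3f -> decoded twice %.3f\n",
               samples[i], once, twice);
    }
    return 0;  /* e.g. 0.50 -> 0.214 -> 0.038: much darker midtones, endpoints unchanged */
}
```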
You would need to write an OpenGL application that uses a shader similar to cell_vertex.glsl with the gamma_lut uniform.
But there should be no need; this is pretty clearly a bug in the driver. There have been other such bugs in VM GPU drivers, for example: https://github.com/kovidgoyal/kitty/issues/5395
You could gather an API trace as in that issue and use that in your upstream bug report.
Random doesn't mean actually random, it means uninitialized memory, which is usually whatever was left behind by something else that used it, and that can often be a set of fixed values.
And again, if the gamma correction was being applied twice, or any such bug, it would be applied twice everywhere, not just in the VM. There is no codepath in kitty code that says "if running on XYZ GPU driver do ABC instead of DEF".
But this is very easy to test: look at gen/srgb_lut.py to see how the lookup table is generated, implement that function in the shader, and use it instead of the lookup table.
But that's exactly the point, isn't it? When I patched Kitty to not use the lookup table and instead use a linear conversion, it was working fine. It's when the gamma "correction" is applied that things are off.
Also, if the gamma table comes from the generated header file, where does uninitialised memory come in? And what's the chance that said uninitialised memory contains a nice table of 256 float values that just happen to produce this result, and do so across reboots, Kitty restarts, and different versions of Mesa and DRI (I tried both older and newer)? The behaviour is extremely consistent.
Finally, if I apply this patch to gen/srgb_lut.py and re-generate the table, things also look perfect:
diff --git a/gen/srgb_lut.py b/gen/srgb_lut.py
index 54050bb8484b..82cb0e0e6dc9 100755
--- a/gen/srgb_lut.py
+++ b/gen/srgb_lut.py
@@ -13,6 +13,7 @@
def to_linear(a: float) -> float:
+ return a
if a <= 0.04045:
return a / 12.92
else:
So something is applying the gamma translation twice somewhere, and presumably something (maybe Kitty, maybe in the OpenGL stack somewhere) should not be.
I've also just noticed this warning when launching Kitty, could this have something to do with it?
[333 16:51:43.009992] WARNING: Your system's OpenGL implementation does not have glCopyImageSubData, falling back to a slower implementation
On Wed, Nov 29, 2023 at 08:50:50AM -0800, Chris Boot wrote:
But that's exactly the point, isn't it? When I patched Kitty to not use the lookup table and instead use a linear conversion, it was working fine. It's when the gamma "correction" is applied that things are off.
Yes, which would be because the gamma correction table is incorrect.
Also if the gamma table comes from the generated header file, where does uninitialised memory come in?
On the GPU.
And what's the chance that said uninitialised memory contains a nice table of 256 float values that just happen to produce this result, and do so across reboots, Kitty restarts, and different versions of Mesa and DRI (I tried both older and newer)? The behaviour is extremely consistent.
Function call A uses the memory, sets it to some values, then frees it. Immediately after, function call B allocates the memory without initializing it. B will always see the output of A, consistently, every time.
Finally, if I apply this patch to gen/srgb_lut.py and re-generate the table, things also look perfect:
diff --git a/gen/srgb_lut.py b/gen/srgb_lut.py
index 54050bb8484b..82cb0e0e6dc9 100755
--- a/gen/srgb_lut.py
+++ b/gen/srgb_lut.py
@@ -13,6 +13,7 @@
def to_linear(a: float) -> float:
+ return a
if a <= 0.04045:
return a / 12.92
else:
So something is applying the gamma translation twice somewhere, and presumably something (maybe Kitty, maybe in the OpenGL stack somewhere) should not be.
Umm, sure, figure out what it is and send a patch, one that does not change rendering outside the VM. As far as I can see nothing is.
On Wed, Nov 29, 2023 at 08:53:29AM -0800, Chris Boot wrote:
I've also just noticed this warning when launching Kitty, could this have something to do with it?
[333 16:51:43.009992] WARNING: Your system's OpenGL implementation does not have glCopyImageSubData, falling back to a slower implementation
No, that is involved in updating the sprite map; sprites don't affect background colors.
Another idea: it might be that the OpenGL driver in your VM is not respecting GL_FRAMEBUFFER_SRGB; that would be why using non-linearized colors gives you the correct output. It may be that the VM GPU driver is using a non-sRGB output buffer. See https://www.khronos.org/opengl/wiki/Framebuffer
See if https://github.com/kovidgoyal/kitty/commit/97f5cad3352ee38588b8c5e81988e239bba58a64 fixes it. You can also query the sRGB status of the output buffer explicitly to check, with GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING.
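For reference, a rough sketch of what that query looks like in plain OpenGL (assuming a core-profile context and a GL loader already initialized; this is not the kitty code path):

```c
/* Ask the driver how the default framebuffer's color attachment is encoded.
 * GL_SRGB means linear shader output is re-encoded on write when
 * GL_FRAMEBUFFER_SRGB is enabled; GL_LINEAR means values are written as-is,
 * so linearized colors would come out too dark. */
static int back_buffer_is_srgb(void) {
    GLint encoding = GL_LINEAR;
    glGetFramebufferAttachmentParameteriv(
        GL_FRAMEBUFFER, GL_BACK_LEFT,
        GL_FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING, &encoding);
    return encoding == GL_SRGB;
}
```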
Thanks for persisting with this! I had some trouble building Kitty from master; I'm on arm64 Linux and the is_arm platform check didn't trigger, so I had to apply this patch:
diff --git a/setup.py b/setup.py
index e873cf9697d8..64bc028748bd 100755
--- a/setup.py
+++ b/setup.py
@@ -49,7 +49,7 @@
is_netbsd = 'netbsd' in _plat
is_dragonflybsd = 'dragonfly' in _plat
is_bsd = is_freebsd or is_netbsd or is_dragonflybsd or is_openbsd
-is_arm = platform.processor() == 'arm' or platform.machine() == 'arm64'
+is_arm = platform.processor() == 'arm' or platform.machine() in ['arm64', 'aarch64']
Env = glfw.Env
env = Env()
PKGCONFIG = os.environ.get('PKGCONFIG_EXE', 'pkg-config')
I then found that normal "release" builds of Kitty just segfault. I built a debug build instead and that worked, but the issue persists. I saw that the latest commit (ad4e9bb42c15c308b9c56968dd2e79a430738526) should also test that the encoding is sRGB and log an error if it isn't - and that warning is not printed. So it does look like the driver should know what to do with it, but doesn't.
The crash in the release build seems to happen in the launcher itself, and gdb backtraces suggest the stack is corrupt. Building with 788295e534785fe7c1a83ea05a9aaeb0950ab2a7 reverted produces a working build (but the colours are still wrong).
So I think, to summarise, Kitty is creating a window/surface/whatever you call it that is set up for SRGB colorspace, and the driver or something upstream from it isn't actually taking that into account properly, and lying to Kitty about it when it enquires.
On Fri, Dec 01, 2023 at 02:56:45AM -0800, Chris Boot wrote:
Thanks for persisting with this!
You are welcome, I dislike things I don't understand :)
I had some trouble building Kitty from master; I'm on arm64 Linux and the is_arm platform check didn't trigger, so I had to apply this patch:
These issues should now be fixed.
So I think, to summarise, Kitty is creating a window/surface/whatever you call it that is set up for SRGB colorspace, and the driver or something upstream from it isn't actually taking that into account properly, and lying to Kitty about it when it enquires.
Hmm, pity.
I have created the following issue report to follow this up in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057195
Also I thought you might be interested: I installed a macOS VM and ran Kitty inside that. Kitty fails to start:
bootc@Chriss-Virtual-Machine ~ % /Applications/kitty.app/Contents/MacOS/kitty
[335 14:22:04.426971] [glfw error 65545]: NSGL: Failed to find a suitable pixel format
[335 14:22:04.432092] [glfw error 65545]: NSGL: Failed to find a suitable pixel format
[335 14:22:04.432239] Failed to create GLFW temp window! This usually happens because of old/broken OpenGL drivers. kitty requires working OpenGL 3.3 drivers.
I can also replicate the colour problem with a Fedora 39 VM.
On Fri, Dec 01, 2023 at 06:24:40AM -0800, Chris Boot wrote:
I have created the following issue report to follow this up in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057195
Also I thought you might be interested: I installed a macOS VM and ran Kitty inside that. Kitty fails to start:
***@***.*** ~ % /Applications/kitty.app/Contents/MacOS/kitty
[335 14:22:04.426971] [glfw error 65545]: NSGL: Failed to find a suitable pixel format
[335 14:22:04.432092] [glfw error 65545]: NSGL: Failed to find a suitable pixel format
[335 14:22:04.432239] Failed to create GLFW temp window! This usually happens because of old/broken OpenGL drivers. kitty requires working OpenGL 3.3 drivers.
That will probably be because of the glfw srgb request; does commenting it out allow it to start? And what VM software are we talking about? What's the host OS, the hypervisor, and the guest OS?
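For context, in stock GLFW an sRGB framebuffer request is made with the GLFW_SRGB_CAPABLE window hint; kitty bundles a patched GLFW, so the exact call site differs, but the shape of the request is roughly this sketch (call after glfwInit()):

```c
#include <GLFW/glfw3.h>

/* Sketch only: ask the windowing system for an sRGB-capable default framebuffer.
 * On macOS/NSGL, if no matching pixel format exists, window creation may fail,
 * which would be consistent with the "Failed to find a suitable pixel format"
 * errors above. */
static GLFWwindow *make_test_window(void) {
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
    glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GLFW_TRUE); /* needed for core contexts on macOS */
    glfwWindowHint(GLFW_SRGB_CAPABLE, GLFW_TRUE);          /* the "srgb request" */
    return glfwCreateWindow(800, 600, "srgb test", NULL, NULL);
}
```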
Oh, I can replicate the colour problem with glxgears -srgb! So definitely not Kitty.
The host is macOS Sonoma 14.2 Beta (23C5055b). The hypervisor is Parallels Desktop 19.1.1 (54734).
The guest OS has so far been Debian testing/trixie, but I replicated the issue in Fedora 38 and 39.
I'm not sure I have the energy to build Kitty in the macOS VM, I just did that as a quick experiment and have removed it now.
On Fri, Dec 01, 2023 at 06:51:12AM -0800, Chris Boot wrote:
Oh, I can replicate the colour problem with glxgears -srgb! So definitely not Kitty.
Well, that's good, you can probably use that to report the issue to the developers of Parallels Desktop.
I'm not sure I have the energy to build Kitty in the macOS VM, I just did that as a quick experiment and have removed it now.
OK, thanks, anyway.
Just as another data point, I tried this with VMware Fusion and the colours are fine. The graphics stack is very different with that, of course, so I don't know how much of a test that actually was...
This comment is for future visitors who have the darker-colors issue on Linux ARM:
I have a MacBook Pro with Apple Silicon and run Linux ARM on Parallels. I had the very same issue: much darker colors, and I could not set a background picture. I use Kitty on my M1 MBP itself without problem. For testing, I made a Linux x86 VM with QEMU on the same Apple Silicon machine, and kitty runs there without problem. Apparently there is a problem on the Parallels side! Hope it helps.
Describe the bug
I upgraded from Debian 12 (bookworm) to testing (trixie), and now colours are rendering darker than they should be.
To Reproduce
Steps to reproduce the behavior:
Screenshots
The expected colours, in macOS:
What I see now in trixie:
I've included a Chromium window with the color4 blue colour swatch visible to compare with what it should look like; as you can see it's not an overall colour rendering problem in the whole VM, only kitty appears to be affected.
Environment details
Additional context
The problem is identical when using kitty --config NONE.