kovidgoyal / kitty

Cross-platform, fast, feature-rich, GPU based terminal
https://sw.kovidgoyal.net/kitty/
GNU General Public License v3.0
24.15k stars 972 forks source link

Images don't display correctly if you do enough of them #2915

Closed anicolao closed 4 years ago

anicolao commented 4 years ago

Describe the bug Sometimes, after displaying large PNGs, bad data are displayed.

To Reproduce

# start by having displayed a variety of very different PNGs; then:
kitty icat tiles.png
kitty icat bitmap.png
kitty icat tiles.png

Expected behavior The first and last display of tiles should look the same.

Screenshots First display: Screen Shot 2020-08-11 at 11 45 01 AM Last display: Screen Shot 2020-08-11 at 11 45 28 AM

Environment details Mac OS X 10.15.5

kitty 0.18.2 created by Kovid Goyal
Darwin pluto.local 19.5.0 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
ProductName:    Mac OS X ProductVersion:    10.15.5 BuildVersion:   19F101
Loaded config files: /Users/anicolao/.config/kitty/kitty.conf

Config options different from defaults:
active_border_color     Color(red=40, green=44, blue=52)
active_tab_background   Color(red=171, green=178, blue=191)
active_tab_font_style   (True, False)
active_tab_foreground   Color(red=40, green=44, blue=52)
allow_remote_control    y
background              Color(red=64, green=136, blue=192)
color0                  Color(red=40, green=44, blue=52)
color1                  Color(red=224, green=108, blue=117)
color10                 Color(red=152, green=195, blue=121)
color11                 Color(red=209, green=154, blue=102)
color12                 Color(red=97, green=175, blue=239)
color13                 Color(red=198, green=120, blue=221)
color14                 Color(red=86, green=182, blue=194)
color15                 Color(red=92, green=99, blue=112)
color2                  Color(red=152, green=195, blue=121)
color3                  Color(red=229, green=192, blue=123)
color4                  Color(red=97, green=175, blue=239)
color5                  Color(red=198, green=120, blue=221)
color6                  Color(red=86, green=182, blue=194)
color7                  Color(red=171, green=178, blue=191)
color8                  Color(red=89, green=98, blue=118)
color9                  Color(red=190, green=80, blue=70)
cursor                  Color(red=136, green=136, blue=136)
cursor_blink_interval   0.0
editor                  nvim
font_family             Roboto Mono
font_size               13.0
foreground              Color(red=51, green=51, blue=51)
inactive_border_color   Color(red=34, green=38, blue=45)
inactive_tab_background Color(red=34, green=38, blue=45)
inactive_tab_foreground Color(red=92, green=99, blue=112)
inactive_text_alpha     0.6
initial_window_height   (15, 'cells')
initial_window_width    (80, 'cells')
macos_titlebar_color    572927234
remember_window_size    False
tab_bar_margin_width    4.0
tab_fade                (1.0, 1.0, 1.0)
Added shortcuts:
     control+space KeyAction(func='send_text', args=['all', b'\x10'])
Changed shortcuts:
     shift+control+0 KeyAction(func='goto_layout', args=['stack'])
     shift+control+8 KeyAction(func='goto_layout', args=['fat'])
     shift+control+9 KeyAction(func='goto_layout', args=['tall'])
     shift+control+b KeyAction(func='scroll_page_up', args=())
     shift+control+f KeyAction(func='scroll_page_down', args=())
     shift+control+j KeyAction(func='previous_window', args=())
     shift+control+k KeyAction(func='next_window', args=())
     shift+control+t KeyAction(func='new_tab_with_cwd', args=())
     shift+control+left_bracket KeyAction(func='previous_tab', args=())
     shift+control+right_bracket KeyAction(func='next_tab', args=())
     shift+control+enter KeyAction(func='new_window_with_cwd', args=())
     shift+control+right KeyAction(func='move_tab_forward', args=())
     shift+control+left KeyAction(func='move_tab_backward', args=())
     shift+control+down KeyAction(func='move_window_backward', args=())
     shift+control+up KeyAction(func='move_window_forward', args=())

Additional context Hmm. I had trouble reproducing this with no config file, but I suspect it's the lifetime of the session that is the issue rather than the config file.

It is also possible that since I am speaking the remote control protocol to kitty directly, I could have a bug in my version of icat or I could have triggered a bug that is difficult to reach with icat.

anicolao commented 4 years ago

tiles bitmap

kovidgoyal commented 4 years ago

icat doesnt use remote control, and I cannot reproduce at all, with a blank config or with my regular config. SO some bisection of the config file to generate a minimal config to reproduce would help.

anicolao commented 4 years ago

Perhaps the problem only occurs via escape codes, which is almost exclusively how I display images in kitty. Wherever I have said "remote control" in reference to this issue I mean using escape codes to display images.

I will try to reproduce it that way, and see if that helps.

kovidgoyal commented 4 years ago

icat also uses escape codes to display images. Do you mean escape codes containing base64 data rather than streamed data?

anicolao commented 4 years ago

OK I can reproduce as follows. Untar the attached .tar.gz file. Start kitty --config NONE and start two terminals (two different os windows, though maybe two kitty windows would be as good). In each one, start the following loop:

for i in img*; do cat $i; echo $i; sleep 2; done

Images either appear blank; appear late; or appear corrupted at some point. Sometimes an old texture gets updated and looks fine in the end. Sometimes it just stays corrupt. Here is an example of a snoopy that fixed itself after a few more were displayed:

Screen Shot 2020-08-11 at 12 56 39 PM

kitty-escape-codes.tar.gz

anicolao commented 4 years ago

Re: icat vs these files; you said "icat doesn't use remote control" and I was speaking of remote control and escape codes as though they were the same thing, when I meant only escape codes. The files I have included in my reproduction case are generated by my own scripts, and so they could possibly be in violation of the graphics spec (I tried to make correct files, but you can look for yourself).

I would claim whether my files are valid or not, the observed behaviour is not OK. Mostly I am doing these operations over a very slow network and it reproduces more easily when I do that, but for the purpose of minimizing the variables the repro case shown above is running directly on my mac; but I could not get it to reproduce without running two windows, so I wonder if there's a thread safety issue in the processing of escape codes.

Commonly when I am working remotely the same escape codes could arrive at kitty via multiple os windows at almost the identical time. I also see a bug periodically where kitty spews escape codes into my terminal unexpectedly (usually just after starting vim) and this could be related.

kovidgoyal commented 4 years ago

All escape codes, from all windows, are parsed in a single thread, so it seems rather unlikely to be a thread safety issue. It could be a memory corruption issue, which you should be able to check easily using make asan and reproducing. I will look into your repro case when I have some time.

kovidgoyal commented 4 years ago

Doesn't repro for me with the following session file:

launch sh -c "cd /t/r; for i in img*; do cat $i; echo $i; sleep 2; done; read"
new_os_window
launch sh -c "cd /t/r; for i in img*; do cat $i; echo $i; sleep 2; done; read"

albeit on linux not macOS

kovidgoyal commented 4 years ago

Doesnt repro for me on macOS either

anicolao commented 4 years ago

I don't think it's likely to be a main memory corruption issue, though of course anything is possible.

More likely it's an unitialized GPU memory issue, for example a missing fence between the texture upload and the swap buffers. Since the upload is async, if the bits haven't arrived in the GPU's memory yet the display will show whatever was there before, which seems to match reasonably well with the random mishmash of other recognizable textures in the screenshot. If my hypothesis is correct, the problem will only be reproducible on discrete GPUs, and possibly also hard to reproduce without the same driver. Does your Mac have a discrete GPU? My GPU is an NVIDIA GeForce GTX 680MX 2 GB.

[edit to add]: this hypothesis is also consistent with the fact that later, the image is often "fixed".

anicolao commented 4 years ago

To my frustration, with the session file approach i can't reproduce the corruption either. It is the case that sometimes teh image doesn't draw for a long time, but the blank black space is much less objectionable than the corrupt image. I tried some variations of the script to no avail. My machine is in a similar state to what I was doing earlier so I am not sure what exactly to try to make it repro. I'll keep an eye out.

kovidgoyal commented 4 years ago

I dont see how. Image loading is not asynchronous. Image data is decoded and sent to GPU before kitty asks the GPU to render the image (image rendring nd text rendering happen in th same draw pass and all of it happens on a single thread). I am guessing this is some GPU driver/OpenGL driver bug. I dont use macs, but I do recall reading that Apple dropped support for NVIDIA cards because of all the bugs with them. Given I cannot reproduce there's not much I can do about this, but if you come up with better reproducers do let me know and I will investigate. Closing till then as so far I am not convinced this is a bug in kitty.

anicolao commented 4 years ago

GPU memory and main memory are distinct on any computer with a discrete GPU (as opposed to an emulated GPU on the CPU which is also commonplace) and the asynchrony is meant to be under the OpenGL programmer's control. Without knowing where kitty actually does the rendering and whether or not fences or flushes are used to keep it in sync, I still think this is the most likely source of the issue. https://www.khronos.org/opengl/wiki/Synchronization

kovidgoyal commented 4 years ago

I disagree, but feel free to prove me wrong, relevant code is in shaders.c and graphics.c

anicolao commented 4 years ago

I have left screenshots as the proof that there is a bug; proving that it is in kitty as opposed to in my Mac's GPU drivers does not seem worth the effort unless it affects me more often. Since I mainly plan to display one image at a time and then clear it, it's unlikely to matter to me personally.

I will say that the odds of the GPU driver being responsible for showing me a corrupt texture and then suddenly rendering it properly later are very low indeed. If I had such a GPU bug, it would appear in Chrome as it uses OpenGL for rendering.

[edited for certainty]

kovidgoyal commented 4 years ago

To the contrary, GPU drivers have bugs like this ALL the time. And they have literally thousands of independent rendering paths that you haven't see a bug in Chrome is proof of nothing.

Luflosi commented 4 years ago

Here is an example of a GPU driver bug that only affected kitty: https://bugs.freedesktop.org/show_bug.cgi?id=110201. At least I don't think there were any reports of it affecting any other software.

anicolao commented 4 years ago

I took a minute to find the function that's drawing the images in my case and I believe it is send_image_to_gpu which uses glTexImage2D which is supposed to hide the asynchrony from the application developer so if it is a texture upload issue then indeed it is a driver bug. And I still think texture upload is the most likely source of the behaviour I'm observing as nothing else cleanly accounts for the style of artifacts, or at least I think it's unlikely that there's a section of main memory that looks like the screenshots.