Closed anicolao closed 4 years ago
icat doesnt use remote control, and I cannot reproduce at all, with a blank config or with my regular config. SO some bisection of the config file to generate a minimal config to reproduce would help.
Perhaps the problem only occurs via escape codes, which is almost exclusively how I display images in kitty. Wherever I have said "remote control" in reference to this issue I mean using escape codes to display images.
I will try to reproduce it that way, and see if that helps.
icat also uses escape codes to display images. Do you mean escape codes containing base64 data rather than streamed data?
OK I can reproduce as follows. Untar the attached .tar.gz file. Start kitty --config NONE
and start two terminals (two different os windows, though maybe two kitty windows would be as good). In each one, start the following loop:
for i in img*; do cat $i; echo $i; sleep 2; done
Images either appear blank; appear late; or appear corrupted at some point. Sometimes an old texture gets updated and looks fine in the end. Sometimes it just stays corrupt. Here is an example of a snoopy that fixed itself after a few more were displayed:
Re: icat vs these files; you said "icat doesn't use remote control" and I was speaking of remote control and escape codes as though they were the same thing, when I meant only escape codes. The files I have included in my reproduction case are generated by my own scripts, and so they could possibly be in violation of the graphics spec (I tried to make correct files, but you can look for yourself).
I would claim whether my files are valid or not, the observed behaviour is not OK. Mostly I am doing these operations over a very slow network and it reproduces more easily when I do that, but for the purpose of minimizing the variables the repro case shown above is running directly on my mac; but I could not get it to reproduce without running two windows, so I wonder if there's a thread safety issue in the processing of escape codes.
Commonly when I am working remotely the same escape codes could arrive at kitty via multiple os windows at almost the identical time. I also see a bug periodically where kitty spews escape codes into my terminal unexpectedly (usually just after starting vim
) and this could be related.
All escape codes, from all windows, are parsed in a single thread, so it seems rather unlikely to be a thread safety issue. It could be a memory corruption issue, which you should be able to check easily using make asan and reproducing. I will look into your repro case when I have some time.
Doesn't repro for me with the following session file:
launch sh -c "cd /t/r; for i in img*; do cat $i; echo $i; sleep 2; done; read"
new_os_window
launch sh -c "cd /t/r; for i in img*; do cat $i; echo $i; sleep 2; done; read"
albeit on linux not macOS
Doesnt repro for me on macOS either
I don't think it's likely to be a main memory corruption issue, though of course anything is possible.
More likely it's an unitialized GPU memory issue, for example a missing fence between the texture upload and the swap buffers. Since the upload is async, if the bits haven't arrived in the GPU's memory yet the display will show whatever was there before, which seems to match reasonably well with the random mishmash of other recognizable textures in the screenshot. If my hypothesis is correct, the problem will only be reproducible on discrete GPUs, and possibly also hard to reproduce without the same driver. Does your Mac have a discrete GPU? My GPU is an NVIDIA GeForce GTX 680MX 2 GB
.
[edit to add]: this hypothesis is also consistent with the fact that later, the image is often "fixed".
To my frustration, with the session file approach i can't reproduce the corruption either. It is the case that sometimes teh image doesn't draw for a long time, but the blank black space is much less objectionable than the corrupt image. I tried some variations of the script to no avail. My machine is in a similar state to what I was doing earlier so I am not sure what exactly to try to make it repro. I'll keep an eye out.
I dont see how. Image loading is not asynchronous. Image data is decoded and sent to GPU before kitty asks the GPU to render the image (image rendring nd text rendering happen in th same draw pass and all of it happens on a single thread). I am guessing this is some GPU driver/OpenGL driver bug. I dont use macs, but I do recall reading that Apple dropped support for NVIDIA cards because of all the bugs with them. Given I cannot reproduce there's not much I can do about this, but if you come up with better reproducers do let me know and I will investigate. Closing till then as so far I am not convinced this is a bug in kitty.
GPU memory and main memory are distinct on any computer with a discrete GPU (as opposed to an emulated GPU on the CPU which is also commonplace) and the asynchrony is meant to be under the OpenGL programmer's control. Without knowing where kitty actually does the rendering and whether or not fences or flushes are used to keep it in sync, I still think this is the most likely source of the issue. https://www.khronos.org/opengl/wiki/Synchronization
I disagree, but feel free to prove me wrong, relevant code is in shaders.c and graphics.c
I have left screenshots as the proof that there is a bug; proving that it is in kitty as opposed to in my Mac's GPU drivers does not seem worth the effort unless it affects me more often. Since I mainly plan to display one image at a time and then clear it, it's unlikely to matter to me personally.
I will say that the odds of the GPU driver being responsible for showing me a corrupt texture and then suddenly rendering it properly later are very low indeed. If I had such a GPU bug, it would appear in Chrome as it uses OpenGL for rendering.
[edited for certainty]
To the contrary, GPU drivers have bugs like this ALL the time. And they have literally thousands of independent rendering paths that you haven't see a bug in Chrome is proof of nothing.
Here is an example of a GPU driver bug that only affected kitty: https://bugs.freedesktop.org/show_bug.cgi?id=110201. At least I don't think there were any reports of it affecting any other software.
I took a minute to find the function that's drawing the images in my case and I believe it is send_image_to_gpu
which uses glTexImage2D
which is supposed to hide the asynchrony from the application developer so if it is a texture upload issue then indeed it is a driver bug. And I still think texture upload is the most likely source of the behaviour I'm observing as nothing else cleanly accounts for the style of artifacts, or at least I think it's unlikely that there's a section of main memory that looks like the screenshots.
Describe the bug Sometimes, after displaying large PNGs, bad data are displayed.
To Reproduce
Expected behavior The first and last display of tiles should look the same.
Screenshots First display: Last display:
Environment details Mac OS X 10.15.5
Additional context Hmm. I had trouble reproducing this with no config file, but I suspect it's the lifetime of the session that is the issue rather than the config file.
It is also possible that since I am speaking the remote control protocol to kitty directly, I could have a bug in my version of icat or I could have triggered a bug that is difficult to reach with icat.