hpjansson / chafa

πŸ“ΊπŸ—Ώ Terminal graphics for the 21st century.
https://hpjansson.org/chafa/
GNU Lesser General Public License v3.0
2.84k stars 60 forks source link

kitty mode doesn't render gifs properly #104

Closed acxz closed 2 years ago

acxz commented 2 years ago

See the following gif file: t

See how chafa renders it in kitty:

https://user-images.githubusercontent.com/17132214/185245999-b2b526f3-ecd7-429f-93f9-6f51eba66fb5.mp4

Looks like there needs to be some code to clear the output every gif frame? This issue doesn't appear in timg, maybe looking at what they do can give us the solution to fix it.

Just putting some helpful links here, for future ref: https://github.com/hzeller/timg/blob/262acd1a417bed4ffda324ca8dc0db764ed03a3a/src/kitty-canvas.cc https://github.com/hpjansson/chafa/blob/92e766ae1fe212c0512c25cb3ed54a4a7d2502fc/chafa/chafa-term-db.c https://sw.kovidgoyal.net/kitty/graphics-protocol/

hpjansson commented 2 years ago

Verified. The issue is that with sixels and I think iTerm2, you have "destructive transparency" that clears whatever's underneath it. Or rather, it's a SRC operation, while Kitty uses OVER.

It could be solved by printing spaces over the old frame, but that could make the animation flicker. I'll have a look.

acxz commented 2 years ago

@hpjansson can you give some tips on how to resolve this? I've been staring at https://github.com/hpjansson/chafa/blob/92e766ae1fe212c0512c25cb3ed54a4a7d2502fc/chafa/chafa-term-db.c#L185-L193

and trying to read the kitty docs (https://sw.kovidgoyal.net/kitty/graphics-protocol/) to try to resolve this, but its honestly going over my head.

hpjansson commented 2 years ago

The strings in chafa-term-db.c have positional arguments (%1, %2, %3), and different terminals can have different definitions of the same sequence. There is a uniform API on top of it, implemented by chafa-term-info.c. Most of those calls are generated from macro invocations you'll find in chafa-term-seq-def.h. So e.g. the API for CHAFA_TERM_SEQ_BEGIN_KITTY_IMMEDIATE_IMAGE_V1 is defined here:

https://github.com/hpjansson/chafa/blob/8b3426e6bb20b3ead959b4e8e3f88d713bbebd17/chafa/chafa-term-seq-def.h#L886

When the macro is invoked in chafa-term-info.c by #including the header, it defines the API entry point chafa_term_info_emit_begin_kitty_immediate_image_v1() with the corresponding arguments.

This is somewhat complex, but it prevents repeating big code blobs over and over and allows us to keep the header definition, enum mapping, documentation and API implementation together in one place.

I'll have a bit of time later tonight to look at how to solve this -- looking quickly at timg it doesn't seem to be doing anything special, but it has an option for "local transparency handling" and sending images over without an alpha channel. If the images are indeed being sent without alpha, that works around the issue. However, this won't look as nice if your terminal is transparent (showing the desktop or a background image instead of solid background color).

Did you test timg with a transparent terminal, or one with an image background?

hpjansson commented 2 years ago

timg-chafa-anim_alpha

sending images over without an alpha channel

Confirmed that's what timg is doing (upper). Chafa respects alpha (lower).

Since I don't think kitty + background image + transparent animation is a very common combo, we could just do the same and flatten alpha in this case. I don't see a better way of doing it short of implementing kitty's animation protocol, but then we'd have to preserve state between frames, and that's a whole other ballgame.

hpjansson commented 2 years ago

Done. It's automatic, but respects --bg and --invert, so you can get it to composite on white or any other color instead of the default black. You can also force enable it with -t 1 or disable with -t 0 or any other value.

See commit 6fa53ad09a94fb7aaa5d75d02c77135fe06bf53e for the library bits.

acxz commented 2 years ago

thanks for taking a look at this, @hpjansson, and a fast implementation!

Did you test timg with a transparent terminal, or one with an image background?

Yep timg with a transparent terminal works properly works for gifs. I tested on timg version 1.4.4. Ah I tested this a bit more and it is a bit more nuanced. I am using the opacity from my compositor and not from kitty. And it seems that timg automatically determines the background color of my terminal and uses that. Since the transparency occurs at the compositor level, it stays consistent. By manually setting the opacity of kitty, I reproduce your above gif with timg.

Below video is using transparency of picom (compositor) and not of kitty.

https://user-images.githubusercontent.com/17132214/190944025-13655510-c734-431a-88b2-e021b79a785b.mp4

Manually using the --bg argument with chafa for my current kitty background reproduces the timg behavior in chafa.

Should I open a new issue for auto determine background color for transparent gifs in kitty? For reference, here is how timg determines the term background color: https://github.com/hzeller/timg/blob/9bae51e50094b67318b9eda40f49341d596ebaa3/src/term-query.cc#L118-L178

Note: I haven't looked at if we have similar functionality, maybe it is simpler than what I am thinking it out to be.

hpjansson commented 2 years ago

Querying the background color isn't hard per se, but there are general issues with querying the terminal that I'd like to avoid for now, e.g. the terminal might not support the query (in which case it could get dumped to the screen instead -- I've seen this happen), the wait/round-trip for response might be long (e.g. running in tmux behind a jump host on a slow VPN in a faraway country -- not that uncommon) and it could compound (a script doing for i in files; do chafa "$i"; done would suffer the delay repeatedly). There may not always be a way to discover/talk to the tty, especially if the output is being redirected.

So at a minimum it'd have to be hidden behind a switch, or tuned cleverly to only trigger automatically when it's very certain to work and latency is low. It's a useful feature in an interactive context, though, so feel free to open an issue for it. We could turn it into a research ticket for a future interactive mode.

Two questions:

acxz commented 2 years ago

It's a useful feature in an interactive context, though, so feel free to open an issue for it.

Got it.

do you depend on us detecting the BG color

For pokeshell, we do depend on the display library to display the appropriate background color. Detecting the background color in bash might be cumbersome and I would have to add another imagemagick operation to convert the transparent image to an image with the appropriate background.

For a client library like pokeshell that uses chafa, I think what you suggested of having it behind a switch (maybe such like --bg auto) would be great.

implementation fast enough?

Yeah, it is on par with timg's speed just from a visual inspection.

AnonymouX47 commented 1 year ago

There's actually a workaround I use for this (on Kitty only, not necessary on Konsole). Emitting a sequence with a=d,d=C just before drawing the next frame produces this:

On Kitty:

https://user-images.githubusercontent.com/61663146/221445673-7e3df5cf-8932-4d08-a2c2-9ad88237349e.mp4

On Konsole:

https://user-images.githubusercontent.com/61663146/221445697-4024592f-d9e6-461d-93b5-dc42cb21f9ff.mp4


But animations might flicker briefly for large or dense (well distributed colors) images, depending on how fast the terminal emulator draws images.

Also, d=C (amongst others) is faulty on Kitty < 0.25.1 (See https://github.com/kovidgoyal/kitty/issues/5081). I use a specific z-index for all frames and a=d,d=z.

AnonymouX47 commented 1 year ago

Also, I should note that the approach you use has a flaw. The problem is, as you draw images on top of one another without deleting the previous one, over time the terminal emulator actually slows down in perfermance.

This is much more noticeable in full screen TUIs where the screen isn't scrolled.

Kitty does optimize things a little though by deleting previously drawn images probably once a memory limit is reached but at that point performance would've gotten really bad.

On Konsole, an image overwrites an existing one if and only if placed at the exact same position. Though, I'm not sure that was even intentional. This makes deleting the previous frame unnecessary and Konsole is actually better off that way since it draws images quite slower.

I reported this behaviour at https://github.com/kovidgoyal/kitty/issues/5187 but the author prefers to leave it unspecified.

All in all... The actual solution to all these is to use the native animation of the protocol. I intend to implement support for this in my project but haven't gotten to it since Kitty is the only terminal emulator that currently supports it and that doesn't give so much motivation.

hpjansson commented 1 year ago

I think I'm using the protocol correctly. Memory management is the terminal's responsibility -- e.g. if you cat an image to the terminal, then the client will exit before returning control to the user (usually back to the shell). There's no one on the shell's end to keep track of state then.

I wrote a sixel patch for VTE, and what I did there was to always erase overwritten contents (no blend). Then when an image is completely overlapped, free it. Additionally there is a cache threshold like you mention. On top of that, the VTE maintainer would like images to get swapped to disk when scrolled out -- I haven't finished that bit yet.

Anyway, this isn't all that hard to solve in the terminal, and it's easier and less total work than having each client do its own remote memory management.

Kitty has a more advanced animation protocol that lets you buffer multiple frames etc. That's nice (and something we may want to support eventually), but it means the client has to keep track of state. And it can still get interrupted by a signal and leave the terminal hanging.

hpjansson commented 1 year ago

Also, if this makes the terminal emulator bloat/slow down, that sounds like a bug... Even if blending, it should keep track of the painted area and blend into the existing pixmap(s) whose total extents are limited by the size of the terminal window (in the non-scrolling case).

Edit: I'll have to try your workaround. Looks interesting. Hoping to avoid state tracking/z-index, though -- and it needs to not flicker over a slow link.

AnonymouX47 commented 1 year ago

Anyway, this isn't all that hard to solve in the terminal, and it's easier and less total work than having each client do its own remote memory management.

True.

but it means the client has to keep track of state.

I'm not sure I understand you here. The protocol supports sending all frames (and duration per frame, I believe) and allowing the terminal to control looping, that's what the icat kitten does.

And it can still get interrupted by a signal and leave the terminal hanging.

That would be the fault of the whoever is responsible for the signal. Anyways, along as the signal can be handled, writing ST (String Terminator) a couple times before the process finally exits is a way out (tested).

it needs to not flicker over a slow link.

Never tested that... I hope it works out. You could make the workaround optional (and probably add a warning about possible flickering).

hpjansson commented 1 year ago

Apologies for taking so long to get back to this. A lot of stuff happened here, and it fell by the wayside.

but it means the client has to keep track of state.

I'm not sure I understand you here. The protocol supports sending all frames (and duration per frame, I believe) and allowing the terminal to control looping, that's what the icat kitten does.

A couple of things:

And it can still get interrupted by a signal and leave the terminal hanging.

That would be the fault of the whoever is responsible for the signal. Anyways, along as the signal can be handled, writing ST (String Terminator) a couple times before the process finally exits is a way out (tested).

Sure, as long as the write() terminates successfully before you get a second, unhandled SIGTERM or a SIGKILL. Anyway, this isn't such a huge problem, I was more thinking that Kitty's native protocol is not a panacea and also creates new complexities.

it needs to not flicker over a slow link.

Never tested that... I hope it works out. You could make the workaround optional (and probably add a warning about possible flickering).

I'll see what can be done. Since every terminal seems to do this differently now, the only safe way is to disallow transparent animations by default, compositing them on a solid color. Possibly with an override option for when you're Really Sure (tm).

kovidgoyal commented 1 year ago

Just FYI, you dont need an image id, you can use an image number. They dont need to be unique. Animation support has existed for years, you dont need to query for it, anymore than you query for basic support/available transmission channels, and all those queries can be batched in one pass. There is no need to stop animations on application exit, but if you do want to do that, "waiting for the buffer to flush" will not be any appreciable amount of time. IMO, of your list of concerns, the only real one would be differring implementations.

AnonymouX47 commented 1 year ago
  • Long (or especially wide/tall) animations will hit memory limits in terminals.

I recently tested kitty's icat with a 2188-frame 1354x768 GIF and memory usage wasn't near "good" i.e in comparison to memory usage when drawing frame-by-frame. It used over 4 GiBs of memory (while decoding frames I assume)... but that was icat, not the terminal emulator. kitty itself only used up just about an extra 10s of MiB (though didn't seem to handle the animation well, kept getting stuck at particular frame and then restarts just to get stuck again).

So, I think memory usage on the terminal emulator's end is rather OK.

  • Adding frames to an animation requires multiple operations referring to the image ID. You have to keep track of this.
  • Image IDs can conflict with other applications, or they with yours. To avoid this, you need to request the image ID from the terminal and wait for the response, managing the bidirectional communications state. You will need a parser too, with a context (state) that can be fed incrementally.

True... and the query time accumulates as the program displays more animations, except all the image ID queries are made in one pass which is only workable if the number of animations is known ahead.

From my little experience, working with image IDs is the most painful aspect of the protocol... that's why I still avoid it for now but unfortunately certain features of the protocol are tied to it.

  • This means writing the control sequence and waiting for your buffer to flush, which can take an arbitrary amount of time after the user e.g. hits ctrl-c.

This would only be a problem if all frames are rendered and transmitted at once. I wouldn't advice that for multiple reasons, the major ones being memory usage and delay in animation start. I believe this is what icat does and it doesn't seem to work too well, though it's not really noticeable for animations with just a couple (< a few 100s) frames though.

  • Different protocol extensions are implemented in different versions, so you have to query the terminal as the first step.

From the specification of the protocol, the only thing that has changed as regards the animation aspect since it's addition in Kitty v0.20.0 is "Support for frame composition" which was added in v0.22.0 and that was quite a long time ago (by the way, it would've been better if the protocol's spec had it's own separate versioning and a way to query it).

  • Since it's a complex protocol, I doubt we can expect different terminals to implement it exactly the same.

100% true, this is inevitable. Implementations of even less complex (including fundamental) aspects of the protocol are already inconsistent on some terminal emulators.

I was more thinking that Kitty's native protocol is not a panacea and also creates new complexities.

Hmm... that's true.

Possibly with an override option for when you're Really Sure (tm).

Yeah, I think making it optional is the best option for now.


EDIT:

AnonymouX47 commented 1 year ago

Just FYI, you dont need an image id, you can use an image number.

To transmit multiple frames of an animation? πŸ€” How would that be since image numbers are not unique?

The spec states:

All future commands that refer to images using the image number, such as creating placements or deleting images, will act on only the newest image with that number.

How can I be sure another program won't use the same image number while transmitting the animation frames?

kovidgoyal commented 1 year ago

On Mon, May 01, 2023 at 12:55:14AM -0700, Toluwaleke Ogundipe wrote:

Just FYI, you dont need an image id, you can use an image number.

To transmit multiple frames of an animation? πŸ€” How would that be since image numbers are not unique?

An image number will refer to the last created image with that number, it doesnt matter how many such images there are.

The spec states:

All future commands that refer to images using the image number, such as creating placements or deleting images, will act on only the newest image with that number.

How can I be sure another program won't use the same image number while transmitting the animation frames?

If you have multiple independent programs writing to the tty you are hosed in any case, regardless of this protocol. How do you know the other program wont clear the screen or move the cursor or change colors or delete things or overwrite things.

AnonymouX47 commented 1 year ago

If you have multiple independent programs writing to the tty you are hosed in any case, regardless of this protocol. How do you know the other program wont clear the screen or move the cursor or change colors or delete things or overwrite things.

Well... not in that sense. I guess I used the wrong term. The answer is pretty straight-forward for an application program which will most likely be the only one writing to the screen at a time, except in a multiplexer, which probably shouldn't be using real (i.e non-virtual) placements (which animations are) anyways.

I'm more concerned with a single application program utilizing multiple libraries that independently utilize the protocol e.g in a TUI. Writing to the TTY will be easily coordinated but doesn't stop an image rendered by one library to be written to the TTY while frames of an animation rendered by the other library are still being written to the TTY. Even worse is the case of animation frames rendered by both libraries being written to the TTY concurrently.

kovidgoyal commented 1 year ago

On Mon, May 01, 2023 at 10:14:08PM -0700, Toluwaleke Ogundipe wrote:

If you have multiple independent programs writing to the tty you are hosed in any case, regardless of this protocol. How do you know the other program wont clear the screen or move the cursor or change colors or delete things or overwrite things.

Well... not in that sense. I guess I used the wrong term. The answer is pretty straight-forward for an application program which will most likely be the only one writing to the screen at a time, except in a multiplexer, which probably shouldn't be using real (i.e non-virtual) placements (which animations are) anyways.

Animations can be virtual placements as well, IIRC.

I'm more concerned with a single application program utilizing multiple libraries that independently utilize the protocol e.g in a TUI. Writing to the TTY will be easily coordinated but doesn't stop an image rendered by one library to be written to the TTY while frames of an animation rendered by the other library are still being written to the TTY.

That would apply to any UI. If you use multiple independent libraries to write to the screen in a GUI program you would have the same issue. Fundamentally, the screen is a shared global resource. It is up to applications to manage that resource responsibly. No protocol can protect a shared resource from being corrupted by multiple independent consumers.

AnonymouX47 commented 1 year ago

Animations can be virtual placements as well, IIRC.

Oh, Wow! I'll have to try that out then.

Fundamentally, the screen is a shared global resource. It is up to applications to manage that resource responsibly.

Very true but my point is... to achieve that (responsible management) with the kitty graphics protocol, both libraries cannot use image numbers to transmit animation frames.

kovidgoyal commented 1 year ago

On Mon, May 01, 2023 at 11:34:07PM -0700, AnonymouX47 wrote:

Very true but my point is... to achieve that (responsible management) with the kitty graphics protocol, both libraries cannot use image numbers to transmit animation frames.

My point is it is impossible for two independent libraries to draw anything to the screen safely unless the application that is using those libraries uses them responsibly. In a GUI toolkit that would mean the application will divide up screena rea and input events between the libraries, for the graphics procolo, it means the application has to assign ids to images and pass those ids to the libraries.

AnonymouX47 commented 1 year ago

it means the application has to assign ids to images and pass those ids to the libraries.

I guess I'll work with that, along with an interface to request an ID from the terminal (which will be unavoidably required when working with terminal multiplexers). This was exactly what I kept coming back to anytime I though about how to implement the use of image IDs but I was always considering the possibility of further abstraction.

Thanks.

hpjansson commented 1 year ago

Just FYI, you dont need an image id, you can use an image number. They dont need to be unique.

Oh, an image number looks like exactly what I need, since the context doesn't have to last past the lifetime of a Chafa invocation (the most complex command-line use case is when it's invoked as a preview generator in file browsers, and those have no choice but to push the output as one big blob -- so no command interleaving). Great!

Animation support has existed for years, you dont need to query for it, anymore than you query for basic support/available transmission channels, and all those queries can be batched in one pass.

Well, an API specification/reference implementation having had a feature for years (in this case, since 2021?) isn't a guarantee that a different implementation will have it, or that the user isn't running an older release. There are plenty of terminal standards that're decades old but rarely implemented, and plenty of users on years-old LTS distros that only ship security updates, or even newer distros where the packages just aren't up to date. I still get contacted by people running some repackaging of the Chafa master branch at v1.7 on like, Raspbian or whatever.

Since I don't really want to query for everything, I try to keep things as simple as possible instead. And I agree that best-effort, maybe with some peeking at env vars and such, is probably good enough here.

There is no need to stop animations on application exit, but if you do want to do that, "waiting for the buffer to flush" will not be any appreciable amount of time. IMO, of your list of concerns, the only real one would be differring implementations.

Yeah... It comes down to what users expect. I can imagine someone filing a bug report along the lines of "I switched from mlterm to Kitty and now animations won't stop! This bouncing pokemon is driving me insane!" or vice versa (though I doubt anyone will want to switch away from Kitty once they've tried it, obviously).

One thing I couldn't find in the documentation is how to manage very long animations (aka video streams). Ideally it would be possible to use the animation sequence as a FIFO queue, discarding the oldest frame as new ones are transferred -- but I couldn't find a way to delete animation frames. It looks like it might be possible with the d key, but it's a little unclear to me.

(By the way, buffer flush can take a long time even for small payloads. Ask me about living out in the sticks in Mexico and having to use GUI apps with blocking network writes in the main loop, written by urban US/Euro dwellers with perfect connectivity sometime -- but you're buying).

kovidgoyal commented 1 year ago

On Tue, May 02, 2023 at 03:39:05PM -0700, Hans Petter Jansson wrote:

One thing I couldn't find in the documentations is how to manage very long animations (aka video streams). Ideally it would be possible to use the animation sequence as a FIFO queue, discarding the oldest frame as new ones are transferred -- but I couldn't find a way to delete animation frames. It looks like it might be possible with the d key, but it's a little unclear to me.

The animation protocol isn't really designed for video. For proper video support one needs to consider audio synchronization as well which was not a can of worms I wanted to open, given these protocols have to work over variable and unknown latency links.

That said, the animation protocol has two modes of operation, one where you send the frames and timing info to the terminal and it takes care of changing frames for you, the other is where you send escape codes to the terminal telling it to change a frame when you want it to. This latter mode should allow for send and delete semantics, though I confess I havent tested it myself.

hpjansson commented 1 year ago

The animation protocol isn't really designed for video. For proper video support one needs to consider audio synchronization as well which was not a can of worms I wanted to open, given these protocols have to work over variable and unknown latency links.

I think that's wise. In any case, if multimedia were ever to become an option, it'd make sense to delegate the synchronization etc. to embedded codecs running in sandboxes, much like web browsers do it. And then implement a muxing packet protocol (superset of how Kitty does image chunking now) with MIME types etc. But heh, yeah, maybe best left unopened indeed. However:

That said, the animation protocol has two modes of operation, one where you send the frames and timing info to the terminal and it takes care of changing frames for you, the other is where you send escape codes to the terminal telling it to change a frame when you want it to. This latter mode should allow for send and delete semantics, though I confess I havent tested it myself.

What I'd like isn't really to watch movies with sound and so on, but the ability to efficiently display animation with indefinite duration. Two use cases I can think of are:

The ideal interface for this is something that buffers a single frame on the terminal end and performs a swap and delete when the next frame arrives.

kovidgoyal commented 1 year ago

On Fri, May 05, 2023 at 03:48:25AM -0700, Hans Petter Jansson wrote:

The ideal interface for this is something that buffers a single frame on the terminal end and performs a swap and delete when the next frame arrives.

This should actually be possible already. I am travelling for a while so dont have the bandwidth to check this in detail, but you simply use the second mode I talked about. Transmit one frame, transmit the escape code to display it. Transmit the second frame, wait for acknowledgement that it is recived, then when the time is right, transmit the escape code to switch to it and finally transmit the escape code to delete the first frame. This should work as intended, with the caveat that it might be hard to get the timing right over a slow/high latency link.

kovidgoyal commented 1 year ago

See https://github.com/chase/awrit for an example rendering chromium in the terminal using the kitty graphics protocol.

AnonymouX47 commented 1 year ago

From what I can see,

https://github.com/chase/awrit/blob/018251d8aed5024b06e63fa01fe035975a6df9cb/awrit/tui.cc#L119

it doesn't use the animation feature of the protocol. It's approach is pretty much the same as chafa's... except I missed something.

kovidgoyal commented 1 year ago

I didnt say it used the animation protocol, just that it is of the category of showing a UI by rendering pixels to the screen using the protocol. If you want an example of a library that uses the protocol, IIRC notcurses does and it achieved massive performance gains from doing so. https://github.com/dankamongmen/notcurses/discussions/1898

AnonymouX47 commented 1 year ago

I didnt say it used the animation protocol, just that it is of the category of showing a UI by rendering pixels to the screen using the protocol.

Oh, ok.

dankamongmen/notcurses#1898

Sweet resource. Thanks.

dankamongmen commented 1 year ago

yep, can vouch for the power of kitty graphics!

hpjansson commented 1 year ago

Circling back to this: Since Kitty is doing compositing, it would be nice if it supported simple src ops in addition to the current src-over. Then there'd be no need to use the animation protocol for client-controlled animations with transparency -- you'd just repeatedly push images to the same position using the src op.

Kitty could detect when an image can no longer be made visible (e.g. covered by src placement(s) and the client no longer has a valid handle for it and the covering placements). It could then free the covered-up image. If I'm not mistaken, if you created a stack of three src images reusing an image number, the lowest-placed image could be retired (no way to bring it to front, image above it likewise can't be sent to back).

hpjansson commented 1 year ago

@kovidgoyal Some supplementary info for convenience: SVG comp-op and the Pixman ops. Sixels support something similar to src/src-over using its P2 argument; my interpretation of P2=0 ("transparent pixels are drawn in the BG color") is that it's akin to a src op, while P2=1 leaves transparent pixels alone, similar to src-over.

I doubt there's a use for any of the other typical compositing operations, except for src-atop, which could perhaps be used to maintain a soft-bordered clipping area where images could be moved around without reuploading anything.

kovidgoyal commented 1 year ago

I'm afraid I dont see a compelling using case for more compositing operations. The animation protocol already addresses this use case more than adequately. It even allows one to optimize bandwidth by transmitting only deltas for successive frames. Adding more composition operators to allow a worse implementation of animation is not something I am interested in.

dankamongmen commented 1 year ago

Circling back to this: Since Kitty is doing compositing, it would be nice if it supported simple src ops in addition to the current src-over. Then there'd be no need to use the animation protocol for client-controlled animations with transparency -- you'd just repeatedly push images to the same position using the src op.

Kitty could detect when an image can no longer be made visible (e.g. covered by src placement(s) and the client no longer has a valid handle for it and the covering placements). It could then free the covered-up image. If I'm not mistaken, if you created a stack of three src images reusing an image number, the lowest-placed image could be retired (no way to bring it to front, image above it likewise can't be sent to back).

i'm pretty sure this can already be done, unless i'm misunderstanding you. take a look at https://github.com/kovidgoyal/kitty/issues/3809 and https://github.com/dankamongmen/notcurses/blob/master/src/lib/kitty.c.

AnonymouX47 commented 1 year ago

@dankamongmen Yes, the src op is already supported but only via the animation protocol.

If I understand @hpjansson correctly, the suggestion is to support the op for non-animation transmissions:

Then there'd be no need to use the animation protocol for client-controlled animations with transparency -- you'd just repeatedly push images to the same position using the src op.

Anyways... as much as this doesn't seem so difficult to implement in any terminal emulator that currently implements composition of overlapping images according to the spec (Kitty and Konsole I know of), I understand @kovidgoyal's point:

Adding more composition operators to allow a worse implementation of animation is not something I am interested in

Even though it sounds painful, I can't disagree with the fact that it's valid.


I also understand @hpjansson point of view. From what I know of Chafa, the library (i.e not the CLI) currently has no idea of non-/animated images, it simply takes in image (single frame) pixel data and produces text (possibly including control sequences) corresponding to the specified output format (kitty, sixels, etc). Having to use the animation protocol will result in creating a special case (possibly a separate/extended interface) for the kitty format and I guess that wouldn't fit well with the long-existing and current model of its API.

Adding/reusing a graphics command parameter to specify the composition operation wouldn't be bad IMO. Instead, it would only allow for more use cases and flexibility. I don't think the src op will be any less performant than the current single composition operation (I suppose src-over). In fact, I think the performance should be the other way round.

Just by the way... this reminds me of https://github.com/kovidgoyal/kitty/issues/5187 (which is quite related to the topic at hand) where I mentioned that Konsole already partially implemented this behaviour (the src op).

hpjansson commented 1 year ago

@kovidgoyal Let me rephrase it. If I'm developing a tool that converts images to kitty format, and it's being called via fork/exec from a file browser with a preview pane, which takes the output from this tool, places the cursor at the top left of the preview area, and prints it -- what is the correct way, in kitty, to avoid blending it with whatever might be left over from the previous preview?

kovidgoyal commented 1 year ago

On Wed, Jun 07, 2023 at 06:34:36AM -0700, Hans Petter Jansson wrote:

@kovidgoyal Let me rephrase it. If I'm developing a tool that converts images to kitty format, and it's being called via fork/exec from a file browser with a preview pane, which takes the output from this tool, places the cursor at the top left of the preview area, and prints it -- what is the correct way, in kitty, to avoid blending it with whatever might be left over from the previous preview?

The file browser needs to delete all images in the preview area before displaying the image. Indeed, several file browsers work just fine with kitty's own icat kitten to do that. Practically speaking file browsers generally redraw the screen anyway for each new file. So the clearing is automatic. Several more have implemented the protocol directly, which gives them much more control.

If the image is an animation, the tool would output all the frames at once, just as icat does.

hpjansson commented 1 year ago

The file browser needs to delete all images in the preview area before displaying the image.

I see. What's best practice, in the general case, for clearing the image presentation area before showing an image, if you don't have the handles of previous images? The idea is to get as close as possible to a simple copy/src operation like you'd have with symbol mosaics, sixels or iterm.

I'm asking because I'm looking for the best way to abstract away this difference in public API where the norm is "new content overwrites old content", i.e. if you print spaces on top of text, old characters will not show through. Same with sixels when P2=0.

kovidgoyal commented 1 year ago

On Wed, Jun 07, 2023 at 07:11:33AM -0700, Hans Petter Jansson wrote:

The file browser needs to delete all images in the preview area before displaying the image.

I see. What's best practice, in the general case, for clearing the image presentation area before showing an image, if you don't have the handles of previous images? The idea is to get as close as possible to a simple copy/src operation like you'd have with symbol mosaics, sixels or iterm.

You can choose any of the general erase methods that clear an area: https://sw.kovidgoyal.net/kitty/graphics-protocol/#deleting-images

For example a to delete everything on screen, or c to delete anything that intersects the current cursor position or p for a particular cell or x/y for particular row/column.

I'm asking because I'm looking for the best way to abstract away this difference in public API where the norm is "new content overwrites old content", i.e. if you print spaces on top of text, old characters will not show through. Same with sixels when P2=0.

I am actually ok with adding another delete mode that deletes anything in a specified rectangle as well. As a convenient shorthand for doing all the p based individual cell deletes manually.

hpjansson commented 1 year ago

I am actually ok with adding another delete mode that deletes anything in a specified rectangle as well. As a convenient shorthand for doing all the p based individual cell deletes manually.

Won't this delete entire placements even if they only intersect slightly? Also, except for the d=c option, which works on a single cell (?), the offsets seem to be spec'd as absolute, but for this to work, they'd have to be relative.

Please note I'm not trying to make your life hard, I'd just like to cover the corner cases without creating new ones. I thought placements having a src op would be the easiest way to do it -- it'd allow the user to set all four channels at the destination (including "punching holes"), and basically every image library supports it as a simple copy op. But I'm not familiar with kitty code, so I don't really know.

kovidgoyal commented 1 year ago

On Wed, Jun 07, 2023 at 09:14:37AM -0700, Hans Petter Jansson wrote:

I am actually ok with adding another delete mode that deletes anything in a specified rectangle as well. As a convenient shorthand for doing all the p based individual cell deletes manually.

Won't this delete entire placements even if they only intersect slightly? Also, except for the d=c option, which works on a single cell (?), the offsets seem to be spec'd as absolute, but for this to work, they'd have to be relative.

All delete options delete entire placements on intersect even d=c. There is no act like sixels mode in the protocol. I find the idea of punching holes in images pretty absurd. I can think of no reasonable use case for it apart from behaving like sixels. If you really want behavior like sixels, then split up your image into cell sized blocks and display them one image per block and delete them with d=c.

And note that in the scenario you describe writing an image would overwrite parts of existing images without knowing anything about them. IMO doing that is much worse than deleting the entire image. An application that is forking and execing chafa will presumably know where it is placing images and can manage them however it sees fit.

Please note I'm not trying to make your life hard, I'd just like to cover the corner cases without creating new ones. I thought placements having a src op would be the easiest way to do it -- it'd allow the user to set all four channels at the destination (including "punching holes"), and basically every image library supports it as a simple copy op. But I'm not familiar with kitty code, so I don't really know.

I still dont know of a good use case for this other than trying to emulate sixels behavior, which is not really motivating for me.

AnonymouX47 commented 1 year ago

For the sake of another POV:

Without Synced Update

Kitty graphics on Kitty

https://github.com/hpjansson/chafa/assets/61663146/26097dee-f049-4d92-bcad-b6b9e4b18f74

Kitty graphics on Konsole (which doesn't even support synced updates)

https://github.com/hpjansson/chafa/assets/61663146/a5876507-f2e8-4635-add3-b4b057cdd1de

ITerm2 Images on Wezterm

https://github.com/hpjansson/chafa/assets/61663146/fceb0d18-e504-4ae8-adb6-f2844f76dd2b

The single-line flickers are due to an erase-in-line sequence meant to erase any existing text (because it's a TUI), which also erases any existing image on the line, before drawing that line of the image. Last I checked, I can't remember clearly but I think this wasn't required on iTerm2 itself because image pixels (even transparent ones) overwrite text but such is not the case on Wezterm.

Despite this, I think I would go for the one with less flicker-y behaviour any day.


Of course, synced updates help to mitigate this but that's only on terminal emulators that implement the feature.

Anyways, in order to make images update properly, a sizeable amount of kitty graphics-specific code and workarounds were required:

  1. Track image widget positions and delete image drawn by a widget when its position changes
  2. Embed d=c at the top-left of every image to delete overlapped images when widget positions did not change but images were redrawn (cos the TUI framework redraws a line if at least one character on the line changed).
  3. Every time images are deleted without changing position, the graphics commands for those images have to be augmented with some extra text (different from the previous augment text) in order for the TUI framework to redraw the affected lines.
  4. etc...

Without widget tracking code on Kitty

https://github.com/hpjansson/chafa/assets/61663146/913ec909-711f-4c41-ad38-10918bf29872

Without d=c per image (basically per line) on Kitty

https://github.com/hpjansson/chafa/assets/61663146/edf2db23-0928-403f-a9e5-197b6273d252


Meanwhile, ITerm2 images on Wezterm remained the same.

My point here is... it would be really useful to be able to update images without having to delete entire existing ones first, resulting in flicker or much kitty-specific code and workaround.

kovidgoyal commented 1 year ago

On Thu, Jun 08, 2023 at 12:49:30AM -0700, AnonymouX47 wrote:

My point here is... it would be really useful to be able to update images without having to delete entire existing ones first, resulting in flicker.

Place the new image, then delete the old one. This is precisely why the kitty protocol allows you to transmit image data separately from displaying it.

Or use synchronized updates. All terminal emulators that support the kitty image protocol support synchronized updates as far as I know (and if one doesnt report it there, hopefully it will be fixed).

And not to mention that flickering will be a lot worse with sixel. No clue about iTerm. I dont think iTerm is a serious contender anyway. It doesnt even have well defined semantics for how these should behave.

AnonymouX47 commented 1 year ago

Place the new image, then delete the old one.

Well... that would only turn flicker-y to glitchy. There's barely any upside.

All terminal emulators that support the kitty image protocol support synchronized updates as far as I know.

I just mentioned that Konsole doesn't... except it just happened and I haven't updated to that version.

kovidgoyal commented 1 year ago

On Thu, Jun 08, 2023 at 01:02:56AM -0700, AnonymouX47 wrote:

Place the new image, then delete the old one.

Well... that would only turn flicker-y to glitchy. There's barely any upside.

Really? It's less than 32 bytes to do that. The chances of 32 bytes being interrupted at just the wrong place to cause the delete but not the placement to happen for even a single frame, are infinitesimal and exactly zero if you use synchronized updates. Which you should always use. This is essentially a made up problem.

All terminal emulators that support the kitty image protocol support synchronized updates as far as I know.

I just mentioned that Konsole doesn't.

So report it to them, hopefully they will implement it. And if they dont, I highly doubt they will implement whatever you are advocating to be added to the graphics protocol, so it wont make any difference to you in any case.

kovidgoyal commented 1 year ago

I just mentioned that Konsole doesn't... except it just happened and I haven't updated to that version.

There you go, non-issue.

AnonymouX47 commented 1 year ago

Okay.

AnonymouX47 commented 1 year ago

I just mentioned that Konsole doesn't... except it just happened and I haven't updated to that version.

There you go, non-issue.

No, I wasn't implying it had happened... and it hasn't.