Assessing the possibility of implementing image protocols like sixel

amano-kenji commented 1 year ago

Is vty compatible with an image protocol such as sixel, kitty, or OSC 1337?

Since image pixels are different from characters, they cannot be cropped on the fly.

I imagine that an Image for image pixels should contain its size in columns and rows, image pixels, and cropping information so that when the image is rendered, only the cropped portion of the Image size is rendered before being passed to the terminal.

Can a data constructor of Image contain size in character cells and image pixels? If it can, then any image protocol can be implemented as third party packages like vty-sixel, vty-kitty, and vty-osc1337.

I imagine an Image data constructor for images can look like this.

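ImagePixels {
  width :: Int           -- width in character cells
, height :: Int          -- height in character cells
, pixels :: PixelData    -- the image pixels; PixelData is a placeholder type
}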

This can be wrapped in a Crop data constructor. For vty-sixel, the sixel renderer kicks in after all the cropping has been applied to the ImagePixels data constructor, just before the image data is sent to the terminal.

I don't know internal details of vty. Do you think image protocols can be implemented in vty?

jtdaugherty commented 1 year ago

I doubt it's possible to support images because this requires knowledge of how many pixels wide a column is. I don't know of a way to find that out since it depends on the terminal emulator's font characteristics which as far as I know are not available to the application. Without cropping, images cannot be displayed in Vty because the image format assumes cropping is possible.

amano-kenji commented 1 year ago

I think sixel can obtain knowledge of how many pixels fit in a character cell horizontally and vertically.

Without cropping, images cannot be displayed in Vty because the image format assumes cropping is possible.

In theory, the Crop data constructor can carry the cropping information until the last moment, and the ImagePixels data constructor can render only the visible portion of the image pixels at the last moment and print it to the terminal.

Is it possible to carry the cropping information in the Crop data constructor until the last moment for the ImagePixels data constructor? I imagine this is a matter of introducing case statements for different data constructors.

With knowledge of how many pixels fit in a character cell and how an image fills the allocated character cells (fully visible, fit vertically, fit horizontally, fill all character cells, ...), it may be possible to crop image pixels by character cells, but it's still desirable to delay actual cropping of image pixels until the last moment for performance and simplicity of implementation.
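The idea of carrying crop information in cell units and touching pixels only at output time might look roughly like the sketch below. It is written in Python purely for illustration (vty itself is Haskell), and `CellCrop`, `crop`, and `pixel_rect` are hypothetical names, not part of vty.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CellCrop:
    """A visible window, in character cells, relative to the image's top-left cell."""
    left: int
    top: int
    width: int
    height: int

    def crop(self, left: int, top: int, width: int, height: int) -> "CellCrop":
        """Compose a further crop (relative to this window) without touching pixels."""
        left = max(0, min(left, self.width))
        top = max(0, min(top, self.height))
        width = max(0, min(width, self.width - left))
        height = max(0, min(height, self.height - top))
        return CellCrop(self.left + left, self.top + top, width, height)

    def pixel_rect(self, cell_w: int, cell_h: int) -> Tuple[int, int, int, int]:
        """Convert the window to a pixel rectangle (x, y, w, h) at output time."""
        return (self.left * cell_w, self.top * cell_h,
                self.width * cell_w, self.height * cell_h)
```

Only `pixel_rect` needs the cell size in pixels, so every earlier crop is plain integer arithmetic on cells and the pixel buffer is sliced once, just before encoding.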

jtdaugherty commented 1 year ago

I think sixel can obtain knowledge of how many pixels fit in a character cell horizontally and vertically.

I was going to ask you to substantiate this, but I am realizing that I am not really interested in digging into this further. I think that if you wanted to explore whether Vty could facilitate display of images in the terminal, then a better place to start is to survey the available protocols for doing so. For instance, I found this which is an example of an alternative; the question is, what alternatives exist, and how widely-supported are they?

Sixel is a very old protocol and surely better options exist now. Are you interested in investigating?

jtdaugherty commented 1 year ago

Also, I wanted to mention this since I have seen it happen several times: I've noticed that you've posted comments on tickets only to either heavily edit them or delete them. That ends up being confusing to me, since I get email notifications about your comments only to come to Github's web site and find that the comments have either vanished or changed considerably. Please consider slowing down and being sure about what comment you want to post. If you want to amend it, that's fine, but please do so by posting a new comment asking me to ignore a previous one rather than deleting it.

amano-kenji commented 1 year ago

since I get email notifications about your comments only to come to Github's web site and find that the comments have either vanished or changed considerably.

Sorry about that. I didn't know that you were getting real-time email notifications. My mind works quite frantically once it kicks in. I have been bad at tying dirty loose ends in my mind once thinking starts. I have yet to develop a systematic thinking process that keeps things tidy in my mental landscape. I will try to hide my thinking process from you.

Sixel is a very old protocol and surely better options exist now. Are you interested in investigating?

Upon further consideration, I realized that if I know how many pixels fit in a character cell, then I can create an Image of image pixels that can be cropped cell by cell on the fly.

But, encoding image pixels into sixel escape codes, kitty escape codes, or osc1337(iTerm2) escape codes should happen when cropped image pixels are actually rendered at the last step.
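As a rough illustration of that last encoding step, here is a sketch for the iTerm2 inline-image sequence. The argument names (`inline`, `size`, `width`, `height`, `preserveAspectRatio`) are as I recall them from iTerm2's documentation and should be checked there; `iterm2_inline_image` is a hypothetical helper. Note that the payload is an encoded image file (for example PNG), not raw pixels, so cropped pixels would have to be re-encoded first.

```python
import base64


def iterm2_inline_image(file_bytes: bytes, cols: int, rows: int) -> bytes:
    """Wrap an encoded image file in an OSC 1337 File sequence.

    cols and rows are the display size in character cells.
    """
    args = "inline=1;size={};width={};height={};preserveAspectRatio=0".format(
        len(file_bytes), cols, rows)
    payload = base64.b64encode(file_bytes)
    # OSC 1337 ; File = <args> : <base64 data> BEL
    return b"\x1b]1337;File=" + args.encode("ascii") + b":" + payload + b"\x07"
```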

Is there a function where cropped image pixels can be encoded into image protocol escape codes before they are sent to the terminal?

amano-kenji commented 1 year ago

Is there also a way to know where image pixels are supposed to be rendered?

If sixel doesn't know where it should start rendering, it must enable scroll mode to start rendering from the current cursor. In scroll mode, the bottom 6 pixels of a terminal emulator cannot be rendered by sixel.

If sixel knows where to start rendering, it can render at the bottom 6 pixels of a terminal emulator.

jtdaugherty commented 1 year ago

Upon further consideration, I realized that if I know how many pixels fit in a character cell, then I can create an Image of image pixels that can be cropped cell by cell on the fly.

Yes, that would ultimately be something we'd need to be able to accomplish. But this is still premature, because I asked whether you'd be willing to investigate other image formats. Perhaps if you found other formats, those formats might provide some idea of whether this is solvable.

But, encoding image pixels into sixel escape codes, kitty escape codes, or osc1337(iTerm2) escape codes should happen when cropped image pixels are actually rendered at the last step.

Yes, this is true, as is already the case in Vty when images get converted to a byte stream of output. The "pixel data" case would be done similarly, if we get that far.

Is there a function where cropped image pixels can be encoded into image protocol escape codes before they are sent to the terminal?

Well, yes and no: as we already established, Vty does not support pixel-based images so the current functionality only works on character cells. In addition, Vty applies a diffing process to the output to ensure that only changed terminal cells are updated in the output. That's something that would also ultimately need to be considered in the pixel image case.

jtdaugherty commented 1 year ago

Is there also a way to know where image pixels are supposed to be rendered?

If sixel doesn't know where it should start rendering, it must enable scroll mode to start rendering from the current cursor. In scroll mode, the bottom 6 pixels of a terminal emulator cannot be rendered by sixel.

If sixel knows where to start rendering, it can render at the bottom 6 pixels of a terminal emulator.

These seem like questions we ought to discuss later once we've determined whether it's feasible to have images in vty at all. I'm not yet convinced that it is possible. Here is what I would like you to investigate first before considering any others, if you are willing: please identify alternative image formats for displaying images in terminals, and present what you find here. I am happy to take a look at the resources you find in order to help evaluate whether any of them will be a fit for Vty.

amano-kenji commented 1 year ago

I better start researching this topic again after publishing brick-tabular-list.

amano-kenji commented 1 year ago

Before I go, I want to quickly let you know that it's possible to calculate pixels per cell width or cell height.

#!/usr/bin/env python3
import array, fcntl, sys, termios

# TIOCGWINSZ fills a struct winsize: rows, columns, width in pixels, height in pixels.
buf = array.array('H', [0, 0, 0, 0])
fcntl.ioctl(sys.stdout, termios.TIOCGWINSZ, buf)
print((
    'number of rows: {} number of columns: {}'
    ' screen width: {} screen height: {}').format(*buf))

# Divide the pixel size by the cell count to get the size of one cell.
print("Pixels per cell width: {}".format(buf[2]/buf[1]))
print("Pixels per cell height: {}".format(buf[3]/buf[0]))

Example output

number of rows: 57 number of columns: 273 screen width: 1911 screen height: 969
Pixels per cell width: 7.0
Pixels per cell height: 17.0

At least, cells contain whole numbers of pixels instead of fractional numbers. I would have been disappointed if pixels per width or height were something like 6.5 or 3.14.
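One caveat worth handling: some terminals report 0 for the two pixel fields of the window size, and the division is not guaranteed to come out whole. A minimal sketch of a more defensive calculation, with `cell_size_px` as a hypothetical helper name:

```python
from typing import Optional, Tuple


def cell_size_px(rows: int, cols: int, xpixel: int, ypixel: int) -> Optional[Tuple[int, int]]:
    """Return (cell width, cell height) in pixels from the four winsize fields.

    Returns None when the terminal does not report a pixel size.
    """
    if rows <= 0 or cols <= 0 or xpixel <= 0 or ypixel <= 0:
        return None
    # Floor division: any leftover pixels belong to no cell.
    return (xpixel // cols, ypixel // rows)
```

With the values from the example output, `cell_size_px(57, 273, 1911, 969)` gives `(7, 17)`.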

amano-kenji commented 1 year ago

I finished documenting brick-tabular-list. I'm just waiting for my hackage account to be accepted into uploaders group.

So, brick-tabular-list can be considered finalized.

I read about sixel, iTerm2 image protocol, and kitty image protocol.

After considering how vty works, I concluded that a 24-bit RGB vty image should fully fill character cells and should not allow alpha channel. With this assumption, all three image protocols can be trivially supported, and any future image protocol that makes the current terminal image protocols obsolete will be supported.
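To make the 24-bit RGB assumption concrete, here is a sketch of how such a buffer might be sent with the kitty graphics protocol. The control keys (`a=T`, `f=24`, `s`, `v`, `m`) and the 4096-byte chunk limit are as I recall them from the kitty documentation and should be verified there; `kitty_rgb_chunks` is a hypothetical helper.

```python
import base64
from typing import List


def kitty_rgb_chunks(rgb: bytes, width_px: int, height_px: int,
                     chunk: int = 4096) -> List[bytes]:
    """Split a row-major 24-bit RGB buffer into kitty graphics escape sequences."""
    if len(rgb) != width_px * height_px * 3:
        raise ValueError("expected 3 bytes per pixel")
    data = base64.b64encode(rgb)
    # chunk must be a multiple of 4 so every piece is valid base64 on its own.
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)] or [b""]
    out = []
    for i, piece in enumerate(pieces):
        more = 0 if i == len(pieces) - 1 else 1
        if i == 0:
            # Only the first chunk carries the full control data.
            control = "a=T,f=24,s={},v={},m={}".format(width_px, height_px, more)
        else:
            control = "m={}".format(more)
        out.append(b"\x1b_G" + control.encode("ascii") + b";" + piece + b"\x1b\\")
    return out
```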

If a graphical vty image fills a character cell partially or has alpha channel, then the current design of vty cannot calculate which character cells should be updated. Think about partially transparent images above text. If a character cell is filled partially or a graphical image has alpha channel, then vty will have to calculate whether pixels have to be updated. vty was designed to calculate which character cells should be updated.

I came up with the following imaginative data constructor for Image.

ImagePixels {
  pixels :: 24bit-pixels
, hashes :: [[HashForEachCharacterCell]]
, width :: Int
, height :: Int
, renderingFn :: ... -> ... -> IO ()
}

Since each character cell has a fixed number of pixels, a hash can be calculated for the image content in a character cell.

When vty crops ImagePixels, it will crop hashes and pixels and subtract from width or height.

renderingFn will be provided by vty-sixel, vty-kitty, or vty-any-image-protocol.

After vty renders ImagePixels, it records pixel hashes for character cells. I got this idea from bittorrent file chunk hashes. vty will compare hashes to calculate which character cells should be updated.
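A small sketch of the per-cell hashing idea, in Python for illustration only. It assumes a row-major 24-bit RGB buffer whose width is a whole number of cells; `cell_hashes` and `changed_cells` are hypothetical names, and BLAKE2 is just one possible hash.

```python
import hashlib
from typing import List, Tuple


def cell_hashes(rgb: bytes, width_px: int, cell_w: int, cell_h: int,
                cols: int, rows: int) -> List[List[bytes]]:
    """Hash the pixels that fall under each character cell."""
    grid = []
    for row in range(rows):
        line = []
        for col in range(cols):
            h = hashlib.blake2b(digest_size=8)
            # Feed the cell's pixel rows to the hash, one scanline slice at a time.
            for y in range(row * cell_h, (row + 1) * cell_h):
                start = (y * width_px + col * cell_w) * 3
                h.update(rgb[start:start + cell_w * 3])
            line.append(h.digest())
        grid.append(line)
    return grid


def changed_cells(old: List[List[bytes]], new: List[List[bytes]]) -> List[Tuple[int, int]]:
    """Return (row, col) for every cell whose hash differs between two grids."""
    return [(r, c)
            for r, (old_row, new_row) in enumerate(zip(old, new))
            for c, (o, n) in enumerate(zip(old_row, new_row))
            if o != n]
```

Comparing the grid recorded after the previous render with the grid for the next frame yields the cells that need redrawing, which is the same kind of answer vty's character-cell diffing produces.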

What do you think about this?

jtdaugherty commented 1 year ago

Thanks for doing some more research and putting some thought into this.

a 24-bit RGB vty image should fully fill character cells and should not allow alpha channel

I think this makes sense.

ImagePixels {
  pixels :: 24bit-pixels
, hashes :: [[HashForEachCharacterCell]]
, width :: Int
, height :: Int
, renderingFn :: ... -> ... -> IO ()
}

My thoughts on this:

- How will the width and height be computed? We still need some way to know the relationship between pixel density and terminal cells.
- A list of lists here will be memory-inefficient. We'll probably want to use a different sequence container type such as a container backed by C arrays.
- The rendering function does not belong here. The rendering function is determined by the output encoding which happens later, so carrying it in the Image results in unnecessary tight coupling between the image and the implementation.
- Is the choice to do hashing here informed by how the existing protocols work?

Overall, before I'm ready to talk about how to implement this, I need to understand more about the protocols that we have to choose from. I can't evaluate an implementation until I know how the encoding(s) work. This continues to be my request of you: look at the existing encoding options, present what you find here, and then let's discuss what is feasible. It's a bit too early to discuss implementation.

amano-kenji commented 1 year ago

How will the width and height be computed? They will not be computed from image size. Image pixels will be scaled to the given width and height in character cells. We can already determine how many pixels fit in a character cell. Scaling can be handled by vty or third party libraries. swaybg defines the following scaling modes: stretch, fill, fit, center, and tile. sxiv supports 100% scale, fit large images into window, fit image to window, fit image to window width, and fit image to window height.
A list of lists is inefficient. I know. I merely wanted to depict concepts in pseudo-code. We can use Vector (Vector Hash) in real code.
Is vty going to directly implement terminal image protocols?
I thought hashes were going to be computed purely from image pixels. Sixel colors are usually 8-bit palette colors in 256-color terminal color registers. Terminals usually have 256 color registers or 1024 color registers. However, images in 8-bit palette colors usually look similar to images in 24-bit color space supported by iTerm and kitty. I don't think programmers are going to use two image protocols simultaneously in their programs. Even if they used more than one image protocol, I think hashes should be computed purely from image pixels.
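For the first point, scaling an image to a size given in character cells could be computed along these lines. This is only a sketch: `scaled_size` and its mode names are hypothetical, loosely following the swaybg modes mentioned above, and tiling is left out.

```python
from typing import Tuple


def scaled_size(img_w: int, img_h: int, cols: int, rows: int,
                cell_w: int, cell_h: int, mode: str) -> Tuple[int, int]:
    """Return the pixel size an image should be scaled to for a box of cols x rows cells."""
    box_w, box_h = cols * cell_w, rows * cell_h
    if mode == "stretch":
        return (box_w, box_h)        # ignore the aspect ratio, fill every cell
    if mode == "center":
        return (img_w, img_h)        # no scaling at all
    # True when the image is relatively wider than the box.
    wider = img_w * box_h >= img_h * box_w
    if mode == "fit":                # whole image visible, may leave empty cells
        if wider:
            return (box_w, max(1, img_h * box_w // img_w))
        return (max(1, img_w * box_h // img_h), box_h)
    if mode == "fill":               # every cell covered, overflow must be cropped
        if wider:
            return (max(1, img_w * box_h // img_h), box_h)
        return (box_w, max(1, img_h * box_w // img_w))
    raise ValueError("unknown mode: " + mode)
```

Only "stretch" guarantees that every character cell is fully covered without cropping; "fit" leaves some cells partially or wholly empty, and "fill" needs a crop afterwards.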

jtdaugherty commented 1 year ago

We can already determine how many pixels fit in a character cell.

How?

Is vty going to directly implement terminal image protocols?

Yes. Vty will be responsible for taking the abstract Image and rasterizing it into escape sequences and output data, the same way it already does so for colored text, style data, etc. So naturally it will need to take care of emitting the appropriate cursor movements and escape sequences to update the right cells on the screen.

amano-kenji commented 1 year ago

How? https://github.com/jtdaugherty/vty/issues/255#issuecomment-1416661505. https://sw.kovidgoyal.net/kitty/graphics-protocol/#getting-the-window-size defines one more way to get the window size in pixels, but I prefer ioctl.
Vty will be responsible? I was envisioning a future where vty defines a framework for image protocols, and image protocols are implemented by third party libraries. If vty had to support all image protocols internally, it would be bloated. By the way, sixel is the only thing that resembles a standard. Kitty image protocol can change any time. OSC 1337(iTerm) image protocol also can change any time.

jtdaugherty commented 1 year ago

I see, thanks for sharing that link. Assuming the window size in pixels is reported without the non-cell pixel area, those numbers should be okay. (Many terminal emulators are slightly larger than a perfect multiple of the cell pixel size.) The other consideration is that an image that is less than an exact pixel multiple of the cell size would use a fraction of a cell when rendered.
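The fractional-cell case can be made concrete with a little arithmetic. In the sketch below, `cells_spanned` is a hypothetical helper that reports how many cells an unscaled image touches and how many pixels of the last column and row it leaves uncovered.

```python
from typing import Tuple


def cells_spanned(img_w_px: int, img_h_px: int,
                  cell_w: int, cell_h: int) -> Tuple[int, int, int, int]:
    """Return (cols, rows, pad_right, pad_bottom) for an image drawn at its native size."""
    cols = -(-img_w_px // cell_w)    # ceiling division
    rows = -(-img_h_px // cell_h)
    pad_right = cols * cell_w - img_w_px
    pad_bottom = rows * cell_h - img_h_px
    return (cols, rows, pad_right, pad_bottom)
```

For a 100x50 pixel image and 7x17 pixel cells this gives 15 columns and 3 rows, with 5 pixels of the last column and 1 pixel of the last row left uncovered.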

I was envisioning a future where vty defines a framework for image protocols

I'm not that interested in making the rendering backend for Vty pluggable, which is essentially what this means. But I'm going to say this over and over again: this is an implementation consideration that it's too early to worry about because I still don't know which protocols you're recommending supporting. What I want to know is:

What protocols exist?
How widely-supported are they? Are they just specific to one terminal emulator? Is there a standard that has been adopted by multiple emulators? What are the failure modes for terminal emulators that don't support those encodings?

If vty had to support all image protocols internally

I don't want this, so this isn't a future I'd worry about. :) This is why I've asked you to provide info about the existing protocols. If we're going to support this in Vty, then I want to support whichever protocol is most widely-deployed in terminal emulators. If there is a clear winner, that's the one I'd rather support. (And "winner" here means widely-supported by existing emulators already and not baroque in its design.) If that turns out to be hard because there are too many protocols or they're all poorly-supported or fail badly in terminal emulators that don't know about them, then I would rather not add the feature at all. I don't think there's enough demand for images in Vty to justify the headache.

jtdaugherty commented 1 year ago

Judging by your comment above about standards, it doesn't sound like there really is any (and sixel is not looking like a good contender).

amano-kenji commented 1 year ago

an image that is less than an exact pixel multiple of the cell size would use a fraction of a cell when rendered.

I think the only thing that makes sense is to just scale or shrink image pixels to the given width and height in character cells. You pick a size in character cells, and image pixels will be scaled or shrunk to fill the given size.

What protocols exist? sixel, OSC 1337 (iTerm2), and the kitty image protocol. New protocols are being designed. https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/26 is an attempt at designing a standard image protocol. As the terminal-wg issue suggests, sixel is not a good image protocol.
How widely-supported are they? Are they just specific to one terminal emulator? Is there a standard that has been adopted by multiple emulators?
- Sixel: mlterm, xterm, yaft (flawed implementation of sixel), mintty, https://github.com/kmgrant/macterm, https://github.com/wez/wezterm, https://github.com/liamg/darktile, iTerm2, https://github.com/algon-320/toyterm, alacritty (there is a pull request for sixel support), foot terminal emulator, GNOME's vte, konsole
- iTerm2 OSC 1337: iTerm2, wezterm, macterm
- kitty image protocol: kitty; wezterm may support it in the future.
What are the failure modes for terminal emulators that don't support those encodings? They just ignore unsupported terminal image encodings.
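Rather than relying on unsupported sequences being ignored, an application can ask the terminal what it supports. For sixel, my understanding is that terminals advertise it as attribute 4 in the reply to a Primary Device Attributes query (`ESC [ c`); this should be checked against each emulator. A sketch of parsing such a reply, with `da1_reports_sixel` as a hypothetical helper name:

```python
import re

# A Primary Device Attributes reply looks like ESC [ ? 62 ; 4 ; 6 c
_DA1 = re.compile(r"\x1b\[\?([0-9;]*)c")


def da1_reports_sixel(response: str) -> bool:
    """Return True if a DA1 reply lists attribute 4 (sixel graphics)."""
    m = _DA1.search(response)
    if m is None:
        return False
    params = m.group(1).split(";")
    # The first parameter is the device class; the rest are feature attributes.
    return "4" in params[1:]
```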

amano-kenji commented 1 year ago

Whatever image protocol vty ends up implementing without third party libraries, I think this issue is for determining whether vty can have any image protocol without being slow.

To determine performance characteristics, I need to ask a few questions.

- Should vty draw an entire image at once every time or only for the first time?
- Should vty update some character cells of an entire image after the image is drawn fully?
- Should vty draw an image cell by cell and avoid drawing it in entirety?
- Will the image protocol let us determine which image pixels go in a character cell? The current image protocols certainly do, but I'm not sure about https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/26

jtdaugherty commented 1 year ago

I'm closing this issue since this discussion is not working for me. As the vty maintainer, it's important to me to evaluate all aspects of a request or proposal. While I understand that you want this feature, I am going to decline to do any further investigation of it for now. In the future, it would be very helpful when you engage with me on my libraries not to presume what the investigation should or shouldn't be about. I want to help, but it's very difficult to do that when you insist on a framing of an approach that I've repeatedly asked to change.

jtdaugherty commented 1 year ago

I thought that it might be helpful to outline how I will want to engage on any library feature if you end up wanting to open more tickets about new ideas.

1. I'll want to start with a discussion of the motivation and need behind the request. Why would you like the feature? Until I understand this, I don't want to spend energy on implementation and design considerations.
2. If there are alternatives to consider, I'll want to understand those (as was the case here). This helps us both understand the space of options, their trade-offs, the impact of choosing particular alternatives, etc.
3. If there are alternatives identified that I think would be feasible to implement, the next step is to think about design and abstraction that cover the features and use cases that need to be supported. That's something I'm going to have a lot of influence over since, after all, I maintain the library. I'm going to want to pay a lot of attention to API ergonomics and use cases. I also want to think about technical debt that might be incurred by a particular approach, or impact to users of the library, issues that might arise in providing end-user support, etc.
4. At this point the next step is implementation, possibly also returning to (3) to iterate. To start on this step I'll want to have a very clear idea of the implementation plan that we've arrived at by steps (1) through (3).
5. To the extent that performance is critical for the feature, I'd be interested in considering performance issues. For most features, the most typical approach is to use standard implementation techniques to avoid obvious performance pitfalls while not spending energy on micro-optimizations. As the saying goes, premature optimization is the root of all evil, and it's usually best to wait until performance issues arise in practice since that's the best way to illuminate where performance issues actually live (which is often counterintuitive).

With all that said, while I am happy to consider patches, when it comes to developing a design, abstraction, and implementation, I need that to be a collaborative process. As the person most familiar with the code and with a design vision for the library, I'm going to need to influence that. And if we can't reach a conclusion that I think will serve the library, then I cannot move forward, especially if it's code that I need to be willing to maintain.

I hope all that is helpful next time you would like to open an issue to discuss something like the topic we discussed here.

amano-kenji commented 1 year ago

Why would I like the feature? I'd like to do UI development in the terminal since I haven't found a good GUI library outside the terminal. I told you about Gtk and Qt, which are terrible. Displaying images is essential if I don't want to use existing GUI libraries in the future.
"If there are alternatives to consider". Did you envision implementing a specific terminal image protocol directly in vty? To my knowledge, you will be disappointed later if you are tied to any specific terminal image protocol now. If there is any hope for a good image protocol, it might be https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/26. I can't envision myself committing to any specific terminal image protocol for now if I were to write something like vty.

jtdaugherty / vty

Assessing the possibility of implementing image protocols like sixel #255