wez / wezterm

A GPU-accelerated cross-platform terminal emulator and multiplexer written by @wez and implemented in Rust
https://wezfurlong.org/wezterm/
Other
17k stars 767 forks source link

Sixel graphics support #217

Open wez opened 4 years ago

wez commented 4 years ago

This issue tracks open items around sixel graphics

wez commented 4 years ago

@klamonte: I'd appreciate your feedback/suggestions on this. Current master has basic sixel support and is able to replay the sixel based captures from the jexer download page, but I'm not sure of the best way to validate this!

ghost commented 4 years ago

Hi @wez, some general items (mostly from mistakes I've made :) ):

It'll be exciting to see more sixel support out there! :)

yoichi commented 3 years ago

WezTerm-macos-20201101-103216-403d002d crashes with following command. Is there some error handling issue?

printf "\x1bPqh"
ModProg commented 3 years ago

Maybe in the future other Escape Codes could be implemented as well, for example kitty jumps in my mind for being quite sensible Kitty . Some thing like iTerm2 or Terminology is probably less interesting.

wez commented 3 years ago

The kitty protocol is complex to implement and model and I have no plans to do that at this time.

wezterm has pretty solid implementations of both sixel and the iTerm2 image protocols already. Sixel has broad application level support, and iTerm2's protocol has growing adoption.

ModProg commented 3 years ago

The kitty protocol is complex to implement and model and I have no plans to do that at this time.

Maybe that would be something that could happen some time in the future, one thing that is a bit annoying with sixel is not being able to display imges in cells, but Kitty's protocol is very involved, that is true.

ghost commented 3 years ago

DECSDM appears to be interpreted incorrectly by xterm, such that using it can break behavior on terminals that were tested against real hardware. Credit to @j4james for documenting this: https://github.com/hackerb9/lsix/issues/41 .

j4james commented 3 years ago

FYI, regarding the DECSDM issue, MLTerm has just accepted a PR to switch its DECSDM implementation to match XTerm. Apparently Thomas was well aware that XTerm was doing it incorrectly but had chosen to keep it broken anyway, so MLTerm has now decided they are better off just matching XTerm's broken behaviour. As of now, I'd estimate there are more broken DECSDM implementations than working ones. I hate everything about this.

ghost commented 3 years ago

I'm surprised Dickey didn't just add a resource to swap the behavior (it's literally 'h' vs 'l'), with the default being to keep doing what it does now. Thing is that xterm is a very reasonable replacement for real hardware, and there aren't too many of those around. But TERM=xterm does imply matching what 'xterm' does first and foremost, even if that has some mistakes compared to what 'xterm' is trying to copy.

j4james commented 3 years ago

But TERM=xterm does imply matching what 'xterm' does first and foremost

In this case though, you've literally got to tell XTerm to switch to a VT340 mode (or one of the other graphics terminals) in order to get any sixel functionality. There's definitely an implication that it is actually emulating a VT340 rather that its own sixel variant, so this behaviour just seems bizarre to me. That said, XTerm's sixel implementation was never really usable as a VT340 emulator anyway (like most other open source terminals), so I suppose one more bug doesn't really matter.

j4james commented 3 years ago

For anyone not following along on the lsix issue, Thomas has just commented to say that he would "wait til you've investigated the VT340", which I take to mean he is willing to update XTerm's behaviour if it can clearly be shown to be incorrect. I'm not sure what problem he had with the previous evidence, but I'm just happy he is still open to be persuaded, and there's a chance we may get a sane outcome here.

ghost commented 2 years ago

wezterm's sixel is very solid, but alas on the slower end. foot, xterm, and (sad to say) my own Swing backend can outpace it.

I am skeptical that I can help, but am interested in poking around. Is the sixel decoder entirely in term/src/terminalstate/sixel.rs and termwiz/src/escape/mod.rs ? Can the decoder be run separately from the full terminal?

wez commented 2 years ago

I'm sure there's room for optimization: that code is unoptimized, and I would love for it to be improved!

Yeah, it's those two files. You could extract the bulk of TerminalState::sixel: https://github.com/wez/wezterm/blob/ebef7dc659ccac6a99c592b0f21401992cfb5bd8/term/src/terminalstate/sixel.rs#L11-L106 into a helper function in that file and then set up a test function at the bottom to benchmark; something like:

#[test]
fn test_sixel_decode() {
   let start = std::time::Instant::now(); 
   // build a Sixel
   println!("sixel build took {:?}", start.elapsed());

   let start = std::time::Instant::now();
   let data = call_extracted_function_from_above(sixel);
   println!("sixel decode took {:?}", start.elapsed());

   panic!("just to ensure that the test fails and shows its output");
}

then cargo test -p wezterm-term to build and run the tests from term as a low-ish effort way to isolate and iterate on it

ghost commented 2 years ago

@ModProg

...one thing that is a bit annoying with sixel is not being able to display imges in cells

I haven't yet found a way to write image data to the bottom row (DECSET 8452 didn't do it for me), but otherwise sixel can be placed anywhere on screen, and if one is careful it can be mixed in with text:

xterm with translucency and animations

It doesn't do a proper +/- Z axis though, so you can't easily put image and text in the same cell. You can lay text down and then sixel over, but not sixel down and then text over. Is that what you are referring to?

ModProg commented 2 years ago

You can lay text down and then sixel over, but not sixel down and then text over. Is that what you are referring to?

No, I thought you cannot just say make this image 4x4 cells, but have to use pixels? not sure right now. Haven't really dealt with sixels a lot :D

ghost commented 2 years ago

Ah I see. Yes, iTerm2 and Kitty now, and eventually the future "Good Image Protocol", will be able to say "take this picture, and stick it those cells, stretching/scaling like so". Sixel is pure pixels, so you need to know the text cell aspect ratio to be able to place it.

j4james commented 2 years ago

you cannot just say make this image 4x4 cells

On a real DEC terminal, and on terminal emulators that strictly emulate the protocol, you can usually assume a cell is 10x20 pixels in size, so if you write out an image that is 400x800, it will occupy exactly 4x4 cells.

ghost commented 2 years ago

@j4james BTW how is the DECSDM story? I believe that was a blocker for sixel in Windows Terminal?

j4james commented 2 years ago

@klamonte Sorry, I should have followed up on that here. The DECSDM problem is pretty much under control now I think. Almost everyone that supports DECSDM is implementing it correctly now.

The main reason I'm holding back on the release of my Windows sixel implementation is because it's still conhost-only at the moment - to get it in Windows Terminal we need to get a pass-through mode implemented first.

There are also a few sixel issues that would be nice to get resolved first. For example, I'd like to get other terminals to standardize on the correct cursor positioning, and I think there's an issue with the Jexer palette handling that I was hoping to persuade you to fix. But those aren't really blockers.

ghost commented 2 years ago

@j4james If you get some time, please open an issue for me. I've got a new encoder in git head that uses per-image palettes and can generate sixels with transparent pixels, this would be a great time to fix it up for compatibility.

j4james commented 2 years ago

I haven't yet found a way to write image data to the bottom row

@klamonte This is another one of those things that would be possible in sixel if terminal emulators actually implemented it correctly. Admittedly it's not that straightforward, but if you know the cell height is 20, you can avoid triggering a scroll by starting the image 3 rows (i.e. 60 pixels) from the bottom, and leaving the first 40 scanlines transparent (assuming you only want to affect the last row).

You can get this to work with terminals that use a different cell size, but it make things a lot more complicated. And the main requirement is for the terminal to position cursor correctly when the image is complete, i.e. it should end up on the last row of the image, not below the image (which would otherwise trigger a scroll). Most terminals don't do that correctly.

ghost commented 2 years ago

@wez I'm going to try for that bottom row: https://gitlab.com/klamonte/jexer/-/issues/91

If it works, I'll do a PR for DECSDM for you after.

ghost commented 2 years ago

Hey @wez , I'm putting together the DECSDM PR. Would you be OK with renaming terminalstate.sixel_scrolling to terminalstate.sixel_display_mode and corresponding DecPrivateModeCode::SixelScrolling to DecPrivateModeCode::SixelDisplayMode ? That would be more in line with VT382 doc (credit @j4james for the link) : https://vt100.net/dec/ek-vt38t-ug-001.pdf#page=132 .

wez commented 2 years ago

Yep, that sounds fine!

ghost commented 2 years ago

@wez It looks like putting transparent sixel pixels over other non-transparent sixel pixels is making those non-transparent pixels transparent. Did that make sense? Here's what I mean:

https://user-images.githubusercontent.com/4357501/150208696-83a51d00-9686-4d9d-bca9-ccf8b9c528dc.mp4

You see the image appearing on the bottom row (woohoo!) but all the other images above that row gone.

Is this intended behavior in the current design? It feels like an error to me that we should fix.

Also for performance, it seems like assign_image_to_cells() might not be culling cells with fully-transparent images. Regardless of DECSDM we should get that fixed. (Someone we know likes to blit a certain Segway-riding orca who has a lot of transparent cells around her. ;-) )

ghost commented 2 years ago

From much earlier: https://github.com/wez/wezterm/issues/217#issuecomment-642782747

  • Pixels-overlapping-text: sixel is supposed to do that as per spec, but in practice no one uses it that way right now. IMHO it would be perfectly acceptable to make text and image data mutually exclusive in a cell if that makes the implementation easier.

I was wrong, so so wrong. :-)

ghost commented 2 years ago
                match params.style {
                    ImageAttachStyle::Kitty => cell.attrs_mut().attach_image(img),
                    ImageAttachStyle::Sixel | ImageAttachStyle::Iterm => {
                        cell.attrs_mut().set_image(img)
                    }

...

    /// Assign a single image to a cell.
    pub fn set_image(&mut self, image: Box<ImageCell>) -> &mut Self {
        self.allocate_fat_attributes();
        self.fat.as_mut().unwrap().image = vec![image];
        self
    }

That looks like a blind "replace image" rather than a "blit image over and replace". Right?

Need instead something that will do:

                    if (oldCell.isImage()) {
                        // Blit the old cell image underneath this cell's
                        // image.
                        newImage = new BufferedImage(textWidth,
                            textHeight, BufferedImage.TYPE_INT_ARGB);

                        java.awt.Graphics gr = newImage.getGraphics();
                        gr.drawImage(oldCell.getImage(), 0, 0, null, null);
                        gr.drawImage(cells[x][y].getImage(), 0, 0, null, null);
                        gr.dispose();
                        cells[x][y].setImage(newImage);

Probably not attach_image() though, since for sixel/iTerm2 once the old pixels are overwritten they are gone forever, unlike Kitty where one could add/remove intermediate layers.

wez commented 2 years ago

Ah, I'd assumed that sixel would logically the replace the current cell, rather than composite with an existing sixel cell. Compositing for the sixel case should be doable, although it's probably going to be fiddly because of the way that ImageCell and ImageData are structured; I think you'll essentially want a method on ImageCell that:

  if has_existing_image {
     let bg_image = existing_cell.get_view();
     let dest_image = RgbaImage::new(dimensions);
     dest_image.copy_from(bg_image);
     let fg_image = new_cell.get_view();
     image::imageops::overlay(dest_image, fg_image, 0, 0);
     // and now attach_image dest_image in its place
  }
ghost commented 2 years ago

@wez I'm not quite following the the trait/match stuff, sorry! I've got a long way to go still on Rust.

Would it be OK if I started a PR that compiles, and then over there we can get the ImageCell/ImageData/ImageDataType worked out?

wez commented 2 years ago

Definitely!

ghost commented 2 years ago

Just noticed also the action item "Figure out what the default color register map should be".

@j4james kindly provided some guidance for me over here that may be helpful. It sounds like the default colors for 0-16 are the ANSI colors, with color index 0 being the background color. But there are some wrinkles when shared palettes are used.

So we would want to change default_color_map():

fn default_color_map() -> HashMap<u16, RgbColor> {
    let mut color_map = HashMap::new();
    color_map.insert(0, RgbColor::new_8bpc(0, 0, 0));
    color_map.insert(3, RgbColor::new_8bpc(0, 255, 0));
    color_map
}

@j4james Would it be as simple as changing the function above to assign ANSI colors 0-15 to entries 0-15? (The colors would be the ones from the user, a.k.a. "command line theme" colors ala solarized/etc.) These represent the starting colors for both the shared registers, and the private image registers. The former would persist across sixel calls, the latter are fresh for each image.

ghost commented 2 years ago

Additional test programs/data:

j4james commented 2 years ago

Would it be as simple as changing the function above to assign ANSI colors 0-15 to entries 0-15?

@klamonte Unfortunately it's not as simple as that. The DEC sixel devices didn't support SGR colors, so their color table isn't ANSI compatible. You can see the default values for the VT340 color table here:

https://github.com/hackerb9/vt340test/blob/main/colormap/showcolortable.png

Note the blue-red-green order where ANSI would be red-green-yellow. Also the upper eight colors are less saturated rather than bright.

But for a modern terminal this shouldn't really matter, because you're not likely to want to share the graphics and text palettes. So my recommendation would be to leave your text palette with whatever color scheme the user has chosen, and initialize the sixel palette with the default VT340 colors (this is what most modern terminals do).

For colors above the first 16, I've seen a couple of different approaches, but one of the most common is to use the XTerm 6x6x6 and grayscale palette to get you to 256, and then white for the rest (if your palette is larger than 256). I don't think this really matters, though - last I checked XTerm just had them all as black.

If you want to have a strict VT340 compatibility mode where the text and graphics palettes are actually kept in sync, then all you really need to worry about are colors 0 for the text background, 7 for the foreground, and 15 for bold. It gets a bit more complicated once you take DECSTGLT into account, but I suspect most people wouldn't care about that.

ghost commented 2 years ago

@j4james I will set the initial colors to match VT340, thank you!

I was also hoping to get showcolortable.sh to work too. We are not responding to the Request Color Table report (CSI 2 ; 2 $ u). This is also not documented or supported by xterm.

Do you know offhand which VT model documented it?

j4james commented 2 years ago

It's in the VT330/VT340 Text Programmer Reference on page 220, in the VT520/VT525 Programmer Information on page 268 (the query) and 315 (the response), and in the DEC STD 070 reference manual on page 364. Those are PDF page numbers btw, so you should be able to navigate there directly in some PDF viewers, but they won't match the document's internal numbering scheme. You can also probably search for DECCTR and/or DECTSR.

j4james commented 2 years ago

Btw, if you want to compare your implementation with other terminal emulators, I know this is supported by MLTerm, RLogin, and Reflection Desktop.

j4james commented 2 years ago

It just occurred to me now, given that we're discussing sixel here, that you might be thinking of using DECCTR to report the graphics colors. If so, that's probably not a good idea.

If you've got a strict VT340-compatibility mode, where there's a single global color table shared by both text and graphics, then this isn't a problem. But if the graphics colors are independent of the text colors, as is the case in most modern TEs, then DECCTR should probably be returning the text color table.

Bear in mind that this report is also intended to be used with DECRSTS as a way of setting the color table. That functionality isn't really needed for images, but it's the only way you can set the palette for text output.

ghost commented 2 years ago

Right now I just want enough to support showcolortable.sh and just the RGB color space, to verify that we start with the same colors as the screenshot. And thank you for the reference, I see it here on pp. 220-222 (DECRQTSR, respond with DECCTR).

DECTSR looks pretty awful. Uhhh...someone else can implement that. ;)

(Now I have another experiment I want to do: use DECRQTSR/DECCTR to get the sixel palette, and try just blindly using it with a ditherer and see what you get. Chafa has a really fast palette mapper -- what would it look like if you just used the palette that is already there? I mean single-frame picture quality would be bad of course, but would it still be good for fast-moving animation?)

j4james commented 2 years ago

Now I have another experiment I want to do: use DECRQTSR/DECCTR to get the sixel palette

That's the point I'm trying to make. You can't get the sixel palette with DECRQTSR/DECCTR. Every TE I'm aware of that supports that sequence returns the text palette, which is not the same thing. The only reason it works on the VT340 is because the text and graphics palettes are shared (and I'm fairly sure that's not the case for WezTerm).

ghost commented 2 years ago

No the palettes are definitely not shared in wezterm.

I'm kind of surprised we have OSC 4 then, since that does the same thing. Did someone else make OSC 4 without being aware of DECCTR?

So on the hardware terminals, was the VT340 unusual in sharing the text and sixel palette? Or did most/all of them do that? If they all did, maybe it would make sense to use DECCTR for sixel and OSC 4 for text? Obviously not for a faithful VT340.

j4james commented 2 years ago

I'm kind of surprised we have OSC 4 then, since that does the same thing. Did someone else make OSC 4 without being aware of DECCTR?

I believe OSC 4 was added to XTerm in 1999 (see patch #111), which is more than 10 years after the VT340 was released. But in those days it wasn't nearly as easy for terminal emulators to obtain documentation as it is today, so it's not that surprising that they weren't aware of the existing sequences.

So on the hardware terminals, was the VT340 unusual in sharing the text and sixel palette? Or did most/all of them do that?

No. I think all of the DEC terminals that supported sixel would have shared the text and graphics palettes, assuming they had an editable palette. That was definitely the case for the VT240 and VT340. But the VT340 is the only sixel terminal I'm aware of that also supported the DECCTR report.

If they all did, maybe it would make sense to use DECCTR for sixel and OSC 4 for text? Obviously not for a faithful VT340.

It wouldn't be a faithful VT525 either.

Also, I'm not sure what your use case is here. Ignoring the querying for the moment, what effect were you expecting when setting the palette, when almost none of the modern terminal emulators support palette-based images?

ghost commented 2 years ago

It's not a serious idea, mainly playing around.

I'll put it here not to continue polluting the thread... I'm finding that simple Floyd-Steinberg dithering goes really far even for limited palettes: DOOM ~~is~~ will be quite playable in [only 8-16 colors.](https://gitlab.com/klamonte/jexer/-/issues/93#note_818212607) This video is cheating slightly in that each text row has a _different_ 16-color palette, but I suspect a shared small palette would still work (for animations though, not still images). So even if a small palette worked, why? I've got a huge performance bottleneck in image encoding, especially for animations. (Well, prior to DOOM I didn't think I did. But now I want to play DOOM. ;-) ) It sounds like chafa's approach can find nearest matching color _very_ quickly, regardless of the actual palette. So I am curious if it would look good (for fast animations) with mostly "any" palette, i.e. even the one the terminal started with; which might also lead to more investigation on dithering (Bayer vs FS vs others). If there is a way to query the sixel palette, great; otherwise I will experiment with the stock VT340 sixel palette.

Regardless of the animation stuff above, it would be nice to have the option of not stomping on the sixel palette. I've seen issues out there for resetting palettes (for example an open one in notcurses for the Linux VGA), plus your concern on color index 0 for Jexer. If I could query the sixel palette reliably then I could incorporate those colors into a larger palette, never write to registers 0-15, but still use them. That was kind of the whole line of thought. But I don't want to deliberately fracture a working standard.

If the hardware and mature TEs report text colors for that query, then we should do the same. But maybe have an option on the debug build (does Rust support that?) to report sixel palette so that we can use showcolortable.sh to verify that we initialize sixel to the same colors as VT340 -- which is the actual task item in this issue.

j4james commented 2 years ago

OK, I get your use case now, but wouldn't it be easiest to assume a standard VT340 palette for the first 16 colors (which is likely the case 99% of the time anyway)? So if you want to "preserve" those colors, always write our the first 16 with default values, and add your custom colors after that. That seems like a reasonably safe approach to take.

But if you really want to be able to run things like showcolortable.sh and other VT340 tests successfully, then maybe you should consider supporting different modes, similar to XTerm's -ti option. One of my ideas for this was to use the DECSCL sequence, so switching to conformance level 3 would make the sixel implementation strictly VT340 compatible (i.e. only 16 colors, shared global palette, etc.). Similarly, level 2 would make it VT240 compatible, and level 1 would get you a VT125. Just a thought.

jgcodes2020 commented 2 years ago

As another test case, I've been working on a sixel-based terminal video player. On MLTerm I can get near-realtime performance on a 1920x1080 monitor. WezTerm, however, can only play video at 2-3 seconds per frame, so there seems to be a lot of room for optimization.

wez commented 2 years ago

@jgcodes2020: I couldn't figure out how to get tvp to build on my Fedora 35 system, so I reached for notcurses-demo instead. I've made some perf improvements; I'd appreciate hearing how much of a difference that made for your app!

@dankamongmen: I made some sixel perf tweaks that seem to put wezterm's sixel in the same ballpark as kitty for intro and view. I wanted to try ncplayer with an mp4 file to see a simpler fps summary, but it always seems to use the ascii blitter (running on https://github.com/dankamongmen/notcurses/commit/d046b2c4065b6e11ac92134f45bcb56f41c44525).

@AutumnMeowMeow: I think you might want to try this out with your jexer and doom stuff! :-)

dankamongmen commented 2 years ago

@dankamongmen: I made some sixel perf tweaks that seem to put wezterm's sixel in the same ballpark as kitty for intro and view. I wanted to try ncplayer with an mp4 file to see a simpler fps summary, but it always seems to use the ascii blitter (running on dankamongmen/notcurses@d046b2c).

erp! it sure does! =\ what the hell?

dankamongmen commented 2 years ago

erp! it sure does! =\ what the hell?

as a workaround for now, pressing '6' ought get you pixel output. the issue is due to a series of input events i'm getting from wezterm, i think maybe related to pixel-mouse events. all my bug, i think.

dankamongmen commented 2 years ago

https://github.com/wez/wezterm/issues/1781 will solve this issue, i'm pretty sure.

wez commented 2 years ago

Confirmed that it was my bug in the end :-p Thanks for digging in!

dankamongmen commented 2 years ago

https://getyarn.io/yarn-clip/79ffed87-857a-4e79-a1ec-72566eeacb20