microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!
MIT License

Feature Request: sixel graphics support #448

Open migueldeicaza opened 5 years ago

migueldeicaza commented 5 years ago

Would like to see Sixel support in the Terminal, this is the standard used to show graphics in the console.

Sixel is part of the original DEC specification for doing graphics in terminals and has been re-popularized in recent years for doing graphics on the command line, in particular by Pythonistas doing data science.

The libsixel library provides an encoder but is also a great introduction to the subject (better than the Wikipedia page):

https://github.com/saitoha/libsixel
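For readers who have not seen the format, here is a minimal sketch of what a sixel stream looks like. It is not libsixel's API; `encode_sixel` is a hypothetical helper, and it assumes a one-colour bitmap given as rows of 0/1 values. Each data character encodes a column of six vertical pixels as `chr(63 + bits)`, with the least significant bit at the top.

```python
def encode_sixel(rows, rgb=(100, 0, 0)):
    """Encode a 1-colour bitmap (list of equal-length rows of 0/1) as sixel.

    Hypothetical helper for illustration only; rgb is given in percent.
    """
    width = len(rows[0])
    r, g, b = rgb
    # DCS introducer (ESC P q), then define colour register 1 as RGB percent.
    out = ["\x1bPq", f"#1;2;{r};{g};{b}"]
    for top in range(0, len(rows), 6):
        band = rows[top:top + 6]          # one sixel band = up to 6 pixel rows
        line = ["#1"]                     # select colour register 1
        for x in range(width):
            bits = 0
            for dy, row in enumerate(band):
                if row[x]:
                    bits |= 1 << dy       # bit 0 is the top pixel of the band
            line.append(chr(63 + bits))   # '?' (no pixels) .. '~' (all six)
        out.append("".join(line))
        out.append("-")                   # graphics new line: next band
    out.append("\x1b\\")                  # ST terminates the DCS string
    return "".join(out)


if __name__ == "__main__":
    # A 2x6 image whose left column is fully set.
    print(repr(encode_sixel([[1, 0]] * 6)))
```

A real encoder also has to handle palettes, run-length compression and multiple colour passes per band, which is where libsixel comes in.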

piranna commented 3 years ago

Maybe this is a matter of personal preference, but I know I'd definitely choose option 1 over option 2.

Me too, though it would be better for the application to know when the font has a different aspect ratio, so the image can adjust itself and keep the correct one.

One option could be to center the image within the cells it was expected to occupy. Or we could expand the image so it covers the full area, but clip the edges that overflow the boundaries

I think it's better to center them.
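To make the two options concrete, here is a rough sketch of the geometry. `place_image` is a hypothetical helper, and it assumes the image is scaled while keeping its aspect ratio: "center" scales to fit inside the cell area and centers it, "cover" scales to fill the area and lets the overflow be clipped (negative offsets).

```python
def place_image(img_w, img_h, area_w, area_h, mode="center"):
    """Return size and offset of an image inside the pixel area of its cells.

    Hypothetical helper: area_w/area_h is the pixel size of the cells the
    image was expected to occupy, given the actual font.
    """
    fit = min(area_w / img_w, area_h / img_h)     # largest scale that fits
    cover = max(area_w / img_w, area_h / img_h)   # smallest scale that fills
    scale = fit if mode == "center" else cover
    w, h = round(img_w * scale), round(img_h * scale)
    # Negative x or y means that many pixels are clipped on each side.
    return {"width": w, "height": h,
            "x": (area_w - w) // 2, "y": (area_h - h) // 2}


if __name__ == "__main__":
    # 100x200 image (10x10 cells at 10x20) shown with a 12x20 pixel font.
    print(place_image(100, 200, 120, 200, "center"))
    print(place_image(100, 200, 120, 200, "cover"))
```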

hackerb9 commented 3 years ago

Maybe I'm misreading this thread. Are we actually talking about the terminal faking 10:20 characters for sixel image? I think that will cause many problems like the Bond distortion. Doing it the right way may be more difficult, but, in my humble opinion, a modern terminal should be font agnostic and leave it up to application programmers to deal with sixels and character cells.

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application. The image viewing program I use works exactly like that. As I change font family or size, the displayed thumbnail updates to always be precisely five text lines high. The width is scaled proportionally for the image, unless it would be larger than a certain (in this case, rather large) maximum. By basing the image size on the character cell, it works automatically on high-DPI screens.
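As a sketch of that sizing rule (not the actual viewer's code): the thumbnail height is a fixed number of text lines, the width follows the image's aspect ratio, and the width is capped at a maximum. The `max_cols` cap of 40 columns is an assumption for illustration.

```python
def thumbnail_size(img_w, img_h, cell_w, cell_h, lines=5, max_cols=40):
    """Pixel size of a thumbnail that is exactly `lines` text lines high.

    cell_w/cell_h is the character cell size in pixels as reported by the
    terminal; max_cols is a hypothetical width limit, in columns.
    """
    target_h = lines * cell_h
    # Scale the width proportionally to the chosen height.
    target_w = round(img_w * target_h / img_h)
    # Clamp very wide images to the maximum width.
    return min(target_w, max_cols * cell_w), target_h


if __name__ == "__main__":
    print(thumbnail_size(640, 480, 10, 20))   # ordinary photo
    print(thumbnail_size(4000, 100, 10, 20))  # panorama hits the width cap
```

Because everything is expressed in cells, the same call gives a larger pixel size on a high-DPI screen where the cells themselves are larger.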

While the VT340 is a noble goal to emulate, fixing character cell resolution at 10:20 (and thus limiting resolution for the entire screen) is a mistake. The VT340 was only one of several sixel implementations, so its font size isn't necessarily more correct.

Forcing 10:20 will also lead to ugly kludges. (E.g., how to respond to a request for the size of the terminal window in pixels. Tell the truth, presuming they'll be positioning windows on the screen? Or, always return 800x480, presuming the user is scaling images for sixel output?)

j4james commented 3 years ago

Are we actually talking about the terminal faking 10:20 characters for sixel image?

Yes.

a modern terminal should be font agnostic

This proposal is font agnostic. The application doesn't need to know anything about the font. That's the whole point.

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.

I'm not exactly sure what method you're using, but the way I've seen this done before is with a proprietary XTerm query to get the window pixel size, and another query to get the window cell size, and then using that data to calculate the actual cell pixel size. The downsides of such an approach are:

  1. It's proprietary, so wouldn't work on a real terminal, or any terminal emulator that exactly matched a real terminal.
  2. If the user changes their font size while your application is running, then your calculations will no longer be correct, and images will be rendered at the wrong size (unless you're continuously recalculating the font size which seems impractical).
  3. If the user has a high resolution display, and/or large font size, you're forced to send through a massive image to try and match that resolution. Considering how inefficient Sixel is to start with, that can amount to a lot of bandwidth.

That said, I understand that this is a mode that some people may wish to use, and I think we should at least have an option to support it one day (for reasons discussed above, this just isn't possible at the moment). But in my opinion, this is not the best approach for Sixel.

OhMeadhbh commented 3 years ago

I have 300+ VT340's in nuclear power plants that I would like to eventually replace.

There are commercial terminal emulation packages we could use, but I think all but one have been EoL'd.

We have replaced some of them with Linux PCs running XTerm (or less frequently, Win10 + Hummingbird + WSL running XTerm), because it has a half-way decent open source sixel implementation and a sort of bad, but open sourced ReGIS implementation.

The likelihood that we will be writing new software for the part of this system that generates the sixel octet stream is NIL.

If your objective is to send graphics over an inline octet stream, there are other options. But if you want to support sixel graphics, you should support sixel graphics in a way that is halfway similar to previous implementations. This, unfortunately, means you should emulate the behaviour of exemplar systems (i.e. VT240, VT241, VT330 and VT340 terminals) even when it comes to integrating graphics with text.

This is a mock-up of the kind of thing I'm talking about. It would be very nice if any new Sixel implementation maintains compatibility with existing implementations so images do not run off the edge of the screen or only fill half the screen.

https://vimeo.com/user32814426/review/467991744/ac5892fa7e

hackerb9 commented 3 years ago

a modern terminal should be font agnostic

This proposal is font agnostic. The application doesn't need to know anything about the font. That's the whole point.

I meant the terminal should be font agnostic instead of imposing 10:20 on every font. The application should be able to know the actual font size, if it wishes, since it's the application that knows the domain of what it is trying to show and can figure out the best way to present text and graphics together.

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.

I'm not exactly sure what method you're using, but the way I've seen this done before is with a proprietary XTerm query to get the window pixel size, and another query to get the window cell size, and then using that data to calculate the actual cell pixel size.

Yup, that's about right. There's also a query to directly get the character cell size, but I don't think that's as widely supported as just getting the screen size and dividing by ROWS and COLUMNS.

The downsides of such an approach are:

1. It's proprietary, so wouldn't work on a real terminal, or any terminal emulator that exactly matched a real terminal.

That's not a downside. It only means the program has to fall back on doing what it would have done anyway: presume $TERM=="VT340" means character cells are 10:20, "VT240" means 10:10, "mskermit" means 8:8, and so on.
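A sketch of that fallback, assuming the three values above are treated as cell width and height in pixels. The table and the `assumed_cell_size` helper are hypothetical, and the lookup is case-insensitive since $TERM values are usually lowercase.

```python
import os

# Cell sizes (width, height) taken from the terminals named above.
KNOWN_CELL_SIZES = {
    "vt340": (10, 20),
    "vt240": (10, 10),
    "mskermit": (8, 8),
}


def assumed_cell_size(term=None):
    """Guess the cell size from $TERM when the terminal cannot be queried."""
    if term is None:
        term = os.environ.get("TERM", "")
    # Returns None for terminals that are not in the table.
    return KNOWN_CELL_SIZES.get(term.lower())


if __name__ == "__main__":
    print(assumed_cell_size("VT340"))
    print(assumed_cell_size())
```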

Also, it's not an xterm proprietary sequence. Getting the screen size is called a "dtterm" escape sequence, but it was actually first implemented in SunView (SunOS, 1986). I believe it was later documented in the PHIGS Programming Manual (1992). Try sending "\e[14t" to a few terminal emulators and you'll see it is widely implemented.
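A minimal sketch of the arithmetic, assuming the reply formats listed in the xterm control sequences document: "\e[14t" answers "\e[4;height;widtht" in pixels and "\e[18t" answers "\e[8;rows;colst" in characters. Reading the replies needs the tty in raw mode, so this only shows the parsing; `parse_report` and `cell_size` are hypothetical names.

```python
import re

_REPORT = re.compile(r"\x1b\[(\d+);(\d+);(\d+)t")


def parse_report(reply):
    """Parse a window-ops report into (code, height, width), or None."""
    m = _REPORT.fullmatch(reply)
    return tuple(int(g) for g in m.groups()) if m else None


def cell_size(pixel_reply, chars_reply):
    """Derive (cell_width, cell_height) in pixels from the two reports."""
    pixels = parse_report(pixel_reply)
    chars = parse_report(chars_reply)
    # Code 4 = text area size in pixels, code 8 = text area size in characters.
    if not pixels or not chars or pixels[0] != 4 or chars[0] != 8:
        return None
    _, px_h, px_w = pixels
    _, rows, cols = chars
    if rows == 0 or cols == 0:
        return None
    return px_w // cols, px_h // rows


if __name__ == "__main__":
    print(cell_size("\x1b[4;480;800t", "\x1b[8;24;80t"))
```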

2. If the user changes their font size while your application is running, then your calculations will no longer be correct, and images will be rendered at the wrong size (unless you're continuously recalculating the font size which seems impractical).

This is not a problem. The program simply traps SIGWINCH and only recalculates if the window has actually changed.
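Roughly, and only as a sketch: the handler re-reads the size and does nothing unless it differs from the cached value. `SizeWatcher` and `install` are hypothetical names, and the size source is injectable so the logic can be exercised without a terminal.

```python
import shutil
import signal


class SizeWatcher:
    """Cache the terminal size and count real changes."""

    def __init__(self, get_size=shutil.get_terminal_size):
        self._get_size = get_size
        self.size = tuple(get_size())
        self.recalculations = 0

    def on_winch(self, signum=None, frame=None):
        new_size = tuple(self._get_size())
        if new_size != self.size:      # only redo the work on a real change
            self.size = new_size
            self.recalculations += 1   # image sizes would be recomputed here


def install(watcher):
    """Register the watcher for SIGWINCH where the platform has it."""
    if hasattr(signal, "SIGWINCH"):
        signal.signal(signal.SIGWINCH, watcher.on_winch)
        return True
    return False


if __name__ == "__main__":
    print("SIGWINCH handler installed:", install(SizeWatcher()))
```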

3. If the user has a high resolution display, and/or large font size, you're forced to send through a massive image to try and match that resolution. Considering how inefficient Sixel is to start with, that can amount to a lot of bandwidth.

Yes, sixel is extremely inefficient. But on modern computers, sending full screen images is quite usable, even over ssh. Does the Microsoft Terminal have some sort of baudrate limitation?

By the way, I believe sixel does have a "high DPI" mode where every dot is doubled in width and height. I've never used it and I don't think xterm even implements it, but perhaps that would alleviate concerns about bandwidth.

That said, I understand that this is a mode that some people may wish to use, and I think we should at least have an option to support it one day (for reasons discussed above, this just isn't possible at the moment).

This "mode" is simply having characters and graphics aligned just like the various historical sixel terminals did and current emulators do. I admit, I don't understand why it is not possible to do the same in Microsoft Terminal. If you say this 10:20 kludge is the best that can be done, I will trust that you are correct and thank you for doing it. A distorted picture is much better than nothing.

piranna commented 3 years ago

Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.

@hackerb9, what's the actual escape sequence to get the font dimensions?

hpjansson commented 3 years ago

The relevant XTerm sequences can be found here: https://invisible-island.net/xterm/ctlseqs/ctlseqs.html -- look for XTWINOPS.

Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.
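A short Unix-only sketch of that ioctl. It assumes the usual `struct winsize` layout of four unsigned shorts (rows, columns, x pixels, y pixels); the helper names are hypothetical. Terminals that do not fill in the pixel fields report 0, in which case no cell size can be derived.

```python
import fcntl
import struct
import sys
import termios


def unpack_winsize(raw):
    """Decode struct winsize: (rows, cols, xpixel, ypixel)."""
    return struct.unpack("HHHH", raw)


def query_winsize(fd=None):
    """Ask the tty for its size; returns None if fd is not a terminal."""
    try:
        if fd is None:
            fd = sys.stdout.fileno()
        raw = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    except OSError:
        return None
    return unpack_winsize(raw)


def cell_pixels(rows, cols, xpixel, ypixel):
    """Cell size in pixels, or None when the pixel fields are 0."""
    if not (rows and cols and xpixel and ypixel):
        return None
    return xpixel // cols, ypixel // rows


if __name__ == "__main__":
    size = query_winsize()
    print(size, cell_pixels(*size) if size else None)
```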

Just as a data point, the sixel branch for libvte is taking the cell size-agnostic route @hackerb9 is talking about. It treats incoming sixel data as "pixel perfect" and rescales previously received images across zoom levels and font sizes to cover a consistent cell extent. When merged, this implementation will be available to a large share of Linux terminal emulators, including GNOME Terminal, the XFCE Terminal, Terminator, etc. Superficially this seems to be interoperable with at least XTerm and mlterm.

Since libvte records a per-image virtual cell size, it'd be trivial to make this work with a fixed virtual 10x20 cell size too for interoperation. However, we'd need a way for programs to communicate their expected pixel:cell ratios to the terminal (e.g. by extending the DCS parameters). That could be very useful in general, since it'd also provide a form of pixel density control in bandwidth-constrained environments, as you touched on above.
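To illustrate the idea (this is not libvte's code; the function names are hypothetical): an image is recorded with the cell extent it covers under a virtual cell size, then drawn at whatever pixel size that extent has with the actual font.

```python
import math


def cell_extent(img_w, img_h, virt_cell=(10, 20)):
    """Columns and rows an image covers under the virtual cell size."""
    vw, vh = virt_cell
    return math.ceil(img_w / vw), math.ceil(img_h / vh)


def displayed_size(img_w, img_h, actual_cell, virt_cell=(10, 20)):
    """Pixel size of the image once rescaled to the actual cell size."""
    (aw, ah), (vw, vh) = actual_cell, virt_cell
    # Each axis is scaled independently, so a font that is not 1:2
    # stretches or squishes the picture.
    return round(img_w * aw / vw), round(img_h * ah / vh)


if __name__ == "__main__":
    print(cell_extent(800, 480))               # 80 columns by 24 rows
    print(displayed_size(800, 480, (9, 18)))   # same 1:2 ratio, no distortion
    print(displayed_size(800, 480, (12, 20)))  # wider font, image is stretched
```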

piranna commented 3 years ago

Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.

The Linux console always returns 0... they should fix that, but it seems they are not willing to :-/

csdvrx commented 3 years ago

What if the font has a different aspect ratio than 10:20?

The image may be a bit stretched or squished, (...) The alternative would be to go with a pixel perfect image (not currently feasible with conpty, but let's pretend for a second). Bond no longer looks squished, but now the image is only a fraction of the size it was expected to be. And the higher the resolution of your monitor, the worse this is going to look.

Also note that there is no reason we couldn't have options to tweak the exact behaviour when the font aspect ratio isn't 1:2.

Actually, there are 2 reasons: Windows Terminal supports neither TIOCGWINSZ nor the CSI 14 t query

One option could be to center the image within the cells it was expected to occupy. Or we could expand the image so it covers the full area, but clip the edges that overflow the boundaries. Any of these choices would be better than an exact pixel rendering in my opinion.

This should not be left solely under the control of Windows Terminal: applications have ways to introspect the terminal properties, and adapt their behaviours. If they can't do that, the terminal implementation is broken.

Currently, any software outputting sixels can't introspect the size on Windows Terminal, so it can't adapt the size of the sixel images it sends to the font being used by the terminal. However, with the number of rows, columns, and the x,y size of the terminal in pixels, this is easy to do - and I would be surprised if the power plant monitoring software used as an example in https://github.com/microsoft/Terminal/issues/448#issuecomment-708127508 by @OhMeadhbh didn't already do that.

The problem is that Windows Terminal doesn't return the correct values through TIOCGWINSZ, and doesn't support the CSI 14 t query either, so there's no way to make it work.

I opened a separate issue and referenced #448, as properly returning the window size would help a lot.

As @hackerb9 pointed out in https://github.com/microsoft/Terminal/issues/448#issuecomment-708184522 :

Try sending "\e[14t" to a few terminal emulators and you'll see it is widely implemented.

Indeed, XTWINOPS is extremely basic and expected to be working.

As @hpjansson mentioned in https://github.com/microsoft/Terminal/issues/448#issuecomment-708433602

Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.

I know, for I use that in production, and Windows Terminal not supporting either method breaks things.

csdvrx commented 3 years ago

I have 300+ VT340's in nuclear power plants that I would like to eventually replace. There are commercial terminal emulation packages we could use, but I think all but one have been EoL'd.

I don't know if it's the right place to say that, but I offer this kind of service.

It would be very nice if any new Sixel implementation maintains compatibility with existing implementations so images do not run off the edge of the screen or only fill half the screen. https://vimeo.com/user32814426/review/467991744/ac5892fa7e

I have specifically written software that does just that: making sure whatever sixel sequence is sent will be displayed correctly, resizing it if needed to make the sixels fit properly.

If your budget is limited, I will soon release a new version of tmux-sixel that includes some of these features.

For example, it supports sixels in multiple panes (of course):

[Image: sixel-tmux_multiple-panes]

But it also supports scrolling, with textmode renditions to make it fast. Here, I displayed several images on the left pane, and I'm scrolling in the history (current position 10 of 129 lines)

[Image: sixel-tmux-scrollmode]

These renditions also let tmux-sixel provide a fallback mode. It allows terminals that are not sixel aware to still "see" what the sixels represent, like current Windows Terminal and the various libvte terminals, instead of having some blank space:

[Image: sixel-tmux_fallback_mode_in_windows-terminal]

As you may have noticed, I scrolled up a bit further to show you how everything that has been received is kept and displayable. Notice how the total is 128 lines now and how the math formula on the right-hand side is aligned differently: this is because of text reflow, to match the different terminals using different fonts and different resolutions.

If needed, the panes can be resized. In case you are not familiar with tmux, it is multi user capable: the users of the sixel-aware terminals and those of the "regular" terminals can all be attached to the same shared session, like a text-mode RDP / VNC remote session : they see the same thing, only in different quality depending on their terminal. And any one of them can type, so they can all work cooperatively.

Of course, text mode can't perfectly replace images, so it's not perfect, but still it can go quite far: this is what just 161 columns and 41 rows can give you - check the result of TIOCGWINSZ below

[Image: tmux-sixel-fallback-161-40]

If you are uncertain it may fit your needs, and first want to test the performance on your images, that part is open source, so you can evaluate it using https://github.com/csdvrx/derasterize

Let me know if you would like to get in touch!

christianparpart commented 3 years ago

@csdvrx oh dear, that looks awesome. You have done some really great work! I'd like to avoid some too early advertisement here, but how can I get in touch with you on very similar matters? You can email me christian@parpart.family with a very short "Hi" or so, so I can click on Reply. Sorry for the interference, but Github doesn't have private messaging yet (does it?) :-)

piranna commented 3 years ago

I think it has, but if not, you can always look for the other user's email on their profile page :-)

yatli commented 3 years ago

@csdvrx looks like you've made major breakthrough on this -- congratulations! Haven't revisited my branch since 2017 but I think now is the time!

csdvrx commented 3 years ago

@csdvrx looks like you've made major breakthrough on this -- congratulations!

Thanks! For the fallback mode, the next step is adding back fine details that have been stripped.

For example, you can infer the presence of the lemur hairs under its chin, but you can't see them nicely. This is due to the loss of what are low frequencies in the spectral domain (FFT), as can be seen more easily if you run derasterize on https://github.com/csdvrx/derasterize/blob/master/samples/wave.png which I use for tests: it's simply a simulated collimator interference pattern, but it saves you the trouble of comparing spectrograms of input and output images.

[Image: missing-hires]

Then it's obvious that the loss is more present at some frequency bands and following some vectors, and of course more extreme in lower resolution (Shannon-Nyquist, duh!)

[Image: missing-lowres]

My goal is to improve the result mostly in the diagonal and vertical, as the loss there is due to the set of characters chosen.

But you can't just try every Unicode glyph while looking for a better fit, as the problem is combinatorial explosion: if you want to keep some applications (ex: playing videos), you can't simply multiply the combining characters and the test character set, then test everything for every frame. Even with serious optimization, it's too slow.

The 2 best approaches seem to be 1) a band pass filter to concurrently select the character and the combining character 2) another pass sequentially, testing fitness improvement of just adding combining characters.

The difference in thinness between combining characters and regular characters makes that perfect for a band pass approach, but it will require some calibration, so I'm partial to (2) as this would guarantee there's no regression: either the combining character improves the fitness score and it's added, or it doesn't and the current results remain.

Anyway, I will also release new versions of derasterize soon, with a new set of glyphs and a better optimized color picker. Ascii Art is becoming my new favorite Christmas activity :-)

Haven't revisited my branch since 2017 but I think now is the time!

Thanks a lot for your great work on tmux, which inspired me a lot!

But before revisiting your branch, could you please wait a bit to allow me to be done with the code review on mine?

This way, you can fork it!

Another thing someone may want to do is to release another https://github.com/saitoha/xserver-sixel : when sixel support comes to the Windows Terminal, it will be possible to run X inside, on vanilla Windows 10. This would be a major advancement for WSL.

yatli commented 3 years ago

But you can't just try every Unicode glyph while looking for a better fit, as the problem is combinatorial explosion: if you want to keep some applications (ex: playing videos), you can't simply multiply the combining characters and the test character set, then test everything for every frame. Even with serious optimization, it's too slow.

Sounds like a small-scale convolutional neural network would do... Get some ground truth pics. Chop them into character-sized tiles. For each tile, search for the best representation (slow is OK for training). Throw a tile into the NN and force it to generate the good representation. Done. (Just my random thoughts at 3AM)

But before revisiting your branch, could you please wait a bit to allow me to be done with the code review on mine?

Sure please go ahead!

piranna commented 3 years ago

It's a bit crazy that you are trying to get high resolution images on the terminal using Unicode characters, while in https://piranna.github.io/tty.css/ the most difficult thing I'm facing is getting images to look low-res like they are shown in a terminal with block characters 😅

csdvrx commented 3 years ago

Sounds like a small-scale convolutional neural network would do... Get some ground truth pics. Chop them into character-sized tiles. For each tile, search for the best representation (slow is OK for training). Throw a tile into the NN and force it to generate the good representation. Done

It's an interesting idea: basically, selecting the most advantageous 'base blocks' using the equivalent of a LUT/rainbow table - except it will be embodied in code.

However, it would require a separate training step, which would have to be redone whenever new glyphs are added. Also, it may limit future evolutions by being a black box and not playing nicely with other refinements. More on that below.

(Just my random thoughts at 3AM)

These are good thoughts!

The core problem of encoding images is not new. What's new is the specific limitations that are imposed by going to a text format:

In a way, the problem reminds me of the history of TV image encoding: since some information is more important (luminance), it historically constrained the design: black and white TV was first, just like ASCII art was black and white first. In a way YPbPr is analogous to ANSI colors: you "spice" the most important signal with some extra eye candy, that can be safely ignored by limited decoders, so that at least you can get the basic signal, the most important part.

I've thought about different approaches, they all have their drawbacks:

On top of that, quality and speed optimize in opposite directions, which may limit some use cases (ex: playing a YouTube video in Unicode in real time). Simple solutions have their places, but they may divert us away from the bigger picture. The ideal approach should be versatile or at least flexible.

A memoization-like approach is currently being introduced, but in a different way: the public version of derasterize uses a 4x8 block. Unicode characters were selected based on this limit. The version that will soon be released uses 128 bits for the 8x16 block decomposition canvas, to introduce more glyphs.

Obviously, extending to 8x16 increases the runtime needlessly for glyphs that are equivalent in 4x8 and 8x16: a typical example is the half-blocks, like U+2584 lower half block and U+258C left half block.

The approach selected is not really a memoization: it works the other way around! The input image is downsampled to 2 different resolutions, and the algorithm uses both downsampled versions for computing the fitness of the individual blocks as it iterates through the glyphs: when it hits those that are amenable to a simpler computation (like the half-blocks), it uses the downsampled version for computing the fitness score on 32 bits, as it will be faster than doing that on 128 bits!
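As a toy illustration of the 32-bit case only (this is not derasterize's code, and the glyph table is a hypothetical four-entry subset): a 4x8 block and each glyph are stored as 32-bit masks, and the fitness of a glyph is the number of pixels where the two masks differ.

```python
W, H = 4, 8  # block size in pixels: 4 * 8 = 32 bits


def mask(pred):
    """Build a 32-bit mask with bit (y * W + x) set where pred(x, y) holds."""
    return sum(1 << (y * W + x)
               for y in range(H) for x in range(W) if pred(x, y))


# Hypothetical glyph table: a few block elements only.
GLYPHS = {
    " ": 0,
    "\u2584": mask(lambda x, y: y >= H // 2),  # lower half block
    "\u258c": mask(lambda x, y: x < W // 2),   # left half block
    "\u2588": mask(lambda x, y: True),         # full block
}


def best_glyph(block):
    """Pick the glyph whose mask differs from the block in the fewest bits."""
    return min(GLYPHS, key=lambda g: bin(GLYPHS[g] ^ block).count("1"))


if __name__ == "__main__":
    bottom_heavy = mask(lambda x, y: y >= 4)
    print(repr(best_glyph(bottom_heavy)))
```

The same comparison on an 8x16 block needs 128-bit masks, which is why falling back to the smaller canvas for glyphs like the half-blocks saves work.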

In a way, it's like a binary decision diagram, except it's at the algorithm level.

I've been considering introducing other refinements. Fine lines using combining characters are not just for lemur hairs (!!), but for a big use case: graphs. This is because you don't want the thin lines to become blurry and be lost in the details of block decomposition.

But there are many other low hanging fruits!

Here's another simple example: circles may be worth preserving, because they are extremely important in human perception. And it can be fast too, if you use the right tools from historical computer vision: a few Hough transforms could quickly isolate the circles.

And I don't want to only add the combining characters on the premade blocks in a second step (off the top of my head: U+0307, U+0308, U+030A, U+0323, U+0324, U+0325, U+0359, U+035A, U+0360 cf https://en.wikipedia.org/wiki/Combining_character) like for the fine details of the lemur hair. I mean introducing the set of round glyphs: ⴰ ⸰ ° ᴼ o ᴑ ○ O O ◯ ⚪

The NN approach could certainly deal with that, except if circles fall on the border between blocks during the decomposition step... then, they would be ignored.

However, maybe it's worth tolerating a few pixels of difference in the circle center? And that's even if that would totally ruin the fitness score when it's computed at the pixel level. Maybe that's true even if the circle is put on the wrong 8x16 block.

It's not clear cut, so some advanced rule could be made to reserve that to the most important features (ex: preserve circles when they stand out in luminance) - and unfortunately all this would be lost in a NN approach.

If you find such problems interesting, would you be interested in working on derasterize?

It's the testing ground for tmux-sixels, by working on standalone images first.

But before revisiting your branch, could you please wait a bit to allow me to be done with the code review on mine?

Sure please go ahead!

Thanks!

I will try to get the important feature upstreamed in tmux, but I fear it may be refused, as the author doesn't see the usecase for sixels and expressed his opposition in the past.

Yet sixels are so practical once you get a taste of them that I found it worth forking tmux. So did you, and a few others.

BTW I see you've cloned wemux. Apparently, you are inconvenienced by tmux limitations. If the upstreaming fails, what about joining our effort?

We could maintain our fork tmux-sixel, and add all that's missing in regular tmux!

yatli commented 3 years ago

@csdvrx good idea. I'm interested in cooperating on these topics. Let's move our discussions to tmux/derasterize issues and give room for windows terminal discussion here 😅😅

ghost commented 3 years ago

bump

DHowett commented 3 years ago

@vulpinefoxxo Was it necessary for you to e-mail the 250 people subscribed to this issue so that you could request a status on a bug we've explicitly stated is not in scope for the near future?

You can register your agreement with the +1 button, and you can watch this thread with the subscribe button. Rest assured: any updates made to this thread will end up in your inbox. 😄

j4james commented 3 years ago

Is there anyone here that has access to an actual VT340 terminal that is willing to run some compatibility tests? You'd just need to be able to output a text file to the device and photograph the result. I have about a dozen such tests I'd like to run, but depending on the results there may be some follow-ups. No need to commit to anything, though - any help would be appreciated.

And if nobody has a VT340, a VT330, or even a VT382 would be OK.

orcmid commented 3 years ago

And if nobody has a VT340, a VT330, or even a VT382 would be OK.

I have a question about ECMA-48 here. That specification (and its ISO counterpart) was last updated in 1991. Is there a provision in ECMA-48 that accomplishes this device-specific feature?

I'm referring to the Windows Console and Terminal Ecosystem Roadmap, which cites ECMA-48, and the Console Virtual Terminal Sequences defined so far.

KalleOlaviNiemitalo commented 3 years ago

@j4james, would a VT420 be suitable?

piranna commented 3 years ago

@j4james, would a VT420 be suitable?

https://en.wikipedia.org/wiki/VT420

There were no color or graphics-capable 400 series terminals; the VT340 remained in production for those requiring ReGIS and Sixel graphics and color support.

So... no, seems a VT420 is not suitable, but thanks anyway :-)

KalleOlaviNiemitalo commented 3 years ago

@orcmid, the only ECMA-48 reference I can find in the console documentation is in Classic Console APIs versus Virtual Terminal Sequences:

These sequences are rooted in an ECMA Standard and series of extensions by many vendors tracing back to Digital Equipment Corporation and Tektronix terminals, through to more modern and common software terminals, like xterm.

AFAICT, ECMA-48 does not define sixel graphics or any similar feature. However, it defines the opening and terminating delimiters of the device control string that DEC uses for sixel graphics, so a terminal that does not support sixels can at least recognise the delimiters and avoid displaying the bytes between them as text. The data formats defined in ECMA-48 leave a lot of space for vendor-specific extensions.
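A small sketch of that fallback behaviour, assuming the stream is already decoded to text: everything from the DCS introducer (ESC P, or the C1 character U+0090) up to the string terminator (ESC \, or U+009C) is dropped instead of being shown. `strip_dcs` is a hypothetical helper, and an unterminated string is left untouched here.

```python
import re

# DCS ... ST, in both the 7-bit (ESC P / ESC \) and C1 (U+0090 / U+009C) forms.
_DCS = re.compile(r"(?:\x1bP|\x90).*?(?:\x1b\\|\x9c)", re.DOTALL)


def strip_dcs(text):
    """Remove complete device control strings, such as sixel images."""
    return _DCS.sub("", text)


if __name__ == "__main__":
    stream = "before\x1bPq#1;2;100;0;0#1~~~-\x1b\\after"
    print(strip_dcs(stream))
```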

ghost commented 3 years ago

Is there anyone here that has access to an actual VT340 terminal that is willing to run some compatibility tests? You'd just need to be able to output a text file to the device and photograph the result. I have about a dozen such tests I'd like to run, but depending on the results there may be some follow-ups. No need to commit to anything, though - any help would be appreciated.

And if nobody has a VT340, a VT330, or even a VT382 would be OK.

Instructions for emulating VT-series terminals via MAME are here.

A dump of some of the VT340 ROMs is here, along with a pointer to the discussion around getting those.

I have done the VT102 MAME, it worked alright. I haven't tried to get a VT340 going, and do not know if MAME got that capability or not. But perhaps there is enough out there now to do it.

j4james commented 3 years ago

@j4james, would a VT420 be suitable?

@KalleOlaviNiemitalo Unfortunately that won't do. But thanks for the offer. That would definitely be useful once I get around to working on some of the VT420 functionality.

I haven't tried to get a VT340 going, and do not know if MAME got that capability or not.

@klamonte Unfortunately the MAME VT3xx driver is just a skeleton - it's not functional. Their VT240 implementation is OK, and I've been testing with that as much as I can, but it doesn't have all the features of the VT330/340.

orcmid commented 3 years ago

The data formats defined in ECMA-48 leave a lot of space for vendor-specific extensions.

It appears that DEC used the extension provisions as required, the usual agreement among implementations proviso entering into it.

It strikes me that Windows Terminal is not ncurses and I don't see any provision for device-specific selections that embrace extensions.

Another concern, however, is regarding accessibility. That has come up in issue #7766. I have no idea what requirements there are with respect to that for WT. I had not thought about that wrinkle and it impacts my ideas about demonstrating CUA display of the MS-DOS variety. I can see how extending outside of text, even raw Unicode text, might ECMA-48 already. Must look.

ghost commented 3 years ago

@orcmid

I had not thought about that wrinkle and it impacts my ideas about demonstrating CUA display of the MS-DOS variety.

Do you mean what is commonly called TUI today, i.e. mouse-driven textmode windowing? If so, you may find this to be of interest. I do not have a Windows system with Terminal to test it on, but someone else told me that mouse was working as of version 1.4.3243.0 on Windows 10.0.19041.1 (using a Debian WSL instance).

WSLUser commented 3 years ago

@j4james , do you have a build you can publish on your fork that provides Sixel? I can think of a few things to use for testing/debugging such as the sample projects using notcurses. My biggest hope for this feature in fact is to use notcurses applications in WSL2 (you will find the library is available in many distro repos).

j4james commented 3 years ago

@WSLUser I'm afraid I don't have any plans to publish my build in the short term. It was really just an experimental framework for me to test different implementation strategies with, and to investigate whether it would be feasible for us to support both standard VT340 applications (which is my primary use case) as well as more modern sixel derivations (which require extended functionality).

Unfortunately what I've found so far is that modern apps often rely (usually unnecessarily) on broken behaviour in XTerm, and unless we replicate that behaviour (which then makes us unusable as a VT340 emulator) we won't be able to run those apps. So before releasing anything, I thought it might be best to try and get some of those apps fixed first, but I want to be absolutely certain of my facts when reporting bugs, which is one of the reasons I'm looking for a VT340 to test with.

I should also mention that I've been talking to some other terminal devs to see if they'd be willing to agree on a standard for apps to negotiate extended sixel functionality, again in the interests of supporting both VT340 and more modern apps. Unfortunately that discussion doesn't look like it's going anywhere, and I'm about ready to give up at this point.

csdvrx commented 2 years ago

In case it might benefit anyone who has subscribed to this thread but doesn't want to wait for sixel support, I have decided to release the version of sixel-tmux that had been demonstrated here last year.

Compared to the previous version that only respected sixel sequences and therefore required a compatible terminal, this new version provides an immediate way to display sixel content inside most terminals (and therefore Windows Terminal) as it features an integrated derasterize.

This provides a way of converting sixels into something that can be displayed even by terminals that can't handle sixels natively, as long as they have enough colors for the content (fortunately, WT already supports truecolor).

The source code is on https://github.com/csdvrx/sixel-tmux and the binary for msys2 on https://github.com/csdvrx/sixel-tmux/blob/main/tmux.exe

It works nicely in Windows Terminal to offer sixel-like features without being much slower than a standalone derasterize: see for example how it displays the usual test images, after which derasterize is used to show the lemur example:

sixel-tmux-inside-windows-terminal

Here's another one with Windows Terminal next to mintty, then being fed the content currently displayed in mintty when it connects to the shared session:

sixel-tmux-inside-both-mintty-and-windows-terminal

To launch it inside WT, please use script (because of https://cygwin.com/pipermail/cygwin/2020-May/244878.html). For example, run /usr/bin/script -c '/usr/bin/tmux c' /dev/null to create a new session, or /usr/bin/script -c '/usr/bin/tmux a' /dev/null to attach to an existing session started in another terminal.

@yatli let me know if you find this sixel-tmux interesting, I would enjoy working with you on practical improvements as explained before

@hpjansson feel free to add support for other formats besides sixels, or to integrate a chafa backend if you can make a license exemption for tmux (as it's BSD)

hpjansson commented 2 years ago

@csdvrx Amazing work! Since you already integrated Derasterize, you probably don't need Chafa. But if you want a unified API to simultaneously support Kitty, iTerm, sixels and full-Unicode pseudographics at some point, you should be able to link with it from BSD source as the LGPL permits that. If there are license or other issues holding you back, I'm happy to help.

Maybe we should consider continuing the general sixel/terminal graphics talk in a discussion thread... Feel free to create one in the Chafa repo; as a project it touches on more or less every aspect of terminal graphics. E.g. we just started a thread about improving font support for pseudographics.

csdvrx commented 2 years ago

@hpjansson thanks, I'm also a big fan of your work and I love how you implemented in chafa some of the unicode ideas I was pondering :)

We can totally talk in another place if you prefer, I just wanted to make the first announcement here to reach out to people who may be waiting for sixel support in Windows Terminal (maybe @migueldeicaza who started the request?)

Now, even if it's imperfect, they have a working, stable solution: along with a few other people, I have been using sixel-tmux for a year. It's stable, with no weird issues except sometimes failing to recognize and intercept very large sixel images (I suppose some tmux "optimization" is eating text again)

There's one issue with terminals of different geometries synchronizing on one size for the derasterized output when in fallback mode, but there can't be a solution for that during a shared session unless the sixel source is kept and separately derasterized on each client, which would be wasteful (but could be done once sixel sequences are preserved)

The best I could come up with is to let the geometry of whoever had the input dictate the size to the others, so that at least the person who requested the image gets it at the right size. Others can tweak their font size to match the geometry (that's what the dots are for, giving feedback!)

As for formats, the idea is not to stop there, but to input "anything" and output "anything", so the sixels -> {sixel | derasterize } pipeline is just a beginning.

I don't have much experience with other formats like Kitty, but I see tmux as a simple place to put all this plumbing. This way, formats will cease to matter, and console users will be able to mix and match graphical tools regardless of which precise format (say sixel or iTerm) their software requires: as long as their terminal supports at least one known format, sixel-tmux could do the conversion.

A consequence of this idea is the desire to have everything under a BSD license ("universal donor") to facilitate code diffusion and adaptation - say in Terminal emulators.

Play a bit with sixel-tmux if you can, see if you like the concept; a collaborative effort would be in the best interest of the people who like graphics in terminals because there are not that many of us, yet there is much work to be done!

hackerb9 commented 2 years ago

That looks fun! How does sixel-tmux do when tested with sixvid -b nyantocat.gif?

hpjansson commented 2 years ago

@csdvrx Regarding moving: I'm just concerned that we're hijacking an issue where the poor MS Terminal developers are trying to track their work :) But if they don't mind, it's obviously a non-issue.

zadjii-msft commented 2 years ago

At the moment, I don't care 😋 We know that sixel is something we need to work on, we've got some steps towards getting it to work done already. I'm fairly confident that @j4james is continuing to experiment with it. When we need to wrest control of this thread for our own feature tracking, I'll come back through and mark it all as off topic. Till then, go for it.

naikrovek commented 2 years ago
  1. rasterize terminal text characters to image/texture.
  2. overlay any graphics (sixel or otherwise) on image/texture created in 1.
  3. blit image to screen (the compositor, really)
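
For anyone who wants to see the shape of those three steps, here is a minimal sketch in Python. It is not how Windows Terminal renders anything: the frame is just a hypothetical list of rows of RGB tuples, `rasterize_text` and `blit` are stand-ins that draw no glyphs and touch no screen, and all the function names are made up for illustration. The only part with real logic is the overlay, which skips transparent pixels (`None`) and clips at the frame edges.

```python
def rasterize_text(cols, rows, cell_w, cell_h, background=(0, 0, 0)):
    """Step 1 (stand-in): build a blank frame sized for the text grid.
    A real renderer would draw the glyphs into it here."""
    return [[background] * (cols * cell_w) for _ in range(rows * cell_h)]


def overlay(frame, graphic, left, top):
    """Step 2: copy the graphic onto the frame at pixel offset (left, top).
    Pixels that are None count as transparent; anything outside the frame is clipped."""
    height, width = len(frame), len(frame[0])
    for y, row in enumerate(graphic):
        for x, pixel in enumerate(row):
            fy, fx = top + y, left + x
            if pixel is not None and 0 <= fy < height and 0 <= fx < width:
                frame[fy][fx] = pixel
    return frame


def blit(frame):
    """Step 3 (stand-in): flatten the frame into an RGB byte buffer,
    which is what would be handed to the compositor."""
    return bytes(channel for row in frame for pixel in row for channel in pixel)
```

The point of the ordering is that the overlay only ever writes into the already-rasterized frame, so step 2 can be added without changing steps 1 or 3.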

once 1 is working, which requires 3 to display, injecting 2 should be of relatively little concern. for game developers, I bet all three of these could be done in a week. they live and breathe rasterization and blitting.

keep your eyes open for a performance-delivering update to Terminal in the coming year. I bet you that sixel support will not be far behind. I have no inside information, and this is all educated guesswork.

my understanding is that 1 and 3 are being worked on, now, for performance reasons. number 2 won't be far behind.

j4james commented 2 years ago

image

zadjii-msft commented 2 years ago

crazy-son-of-a-bitch

christianparpart commented 2 years ago

image

Haha. Github comment of the day! Source code or it didn't happen. :-P

j4james commented 2 years ago

Source code or it didn't happen. :-P

Here you go: https://gist.githubusercontent.com/j4james/9c2e67686306e2c37aa07e71fe1d2504/raw/7e0a1d7c6a0206e801241b4f23324c5a6a2d0997/owl.txt

But note that it requires a terminal that can emulate a 10x20 cell size for the owl to be positioned and sized correctly. I've also simplified the original code a little, and cut the image palette down to 15 colors, so it should theoretically work on a real VT340 now.
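
To illustrate why the cell size matters, here is a rough sketch of the arithmetic, assuming the image is placed by pixel size while anything around it is placed by character cell. The function names are hypothetical; the only facts relied on are that one sixel character encodes a column of 6 pixels and that the cell size in question is 10x20.

```python
import math

SIXEL_BAND_HEIGHT = 6  # one sixel character encodes a column of 6 pixels


def cells_covered(width_px, height_px, cell_w=10, cell_h=20):
    """Return (columns, rows) of character cells that an image of this pixel size touches."""
    return math.ceil(width_px / cell_w), math.ceil(height_px / cell_h)


def sixel_bands(height_px):
    """Return how many 6-pixel sixel bands are needed for an image this tall."""
    return math.ceil(height_px / SIXEL_BAND_HEIGHT)
```

With the default 10x20 cells a 100x60 pixel image covers 10 columns by 3 rows, but with 8x16 cells the same image covers 13 columns by 4 rows, so anything positioned relative to it in cell coordinates would no longer line up.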

hackerb9 commented 2 years ago

Success! Tested on a real VT340 and it worked perfectly.

For evidence, here is the output from the VT340 after I told it to send a MediaCopy to the host (essentially a screenshot) of the VT340 screen in sixel format:

https://gist.github.com/hackerb9/fb5eb56391e51de23af6dd5cedb12464/raw/ac9032df3afd62ea6e5f6f0b6a5621923c1a1630/vt340mediacopy.six

And here is that MediaCopy file converted to a PNG:

MediaCopy output from VT340

Note that the occasional glitches, where a byte has gotten its eighth bit set high, are an artifact of my mediacopy.sh script. The owl looks perfect on the VT340's screen.
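
If someone wants to clean up a capture like that after the fact, a minimal sketch (not part of mediacopy.sh, and the function name is made up) is to mask off the top bit of every byte. This assumes the capture is meant to be pure 7-bit data, which holds for the sixel data characters themselves (they sit in the range 0x3F to 0x7E), but it would also mangle any genuine 8-bit control characters in the stream.

```python
def clear_eighth_bit(data: bytes) -> bytes:
    """Return a copy of data with the high bit of every byte cleared.
    Assumption: the stream is supposed to be 7-bit only."""
    return bytes(b & 0x7F for b in data)
```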

ghost commented 2 years ago

@naikrovek cough cough Scroll all the way to the bottom for some notes on mixing images and text in a single cell.

Has anyone requested SGR-Pixels (mouse mode 1016) yet? I see it mentioned here as not supported. Decode sixel, add SGR-Mouse, and you've got a viable new gaming medium. Sure would be nice to have some roguelikes that could put real images in when they needed them.
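
For reference, here is a small sketch of what an application would have to parse. It assumes the xterm conventions: SGR mouse reports look like ESC [ < button ; x ; y followed by M for press or m for release, and mode 1016 keeps that format but reports x and y in pixels instead of cells. The helper name and the returned dictionary are hypothetical.

```python
import re

# xterm mode numbers: 1000 turns on button tracking, 1016 selects SGR-Pixels encoding.
ENABLE_SGR_PIXELS = "\x1b[?1000h\x1b[?1016h"
DISABLE_SGR_PIXELS = "\x1b[?1016l\x1b[?1000l"

SGR_MOUSE = re.compile(r"\x1b\[<(\d+);(\d+);(\d+)([Mm])")


def parse_sgr_mouse(report):
    """Parse one SGR mouse report; return None if the string is not one.
    With mode 1016 active, x and y are pixel coordinates."""
    match = SGR_MOUSE.fullmatch(report)
    if match is None:
        return None
    button, x, y = (int(match.group(i)) for i in (1, 2, 3))
    return {"button": button, "x": x, "y": y, "pressed": match.group(4) == "M"}
```

A report such as ESC [ < 0 ; 320 ; 200 M would then mean a left-button press at pixel (320, 200), which is what a paint program or a roguelike with real images would want.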

Or if we need to be all business-y to justify it, how about a nice little MSPaint app that works over ssh?

@PhMajerus

While implementing Sixel, it is important to test with images that contain transparency.

Transparent sixels are available here from @hackerb9 and here from me, and can be generated by @hpjansson 's chafa and the git head version of this.
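
If you just need a tiny transparent test case without any of those tools, the following sketch builds one by hand. It assumes the DEC convention that the second DCS parameter (P2) set to 1 leaves unset pixels at whatever is already on screen, and it only handles a single six-pixel-high band in one color; the function names are made up for illustration.

```python
ESC = "\x1b"


def sixel_char(bits):
    """Encode one column of 6 pixels (top pixel first) as a sixel data character.
    The top pixel is the least significant bit, and the value is offset by 0x3F."""
    value = sum(1 << i for i, on in enumerate(bits) if on)
    return chr(0x3F + value)


def transparent_sixel(columns, rgb_percent=(100, 0, 0)):
    """Build a one-band sixel string in which unset pixels stay transparent."""
    r, g, b = rgb_percent
    header = f"{ESC}P0;1;0q"        # DCS with P2=1: zero bits keep the current screen contents
    palette = f"#1;2;{r};{g};{b}"   # define color 1 as RGB, each component 0..100
    data = "#1" + "".join(sixel_char(column) for column in columns)
    return header + palette + data + f"{ESC}\\"  # ST terminates the sequence
```

Printing the result of `transparent_sixel([[True] * 6, [False] * 6, [True] * 6])` over existing text should leave the middle column showing whatever was underneath on a terminal that honors P2=1.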

j4james commented 2 years ago

I wasn't considering 1016, but I do have a POC of the DEC locator mode. There's no point in doing either of them until we have sixel, though, because the pixel coordinates will need to be tightly coupled to the sixel resolution.

zadjii-msft commented 2 years ago

Moving a discussion comment from #13024

In theory, we could work on DRCS and Sixel support without having to figure out how they transit ConPTY up front, which will let us parallelize that work

Could we now just flush the frame with TriggerFlush(false) to end any current buffered conpty content, and then pass that string through? I think we're getting pretty close here to getting the strings to the Terminal, albeit not rendered yet

j4james commented 2 years ago

Could we now just flush the frame with TriggerFlush(false) to end any current buffered conpty content, and then pass that string through?

I don't think so, no. I suppose if all you want to do is "cat" an image from the command line with something like img2sixel, that might suffice, but anything complicated will probably break. We either need the full passthrough mode working, or for the conpty renderer to be capable of regenerating the sixel on the fly so it can repaint areas of the screen that have been invalidated. Of the two, passthrough mode seems more feasible.

ofek commented 1 year ago

Has there been any update on this?

zadjii-msft commented 1 year ago

Nope. We'll make sure to update this thread when there is. In the meantime, might I recommend the Subscribe button? That way you'll be notified of any updates to this thread, without needlessly pinging everyone on this thread ☺️

Erquint commented 5 months ago

Happy New Year without Sixel, fellow dreamers!

Speaking of subscribing and sitting tight… GitHub seems to have basically hidden this subscription from my account, even after re-subscribing. I have to find it through search to check back on the lack of progress. Guess 5 years is just too old to be tracked.