dnkl opened this issue 9 months ago
Hi @dnkl. The VT340's algorithm should not be followed too strictly in modern terminals. Despite what the documentation implied, it uses a fast heuristic which relies on the character cell being 20 pixels tall. That algorithm was faster but included a glitch which should not be copied.
If you are designing a terminal that uses characters that are not 20 pixels tall, the algorithm does not apply and will have to be adapted in one of two ways:
I strongly believe the first method is the correct one for most modern terminals. It lets programmers easily create software that integrates graphics with character cell text interfaces, which is to me what makes sixels useful.
If you've read my discussion with j4james about whether this VT340 behavior is a "glitch", you'll see that even though he believes it is the historical behavior and thus correct for any terminal that claims to emulate a VT340, neither of us could come up with an easy solution for application programmers who want to just splat a sixel image on the screen and show some text underneath it. Since a workaround requires the application to model the internal state of the VT340, no sane program will ever intentionally use this odd behavior, whether it is technically a glitch or not.
@hackerb9 I don't mind changing foot to always put the cursor on the last row touched by the sixel (i.e. the bottom pixel of the last sixel).
What I don't want is slightly different behavior in modern terminals, and I was under the impression that the other "correct" terminals also followed the DEC algorithm? If not, I'd be more than happy to update foot.
That said, it looks like chafa isn't emitting a newline at all, so even with the tweaked cursor placement (always put it on the last row touched by the sixel), the image is sometimes cut off.
@PerBothner @christianparpart @wez I was hoping we could all agree on how to implement cursor placement after emitting a sixel. As far as I can tell, foot, DomTerm, Contour and Wezterm all place the cursor on the same row as the last sixel. But do you follow the DEC algorithm and place it on the same row as the upper pixel of the last sixel, or do you place it on the last row touched by the sixel (i.e. the row containing the bottom pixel of the last sixel)?
I know at least some of you have been following the discussions between @hackerb9 and j4james, but I don't know what you ended up implementing. From an application point of view, I think it would be beneficial if we all implemented the same cursor placement algorithm...
Foot currently implements the DEC algorithm, but I think it would be easier for applications if I changed it to just place the cursor on the last row. Then, to print text under the sixel, applications would know that all that's needed is (always) a single newline, not one or two.
But I think it's a bad idea to change foot if all other sixel terminals implement the DEC algorithm and don't want to change.
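To make the "one or two newlines" arithmetic concrete, here is a minimal Python sketch. It assumes the image starts at pixel row 0 of a text row and that sixel bands are 6 pixels tall, and it models the two placements discussed here: the row holding the upper pixel of the last band (the DEC algorithm as described in this comment) versus the row holding the image's bottom pixel row. The function names are made up for illustration, and the model does not try to reproduce any other VT340 details.

```python
def last_band_top(height: int) -> int:
    """Pixel row at which the final 6-pixel sixel band starts."""
    return ((height - 1) // 6) * 6


def cursor_row_top_of_last_band(height: int, cell_h: int) -> int:
    """Text row holding the upper pixel of the last band (DEC-style placement)."""
    return last_band_top(height) // cell_h


def cursor_row_bottom_pixel(height: int, cell_h: int) -> int:
    """Text row holding the bottom pixel row of the image."""
    return (height - 1) // cell_h


def newlines_to_clear(cursor_row: int, height: int, cell_h: int) -> int:
    """Newlines needed to reach the first text row below every image pixel."""
    first_free_row = (height - 1) // cell_h + 1
    return first_free_row - cursor_row


if __name__ == "__main__":
    for h in (24, 60, 96):
        dec = newlines_to_clear(cursor_row_top_of_last_band(h, 20), h, 20)
        bottom = newlines_to_clear(cursor_row_bottom_pixel(h, 20), h, 20)
        print(f"height={h:3} px, 20 px cells: DEC-style {dec}, bottom-row {bottom}")
```

With a 24-pixel image and 20-pixel cells, the DEC-style placement needs two newlines while the bottom-row placement needs one; with bottom-row placement the answer is 1 for every height and cell size, which is the property being asked for.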
I agree putting the cursor on the row containing the bottom sixel row makes more sense, and I can certainly change it if that's the consensus. I prefer to match xterm.js for various reasons. https://github.com/jerch - what do you think?
@PerBothner Imho xterm.js currently keeps the text cursor at the row of the bottom-most pixel drawn from the last sixel band. This means that if the last band contains only a "fiftel" (the 6th pixel is never set), the 5th pixel is the last one, not the 6th. I did this to allow printing pictures whose height is not a multiple of 6 px while still aligning them properly at the bottom, without a nonsense excess row or excess space at the bottom. (There is still a bug attached to it, where empty sixel bands at the end might get truncated - https://github.com/jerch/node-sixel/issues/58)
I'm open to tweaking wezterm to be more sane, assuming that there are a couple of test cases with examples of where the cursor should end up.
FWIW, I think the current cursor placement in wezterm may well be a bit of a fluke arising from re-using the iterm2 image protocol logic that preceded it rather than a conscious effort to implement the vt340 algorithm.
wezterm's logic for this (shared by iterm2, kitty and sixel handling) can be found here: https://github.com/wez/wezterm/blob/22424c3280cb21af43317cb58ef7bc34a8cbcc91/term/src/terminalstate/image.rs#L65
the vertical position: https://github.com/wez/wezterm/blob/22424c3280cb21af43317cb58ef7bc34a8cbcc91/term/src/terminalstate/image.rs#L166-L170
the horizontal position: https://github.com/wez/wezterm/blob/22424c3280cb21af43317cb58ef7bc34a8cbcc91/term/src/terminalstate/image.rs#L233-L246
It may be a good idea to get @arakiken and the other mlterm developers on board too. I've been testing with it, since it had one of the first implementations, and is still one of the fastest. It currently (as of version 3.9.3) places the cursor on the row immediately after (that is, the first character row not touched by any sixel, transparent or not).
My main concerns as an application writer are a) consistency between terminals and b) simplicity of design. I'll happily support any consensus terminal developers arrive at.
> Imho xterm.js currently keeps the text cursor at the row of the bottom-most pixel drawn from last sixel band. Means if the last band contains only "fiftel" (6th pixel never set), the 5th pixel would be the last one, not the sixth anymore.
I favored this approach at first, but it has the minor annoyance of deliberate image transparency being cut off. It also means applications must inspect the image data in order to know where the cursor'll end up, which is a slightly bigger problem. Correct me if I'm wrong and there's a way around this.
How about this:
For sixels without an explicit width/height (no raster attributes), assume all sixels are 6 pixels tall. I.e., don't bother inspecting the image looking for transparency.
For sixels with an explicit width/height, use the specified height.
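A minimal Python sketch of this proposal, for the vertical extent only: use Pv from the raster attributes (`"Pan;Pad;Ph;Pv`) when it is present and non-zero, otherwise count bands and assume each is a full 6 pixels. The helper names are hypothetical, and `data` is assumed to be the sixel data following the `q` final byte, with the string terminator already stripped.

```python
from typing import Optional


def raster_height(data: str) -> Optional[int]:
    """Return Pv from a leading raster attributes command, or None if absent or zero."""
    if not data.startswith('"'):
        return None
    end = 1
    while end < len(data) and data[end] in "0123456789;":
        end += 1
    params = data[1:end].split(";")
    if len(params) >= 4 and params[3]:
        pv = int(params[3])
        return pv if pv > 0 else None
    return None


def effective_height(data: str) -> int:
    """Image height per the proposal: explicit Pv, else 6 px per band."""
    pv = raster_height(data)
    if pv is not None:
        return pv
    # Without raster attributes every band counts as 6 px, transparent or not.
    # Note that a trailing '-' (Graphics New Line) starts one more, empty band.
    return 6 * (data.count("-") + 1)


def cursor_row(data: str, cell_h: int) -> int:
    """Text row (relative to the image's first row) of the image's bottom edge."""
    return (effective_height(data) - 1) // cell_h
```

Because neither branch looks at which bits are set, an application can predict the cursor row from the raster attributes or the band count alone.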
following along for notcurses, good to see this effort taking place
Sorry to intrude....
I just want to add that if it's possible to also consider the horizontal cursor position, it'd be really good (from the perspective of an application developer).
A unified vertical position is good enough for aligning images with text or other images vertically but not horizontally (i.e side-by-side).
Yes, it's probably possible to workaround this using absolute cursor positioning or save/restore but these are not always viable options, plus I believe the purpose of a consensus includes eliminating the need for workarounds in applications anyways.
Thank you all.
My understanding is the text cursor's horizontal position isn't changed at all. It only moves vertically.
Put another way, it is positioned "at the beginning of the sixel", i.e. in the bottom left corner of the sixel.
@hpjansson related, but perhaps worth its own issue; chafa currently ends the sixel with a GNL ('-'). Is this intentional?
It adds an extra, empty, graphical row. I think it would be better to use a textual newline instead.
Fwiw, this behavior has a (very) minor performance impact on foot, for sixels with an explicit width/height, as we're forced to reallocate and enlarge the backing image buffer, and then initialize it to the background color. I'm not really bothered by it, but thought it might be worth mentioning at least.
Be happy to move this to a separate issue if you'd prefer that.
> My understanding is the text cursor's horizontal position isn't changed at all. It only moves vertically. Put another way, it is positioned "at the beginning of the sixel", i.e in the bottom left corner of the sixel.
i went and looked at what we do in notcurses, and we do a hard cursor positioning after emission of any sixel. i imagine any application wanting to be portably correct will have to do the same thing, no? since they might be dealing with old terminals, or noncompliant ones, and it's not indicated via term queries? i don't want to disrupt unification, but from an app/toolkit author's perspective, i don't see how this helps...?
I agree, mostly. It's important we unify the vertical placement since it affects scrolling.
Horizontal placement isn't as important, except if a terminal places it after the image, in which case it might affect scrolling if it ends up being beyond the last column.
@hpjansson
> I favored this approach at first, but it has the minor annoyance of deliberate image transparency being cut off. It also means applications must inspect the image data in order to know where the cursor'll end up, which is a slightly bigger problem. Correct me if I'm wrong and there's a way around this.
No, you are right. This "bottom-most colored pixel" behavior cuts off a fully transparent line of pixels at the bottom, treating it as not part of the original image. If an image has that line intentionally, it will get stripped. That's for level 1 sixel.
> Correct me if I'm wrong and there's a way around this.
Well, I put another warning into the docs not to use level 1 sixel on the encoder side anymore, but to go with level 2 with explicit raster attributes denoting the width and height extents. DEC STD 070 also tells us that the graphics extents in raster attributes should never be exceeded by encoders, so my decoder uses them to trim the graphics, which also solves the issue of non-multiple-of-6 image heights in a more deterministic way.
We already had several discussions about the worth of the sixel chapter in DEC STD 070 and how it deviates even from DEC's own machines. Imho DEC STD 070 is the only lengthy source from DEC that tries to sound normative, e.g. by implying certain limits on the sixel format, like height and width, or the 256 color slots rule. Maybe they did that to bring it in line with other industry standards of that time (I guess 256 color support was at the high end of 80s hardware capabilities), but it never really came to life, as they soon stopped the whole sixel line.
@AnonymouX47
> I just want to add that if it's possible to also consider the horizontal cursor position, it'd be really good (from the perspective of an application developer).
That's not possible with sixel level 1; it has no notion of width. Every sixel band can have a different sixel cursor width to the right (an image might be ragged on the right), so which one would you choose? Sixel level 2 brings width and height with its raster attributes, so yes, that could be used for a right border. To support both conformance levels, only the starting cursor offset in a line is determined, which basically leads to the VT340 cursor mode.
Btw, xterm.js also uses the VT340 cursor behavior as the only supported cursor mode for IIP, to level out differences between image sequences. While that cursor mode is more annoying to deal with as an app dev, if you want to place text to the right of the image, its handling is always the same:
Foot currently allows "level 2" sixels to be extended, both vertically and horizontally. That is, the image will be resized, if necessary, to accommodate whatever the encoder is emitting.
I'd be more than happy to change it, to instead truncate the image to the width/height specified in the raster attributes. It'd just make everything simpler on the terminal side.
> I'd be more than happy to change it, to instead truncate the image to the width/height specified in the raster attributes. It'd just make everything simpler on the terminal side.
Yep, it reduces code complexity a lot, and on the perf side it is actually ~40% faster during sixel decoding because the upper bounds are known beforehand in my decoder.
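A minimal sketch of what truncating to the raster attributes can look like in a decoder, assuming the buffer is allocated once from Ph and Pv. The class and method names are hypothetical; the point is that writes outside the declared extent are dropped instead of growing (and re-initializing) the buffer.

```python
from typing import List, Optional


class ClippedCanvas:
    """Fixed-size pixel buffer sized from raster attributes (Ph x Pv)."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        # None marks a pixel that no sixel has touched (transparent).
        self.pixels: List[List[Optional[int]]] = [[None] * width for _ in range(height)]

    def put_sixel(self, x: int, band: int, bits: int, color: int) -> None:
        """Paint one sixel (6 vertical pixels) at column x of the given band."""
        if x < 0 or x >= self.width:
            return  # beyond Ph: dropped, no reallocation
        for bit in range(6):
            y = band * 6 + bit  # bit 0 is the top pixel of the band
            if y >= self.height:
                break  # beyond Pv: the rest of this sixel is clipped
            if (bits >> bit) & 1:
                self.pixels[y][x] = color
```

Since the bounds never change after construction, the per-pixel checks are simple comparisons against constants.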
Thanks (@dnkl, @dankamongmen and @jerch) for the clarifications and suggestions. I guess I can work with those.
EDIT: ... as regards cursor horizontal placement.
@dnkl
> related, but perhaps worth its own issue; chafa currently ends the sixel with a GNL ('-'). Is this intentional? It adds an extra, empty, graphical row. I think it would be better to use a textual newline instead. Fwiw, this behavior has a (very) minor performance impact on foot, for sixels with an explicit width/height, as we're forced to reallocate and enlarge the backing image buffer, and then initialize it to the background color. I'm not really bothered by it, but thought it might be worth mentioning at least. Be happy to move this to a separate issue if you'd prefer that.
I don't remember exactly how intentional it was, but when I wrote most of the encoder back in 2018 I had to work around issues in existing decoders. For instance, I specify the raster dimensions but still make sure to pad every sixel row to the full width, since I noticed a case where the terminal would have garbage in the image buffer otherwise. It's possible the GNL was required by a decoder at some point.
That said, after testing it again now, it seems e.g. mlterm behaves the same with and without the GNL; I think it opens a new sixel row only when its pixel data starts arriving. I don't know of anything that needs the final GNL anymore, so I'll remove it.
I'm also partial to the idea that raster attributes should preempt dynamic resizing. It makes things more predictable for everyone.
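For encoder authors following along, here is a minimal Python sketch of the output shape discussed here: raster attributes up front, every band padded to the full width, bands separated by `-` (Graphics New Line), and no GNL after the last band. It emits no color commands and takes precomputed 6-bit column values; it is an illustration under those assumptions, not chafa's encoder.

```python
from typing import List


def encode_bands(bands: List[List[int]]) -> str:
    """Encode bands of 6-bit column values (bit 0 = top pixel) as a DCS sixel string."""
    width = max((len(band) for band in bands), default=0)
    height = 6 * len(bands)
    rows = []
    for band in bands:
        padded = band + [0] * (width - len(band))  # pad to full width with empty sixels
        rows.append("".join(chr(63 + value) for value in padded))
    # '-' goes *between* bands only, so nothing follows the last band.
    body = "-".join(rows)
    return f'\x1bPq"1;1;{width};{height}' + body + "\x1b\\"
```

Joining with `"-".join(...)` rather than appending `-` after each band is what keeps the stream free of a trailing GNL.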
I'm glad to see all the terminal developers here working together!
If I can summarize, it sounds like everyone is in agreement that modern terminals should allow what I will refer to as splat-nl-print: Applications may send sixels to a screen and simply send a newline before any text if they do not wish to overlap the graphics. Although VT340 compatibility is not the highest priority, I can add that my tests show splat-nl-print as the algorithm of choice even on a real VT340 as the occasional glitch is vanishingly rare in actual usage.
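In application terms, splat-nl-print amounts to something like the following minimal Python sketch. The 6x6 red square is a hand-written sixel (raster attributes, a color definition `#0;2;100;0;0` in RGB percent, a selection of register 0, then six `~` sixels with all six bits set); the function and constant names are ours.

```python
import sys

# 6x6 red square: raster attributes, define + select color 0, six full sixels.
RED_SQUARE = '\x1bPq"1;1;6;6#0;2;100;0;0#0~~~~~~\x1b\\'


def splat_nl_print(sixel: str, text: str) -> str:
    """Image, then one newline, then text that should not overlap the image."""
    return sixel + "\n" + text


if __name__ == "__main__":
    sys.stdout.write(splat_nl_print(RED_SQUARE, "caption under the image\n"))
    sys.stdout.flush()
```

On a POSIX tty with ONLCR enabled (the usual default), the kernel turns the `\n` into CR+LF, so the text also starts in the leftmost column.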
Additional points brought up:
Should the width and height specified in the Raster Attributes (RA) be used as a clipping box despite DEC's documentation explicitly stating RA does not limit the size of the image? Personally, I think, "Yes". It is a reasonable optimization for modern terminals when there is exactly one RA present in the sixel data stream. However, I would also hope modern terminals would be robust enough to fall back to unoptimized rendering when necessary — for example, no RA in the image, multiple RAs, an RA with zero width / height, or data where the program doesn't know the size ahead of time. (Sidenote: I do not expect any modern emulator to be able to handle @jerch's endless scrolling sixels.)
Should the text after a new line overwrite transparent pixels at the bottom of the graphic? I believe so unless the RA width and height specify otherwise.
Should Graphic New Line scroll the screen immediately before pixel data is received? Yes, I think that is correct. And applications encoding sixels should not output a final Graphic New Line at the end of the stream.
How can positioning of text to the right of a sixel image be made easier for application developers? I agree that a new issue should be created to discuss this. (If someone does, please @ me in the discussion as I'm curious about possible solutions.)
If you're going to define your own version of Sixel, can you please make it something that apps can opt into or out of with a mode? Worst case, if you don't want to implement both standard and non-standard cursor placement, you could still report the mode as permanently set, and then apps can at least tell what behavior to expect from the terminal.
Have you tested recent versions of xterm? I think it is desirable to be compatible with xterm. It may be a good idea to contact Thomas E. Dickey, the maintainer of xterm. He has tweaked the handling of Sixels in the past, and may be open to (if necessary) doing so to match the "saner" behavior.
@j4james I don't believe there is a "standard version" of Sixel. That is part of the problem: different implementations act differently. Is "standard Sixel" whatever DEC implemented in their terminals? Are all such terminals consistent? What about the specifications (manuals) from DEC? What about corner cases not covered in the manuals? What about xterm - and which version of xterm? If all of these were consistent, I'd consider that "standard sixel" - but I'm pretty certain that is not the case.
> Have you tested recent versions of xterm? I think it is desirable to be compatible with xterm. It may be a good idea to contact Thomas E. Dickey, the maintainer of xterm. He has tweaked the handling of Sixels in the past, and may be open to (if necessary) doing so to match the "saner" behavior.
UPDATE: I have determined that I was mistaken about Xterm's behavior regressing. In fact, it is now almost precisely correct. The one thing it is missing, however, is moving the text cursor down on Graphic New Lines, which just happens to be the default output from ImageMagick's `convert` tool. @ThomasDickey
Here is a new script, textcursor2.sh, which shows how a TEXT NEW LINE (or, equivalently, CURSOR DOWN) separates a sixel image from any following text on a VT340 with its 20 pixel high character cell.
It also shows what happens when GRAPHIC NEW LINE is used; the most important feature of which seems to be that it acts exactly like a single text new line whenever the image height is a multiple of the character cell height.
> Should the text after a new line overwrite transparent pixels at the bottom of the graphic? I believe so unless the RA width and height specify otherwise.
Not sure I agree this is a good idea. Is that what the VT340 does? Or is there another reason for doing it this way?
If we choose to truncate images with raster attributes set, I don't really see the need for special casing the last line of transparent sixel bands - the encoder can simply emit a sixel with raster attributes to get an exact sized image.
Yes, that is the way the VT340 does it, or tries to, anyhow. To not do it that way would cause peculiar artifacts.
Xterm works the same as the VT340 and, while it took some time to get it right, hopefully that means it'll be easier for other terminals which can use it as a reference implementation.
I completely understand not wanting to special case the last band of transparent sixels. It seems yucky to me, too. I believe DEC engineers had the same thoughts and went to the extreme of inventing a wacky heuristic that works swiftly (if not perfectly) by presuming character cells are exactly 20 pixels high. That algorithm does not work in general, so we may have to just OR together every sixel in the last row to see what the lowest opaque pixel actually was. I believe technology has advanced sufficiently since 1988 that it should be possible.
Using Raster Attributes to truncate images seems a reasonable optimization for current hardware, but it is not ideal. And, since Sixel images aren't required to set the geometry, it is not always applicable.
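A rough Python sketch of the "OR together every sixel in the last row" idea: parse the sixel data, OR the bits of each band, and report the height down to the lowest opaque pixel. Assumptions: `data` is the part after the `q` final byte without the string terminator; color (`#`) and raster (`"`) parameters are skipped; a repeat (`!Pn`) does not change which bits are set, so only the repeated character is ORed; and bands that are entirely empty at the end are not counted, which mirrors the "bottom-most drawn pixel" behavior described earlier in the thread, including its effect on deliberately transparent rows.

```python
def opaque_height(data: str) -> int:
    """Pixel height of the image down to its lowest opaque pixel."""
    bands = [0]  # per-band OR of all sixel values seen
    i = 0
    while i < len(data):
        c = data[i]
        if c in '#"!':
            # Skip the introducer and its numeric parameters.
            i += 1
            while i < len(data) and data[i] in "0123456789;":
                i += 1
            continue
        if c == "-":
            bands.append(0)  # Graphics New Line: start a new band
        elif "?" <= c <= "~":
            bands[-1] |= ord(c) - 63  # bit 0 is the top pixel of the band
        # '$' (Graphics Carriage Return) overprints the same band: nothing to do.
        i += 1
    for index in range(len(bands) - 1, -1, -1):
        if bands[index]:
            return 6 * index + bands[index].bit_length()
    return 0
```

The work is one pass over data the decoder already reads, plus one integer OR per sixel.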
Looking at Xterm's source code reminded me of a minor point: while a text newline (`'\n'`) is the preferred separator between an image and text, there are other options.
Nothing is also a possibility, but I do not know of what utility it is, if any. DEC clearly had some idea, though, as its behavior is not as simple as "always overwrite from the top left".
Again, this is minor. Reverse engineering and implementing the "correct" behavior of any of these is low-priority compared to getting text newlines working the same everywhere.
UPDATE: Thank you to @j4james for pointing out that "newline" is ambiguous. I mean the C `'\n'` character, which moves the cursor down and to the leftmost column. UNIX systems encode newline as ASCII LF, `0x0a`, but other operating systems may be different.
UPDATE: Another thank you to j4james for correcting me that Down does not scroll the screen and that one must use Index. I have updated the information in the list above.
Thanks for clarifying the issue with trailing transparent rows.
I actually did it that way in earlier versions of foot, simply because XTerm did. I changed it when I couldn't find any evidence that's what should happen.
I guess I'll add it back :)
With that, I agree with everything in your list of suggested behaviors.
> Is "standard Sixel" whatever DEC implemented in their terminals?
@PerBothner Yes. I can't think of anyone better suited to define the standard than the people that invented the protocol.
> Are all such terminals consistent?
To the best of my knowledge, yes. The VT125 and VT240 don't support sixel scrolling, so they behave as if `DECSDM` is permanently set, which means the cursor doesn't move. And from the screenshots I've seen of the VT382, I believe it behaves the same way as the VT340 when `DECSDM` is reset, although it has a different cell size and doesn't support color. But I'd be happy to learn otherwise if you have more information on the subject.
> What about the specifications (manuals) from DEC?
The programming manuals don't go into that detail, and the latest revision of DEC STD 070 that is publicly available doesn't cover terminal cursor positioning - only the non-scrolling sixel terminals and sixel printers. But if you're aware of a more recent version, I'd love to have access to that.
> What about corner cases not covered in the manuals?
Do you have something particular in mind? As mentioned above, the manuals don't cover details like that, but I'm not aware of any corner cases that haven't already been tested on the VT340.
> What about xterm - and which version of xterm?
I'm assuming you're not expecting the standard sixel behavior to somehow match every different version of xterm, but I do know that xterm is at least trying to move closer to the standard. I haven't tested recently, though, so for all I know it may already match the standard behavior.
> If all of these were consistent, I'd consider that as "standard sixel"
Well they're at least damn close to being consistent, and you certainly aren't going to find another "standard sixel" that does better than that.
I just don't get what you're trying to achieve by coming up with yet another cursor positioning algorithm that doesn't match any of the actual DEC terminals, and then also hiding your incompatibility from applications so they can't compensate for it. But I'm not going to interfere any more if you all really want to go ahead with this. I just find it baffling that anyone would think this is a good idea.
> - Cursor down (Esc[1B) after a sixel image behaves the same as a newline except that the column is not reset to zero. Likely any terminal which has implemented Newline will already have Down working as well.
Just a point of correction here. `CUD` won't trigger a scroll at the bottom of the page, but `NEL` will (assuming that's what you mean by Newline). If you want to move down and scroll without changing the column offset, you can use `IND`, `LF`, `VT`, or `FF` (although the latter three would require that `LNM` is reset).
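In code, the distinction drawn here looks something like this minimal Python sketch; the constant and function names are ours, and the sequences are the standard 7-bit forms (ESC D for IND, ESC E for NEL, CSI B for CUD).

```python
IND = "\x1bD"   # Index: down one line, scrolls at the bottom margin, column unchanged
NEL = "\x1bE"   # Next Line: down one line with scrolling, column reset to the left margin
CUD = "\x1b[B"  # Cursor Down: stops at the bottom margin, never scrolls


def move_below(keep_column: bool) -> str:
    """Sequence that moves down one line, scrolling if needed."""
    return IND if keep_column else NEL
```

`CUD` is deliberately never returned: below an image at the bottom of the screen it would leave the cursor on the last row instead of scrolling.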
I wouldn't mind if there were some way for the client to attach a hint indicating what kind of sixel regime it's expecting for a given image, possibly chosen from a set advertised by the terminal emulator. Ideally not just for cursor positioning, but cell size too. Then TEs could default to a cell size (e.g. 10x20) depending on their emulation mode, but allow printing of sixels emitted for a different resolution while maintaining a correct character extent by applying a scaling factor. This is clearly extra work on the TE side, but if you support changing the font size at runtime you're probably keeping track of image-to-cell scaling factors already.
Practical use cases for this come up now and then. I think someone in the MS Windows sixel support thread mentioned issues with sixel-based monitoring software still running in older power plants that had to be migrated from DEC to software TEs. So a "clean" break may be preferable. Plus it just makes sense for the image to be able to indicate its cell extent somehow.
A solution to this would have to come from TE maintainers though - I don't know how practical it'd be, or how much appetite there is for it.
Edit: I can assist with fast image scaling with no external deps, and the integration work for it, if needed.
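To illustrate the arithmetic behind this suggestion (it is only a proposal in this thread, so everything here is hypothetical): if an image was authored for an assumed cell height, say 20 px, and the terminal's actual cells are a different height, scaling the image by the ratio of the two keeps it covering the same number of text rows. A minimal Python sketch using integer rounding:

```python
def scaled_height(image_h: int, assumed_cell_h: int, actual_cell_h: int) -> int:
    """Scale an image height by actual/assumed cell height, rounding to nearest."""
    return (image_h * actual_cell_h + assumed_cell_h // 2) // assumed_cell_h


def rows_covered(image_h: int, cell_h: int) -> int:
    """Number of text rows an image of image_h pixels touches (ceiling division)."""
    return -(-image_h // cell_h)


if __name__ == "__main__":
    h, assumed, actual = 60, 20, 18
    new_h = scaled_height(h, assumed, actual)
    print(new_h, rows_covered(h, assumed), rows_covered(new_h, actual))
```

A 60 px image authored for 20 px cells becomes 54 px on 18 px cells and still covers 3 rows; because of rounding, the bottom edge can shift by a pixel.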
@ all and some in particular: Idk why the discussions on these things always have to heat up. It's also tiresome to filter the important bits out of the rants, and it completely dilutes the goal of finding a model that works for both sides: current TEs with their technical limitations, and the app side with its need for screen state control and convenience.
> I wouldn't mind if there were some way for the client to attach a hint indicating what kind of sixel regime it's expecting for a given image, possibly chosen from a set advertised by the terminal emulator. Ideally not just for cursor positioning, but cell size too.
Interesting idea, but I think we're getting a bit far afield. If someone starts an issue to discuss this, I'd like to be included on it.
> Practical use cases for this come up now and then. I think someone in the MS Windows sixel support thread mentioned issues with sixel-based monitoring software still running in older power plants that had to be migrated from DEC to software TEs.
Perhaps I'm misunderstanding here, but power plants should be fine. If they are presuming sixel graphics always extend the exact same number of character cells as a VT340 in 80 column mode, they can set their font to be a 10x20 bitmap. That already works great in Xterm.
> Then TEs could default to a cell size (e.g. 10x20) depending on their emulation mode, but allow printing of sixels emitted for a different resolution while maintaining a correct character extent by applying a scaling factor. This is clearly extra work on the TE side, but if you support changing the font size at runtime you're probably keeping track of image-to-cell scaling factors already.
Again, I'm likely misunderstanding, but I hope modern TEs do not try to decouple the resolution of sixel graphics from the resolution of the font. The behaviour of sixel graphics when the terminal changes font size is undefined, but if you want to copy the VT340, the graphics should simply be erased as that's what happens when switching to 132 column mode. App programmers (at least on UNIX) will get a WINCH signal and redraw the screen. Since programmers can already determine the number of pixels per character cell, there is no extra work TE devs need to do. Why make it any more complicated?
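On UNIX, one common way for an application to learn the pixels per character cell is the `TIOCGWINSZ` ioctl, whose `winsize` struct carries `ws_xpixel`/`ws_ypixel` alongside rows and columns. A minimal POSIX-only Python sketch; some terminals report zero pixel sizes, which the code treats as unknown. The function names are ours.

```python
import fcntl
import struct
import termios
from typing import Optional, Tuple


def cell_size_from_winsize(packed: bytes) -> Optional[Tuple[int, int]]:
    """Decode a struct winsize (rows, cols, xpixel, ypixel) into (cell_w, cell_h)."""
    rows, cols, xpixel, ypixel = struct.unpack("HHHH", packed)
    if 0 in (rows, cols, xpixel, ypixel):
        return None  # the terminal did not report pixel dimensions
    return xpixel // cols, ypixel // rows


def query_cell_size(fd: int) -> Optional[Tuple[int, int]]:
    """Ask the tty behind fd for its size; call again after SIGWINCH."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    return cell_size_from_winsize(packed)
```

For example, `query_cell_size(sys.stdout.fileno())` on an 80x24 terminal reporting 800x480 pixels yields `(10, 20)`.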
As for coming up with a way to allow for sixels emitted at different resolution, I will briefly mention that the sixel protocol actually already has support for that. In addition to pixel aspect ratio settings which you may know about, device independent scale is defined by the ANSI **SSU** escape sequence (select size units) and the grid parameter **Pn3**. Although only DEC's printers actually obeyed it, the VT340 always generates sixel images with the correct real-world scale embedded in them so that hardcopy on any device would be exactly the same size as the screen.
Excerpt from DEC-070 section 7.8:
## Level 2 Sixel Devices -
ASSUMPTIONS: Level 2 sixel devices support the Set Raster
Attribute command, Background Select, Horizontal Grid Size and
Macro Parameter commands.
Sixel control strings are sent as follows:
ESC P Ps1 ; Ps2 ; Pn3 q " Pn4 ; Pn5 ; Pn6 ; Pn7 ****** ESC \
\___/ \_______________/ \_____________________/ \____/ \___/
DCS Protocol Selector Raster Attributes Picture ST
data
\______________________/ \_____________________________/
DCS Introducer Sequence sixel data
Where:
* **Ps1** Is the Macro Parameter and is always ZERO.
* **Ps2** Background Select
* **1** if Background Printing is disabled in Set-Up
* **2** if Background Printing is enabled in Set-Up
* **Pn3** Horizontal Grid Size, given in units specified
by ANSI SSU (default is decipoints, 1/720 inch).
For default size units, the grid size should be
**6** for COMPRESSED and **9** for EXPANDED or ROTATED print.
_Since the host can change the printer between accesses,
SSU should be sent once before each sixel dump._
ESC [ 2 SP I (Set Size Unit
1/11 5/11 3/2 2/0 4/9 to Decipoints)
* **Pn4** Pixel aspect ratio numerator, 1
* **Pn5** Pixel aspect ratio denominator, 1
* **Pn6** Horizontal extent
(number of pixels in image horizontally)
* **Pn7** Vertical extent
(number of pixels in image vertically)
____
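The excerpt's size arithmetic can be sketched in Python as follows, assuming the default size unit of decipoints (1/720 inch) selected with SSU (`ESC [ 2 SP I`), so a grid of 6 means 120 pixels per inch, and reading the pixel aspect ratio Pn4:Pn5 as vertical:horizontal. The function names are hypothetical, and the introducer mirrors the printer-oriented layout in the excerpt (Ps1 = 0, Ps2 = background select) rather than anything a particular terminal requires.

```python
from fractions import Fraction
from typing import Tuple

DECIPOINTS_PER_INCH = 720
SSU_DECIPOINTS = "\x1b[2 I"  # ESC [ 2 SP I: Select Size Unit = decipoints


def physical_size_inches(width_px: int, height_px: int, grid: int = 6,
                         aspect_num: int = 1, aspect_den: int = 1) -> Tuple[Fraction, Fraction]:
    """Real-world size: each pixel is `grid` decipoints wide and
    grid * aspect_num / aspect_den decipoints tall."""
    w = Fraction(width_px * grid, DECIPOINTS_PER_INCH)
    h = Fraction(height_px * grid * aspect_num, DECIPOINTS_PER_INCH * aspect_den)
    return w, h


def level2_introducer(width_px: int, height_px: int, grid: int = 6, background: int = 1) -> str:
    """SSU, then DCS 0 ; Ps2 ; Pn3 q " 1 ; 1 ; Pn6 ; Pn7 (sixel data and ST follow)."""
    return (SSU_DECIPOINTS
            + f"\x1bP0;{background};{grid}q"
            + f'"1;1;{width_px};{height_px}')
```

A 240x120 image at the default grid of 6 decipoints comes out 2 inches wide and 1 inch tall; switching to a grid of 9 (EXPANDED) makes the same pixels 1.5 times larger.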
To everyone...
Just want to say I think digressions are okay (though they should be avoided when possible), but I've noticed, and would like to encourage, @hackerb9's style of putting them (and replies to them) within `<details>` (with `<summary>`) tags in order to ease the filtering of important bits.
Thank you all.
As I understand it, the controversial issue is: How should applications and terminals handle this common case: How to print an image (with no mixing of text and image on the same line), and then move the cursor to the first line below the image (that does not overlap any of the pixels)?
In "standard sixel" you can do this if the application knows how many sixel image lines there are per text line - the text height in "logical pixels". But this adds some complication and isn't necessarily portable. The "standard" has the awkward (somewhat useless) behavior that the text cursor is moved to match the line of the top sixel line of the last set of sixels, which means the application may need to send one or two newlines.
One idea: Define a "fresh-line" escape sequence (defined below). Then the application writes the image, followed by CR, then LF, then "fresh-line".
"Fresh-line" has no effect if the cursor is at the first column of an empty line, where "empty" means no characters and no overlap with a previous image. Otherwise, it is equivalent to CR+LF.
"Fresh-line" should be an escape sequence that behaves like CR+LF in terminals that are not aware of it. One possibility is:
`\r\e[1;9D`
The `9` is a modifier that tells the terminal to ignore the command if at the start of an empty line.
The fresh-line sequence is useful in other applications besides sixel output. For example, a shell (or other REPL) can emit it before a prompt if it is unsure whether the previous command properly ended with a CR+LF sequence. (Though in this case the default is probably wrong: assuming most commands correctly end with CR+LF, we want fresh-line to be ignored on terminals that don't recognize it. Perhaps we want to specify two alternate encodings for fresh-line: one that is ignored if not recognized, and one that acts as CR+LF if it is unrecognized. DomTerm implements `\e[20u` for the former.)
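The terminal-side semantics of the proposed fresh-line operation can be modeled with a toy screen in Python. Everything here is hypothetical (the class, its fields, and the notion of a "dirty" row standing for any row that holds text or overlaps an image), and scrolling is ignored.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyScreen:
    row: int = 0
    col: int = 0
    # Rows that hold text or overlap a previously drawn image.
    dirty_rows: Set[int] = field(default_factory=set)

    def carriage_return_line_feed(self) -> None:
        self.col = 0
        self.row += 1  # a real terminal would scroll at the bottom margin

    def fresh_line(self) -> None:
        """No-op at the first column of an empty row; otherwise CR+LF."""
        if self.col == 0 and self.row not in self.dirty_rows:
            return
        self.carriage_return_line_feed()
```

For an image covering rows 0 to 2, the sequence CR, LF, fresh-line ends on row 3 whether the terminal left the cursor on row 2 (CR+LF reaches the clean row 3, fresh-line does nothing) or on row 1 (CR+LF lands on row 2, which overlaps the image, so fresh-line moves on to row 3).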
That's an interesting proposal, though ideally I'd wish for something that can be used without returning to the first column, and which doesn't require a blank row. Chafa (by request) has a `--relative` switch that's supposed to print output at the current cursor location, leaving it immediately below the bottom-left corner. This works for symbol graphics, but the promise is hard to uphold for other image protocols.
> As I understand it, the controversial issue is: How should applications and terminals handle this common case: How to print an image (with no mixing of text and image on the same line), and then move the cursor to the first line below the image (that does not overlap any of the pixels)?
Fortunately, there is not much genuine controversy on that point: just send a single newline, `'\n'`. Easypeasy.
The reason there seems to be controversy is that the vt340 can have an extremely rare quirk where the text will overwrite a few pixels at the bottom of the image. Depending upon your goal for the terminal emulator you are writing, this may or may not be important. It's not a major graphical problem and it almost never happens. Still, a terminal which wants to be a faithful clone of the VT340's behaviours would of course care about this nuance. The costs of attempting to replicate it are high, causing other design trade-offs and adding complexity not just for terminal developers but also for application programmers. Terminals which aim to be useful in modern times would be well advised to skip the quirk.
I believe it was not a design goal but a compromise. The VT340's "top pixel" heuristic is a quick approximation for a calculation that was too expensive at the time: "bottom pixel". Fortunately -- or perhaps by design -- the VT340's character cell height of 20 pixels makes that heuristic work just like "bottom pixel" in nearly every case.
I had thought this glitch was a bug in the VT340 but, after looking into it deeply, I am actually extremely impressed with the engineers from DEC. They came up with a clever solution that nobody at the time noticed was any different from the correct calculation. DEC's lack of documentation on this point would be surprising, given how thorough the manuals are, until one realizes it was probably omitted on purpose. If people knew the trick the VT340 was using, they might start relying on the quirky behaviour and future terminals would be obligated to support it.
Modern terminals have no need to approximate the calculation of the bottom-most opaque pixel as processors are not as limited as they were in the 1980s.
Even if they wanted to, there is no benefit to trying to extrapolate what the VT340 heuristic would be in modern times. Whatever it is, it is certainly not just picking the top pixel, as that doesn't work for other character cell heights. Trying to salvage it by presuming all character cells are 10x20 regardless of the font size causes a cascade of other problems, the worst being that high-resolution images require making the font size imperceptibly small.
And, even if some terminal did implement a heuristic that worked at any font size, it would be useless to application programmers. Calculating when to send two newlines is unnecessarily complicated and sometimes not even possible.
@PerBothner: Although not appropriate for sixel graphics, I could see your fresh-line proposal being useful for other situations, such as to make sure the prompt is located correctly after a program dies abruptly.
@hpjansson: To not return to the first column after displaying sixels, use IND, `'\eD'`, instead of newline. If a terminal has newline working correctly, then IND should work, too.
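As a sketch of what the application side looks like, the helper below wraps a sixel body in DCS `q` ... ST and then advances with either a single newline or a single IND. The function name is hypothetical; the escape sequences are the standard ones (DCS = `ESC P`, ST = `ESC \`, IND = `ESC D`).

```python
DCS = "\x1bP"   # Device Control String introducer
ST = "\x1b\\"   # String Terminator
IND = "\x1bD"   # Index: move down one line, keep the current column


def sixel_then_advance(sixel_body: str, keep_column: bool = False) -> str:
    """Emit a sixel image, then move below it with exactly one line feed.

    keep_column=False uses '\n' (with the usual ONLCR tty setting this
    reaches the terminal as CR+LF); keep_column=True uses IND instead.
    """
    payload = DCS + "q" + sixel_body + ST
    return payload + (IND if keep_column else "\n")
```

Either way the application sends exactly one line advance and never has to know where the last sixel band fell relative to the text cells.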
Right - but assuming a DEC-faithful TE, I would have to emit IND once or twice, depending - or rely on some extension such as @PerBothner's suggestion. The central question is "can we conserve DEC sixels but do something else to obviate the need to know where the last sixel band fell in relation to text cells?"
Just emit IND once, same as a newline. This conserves DEC's sixel design.
Okay - I'll do that (and unless I've misunderstood something, accept that a few pixels may get cut off). I'll get out of your hair now so you can discuss the other aspects (e.g. should raster attributes define a clipping rectangle? :-) Enjoying the conversation.
> Right - but assuming a DEC-faithful TE, I would have to emit IND once or twice, depending - or rely on some extension such as @PerBothner's suggestion. The central question is "can we conserve DEC sixels but do something else to obviate the need to know where the last sixel band fell in relation to text cells?"
maybe i'm misunderstanding the need, but in notcurses i handle what i believe to be your problem by getting the terminal size in pixels, dividing that out by the number of rows and cols, and using those as the cell pixel dimensions. doesn't this provide you enough?
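A rough Python equivalent of that approach (not notcurses's actual code) asks the kernel for the window size with the `TIOCGWINSZ` ioctl, whose `struct winsize` carries rows, columns and pixel dimensions, and divides. It is Unix-only, and some terminals report zero pixels, so the sketch returns `None` in that case.

```python
import fcntl
import struct
import sys
import termios


def cell_size(rows: int, cols: int, xpixel: int, ypixel: int):
    """Cell width and height in pixels, or None if the size is unknown."""
    if 0 in (rows, cols, xpixel, ypixel):
        return None                      # terminal did not report pixels
    return xpixel // cols, ypixel // rows


def query_winsize(fd: int):
    """Return (rows, cols, xpixel, ypixel) from struct winsize."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, struct.pack("HHHH", 0, 0, 0, 0))
    return struct.unpack("HHHH", packed)


if __name__ == "__main__":
    if sys.stdout.isatty():
        print(cell_size(*query_winsize(sys.stdout.fileno())))
```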
> should raster attributes define a clipping rectangle?
Good question. I've already said I think it's a reasonable, if not ideal, optimization even though it clearly violates both DEC's documentation and actual hardware behaviour.
I should ask, though, does anyone have a good hypothesis for why DEC repeatedly stated that sixel images can extend beyond the rectangle defined by RA? What is lost by taking this optimization?
@jerch, when you say you get a 40% speed boost, what exactly was the bottleneck? Memory pressure from dynamic allocation of large rectangles?
@hackerb9 how do trailing GNLs interact with last transparent rows being clipped? One way of looking at a final, trailing GNL is that it is a completely transparent sixel row (and thus that it should be removed). But perhaps it's more correct to say that a GNL should be treated as a fully opaque row until you start printing sixels; then you start tracking the bottom-most opaque pixel.
> when you say you get a 40% speed boost, what exactly was the bottleneck? Memory pressure from dynamic allocation of large rectangles?
I can obviously not speak for @jerch , and I, too, am very curious. However, for me, there's no 40% speed boost just from allowing the raster attributes to act like a clipping region. Foot allocates the entire backing memory when the raster attributes is set. We still have to check for "overflows" (either increase image size if the sixel cursor goes beyond the raster attributes, or ignore the sixel). Thus, it makes very little difference while processing sixel characters.
There would be a small performance gain, in that we wouldn't have to reallocate the backing image when we encounter "sloppy" encoders that emit a trailing GNL that triggers a vertical resize.
Treating it as a clipping region does simplify things, though. It also almost removes the need to scan for the last opaque sixel row ;)
But, I'm fine with either way.
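To make the two overflow policies concrete, here is a hypothetical backing store (not foot's implementation) that is allocated up front from the raster attributes and then either clips out-of-range pixels or reallocates to a larger size. `None` stands for a transparent pixel.

```python
from typing import List, Optional


class SixelCanvas:
    """Hypothetical image buffer sized from the raster attributes (Ph x Pv)."""

    def __init__(self, width: int, height: int, clip: bool) -> None:
        self.width, self.height, self.clip = width, height, clip
        # Allocate the whole image as soon as the raster attributes are known.
        self.rows: List[List[Optional[int]]] = [[None] * width for _ in range(height)]

    def put(self, x: int, y: int, color: int) -> bool:
        """Set one pixel; return False if it was dropped by clipping."""
        if x >= self.width or y >= self.height:
            if self.clip:
                return False          # RA acts as a clipping rectangle
            self._grow(max(self.width, x + 1), max(self.height, y + 1))
        self.rows[y][x] = color
        return True

    def _grow(self, width: int, height: int) -> None:
        """Enlarge the buffer: the reallocation cost that clipping avoids."""
        for row in self.rows:
            row.extend([None] * (width - len(row)))
        self.rows.extend([[None] * width for _ in range(height - len(self.rows))])
        self.width, self.height = width, height
```

Both policies still need the bounds check on every pixel; clipping only removes the reallocation path.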
> does anyone have a good hypothesis for why DEC repeatedly stated that sixel images can extend beyond the rectangle defined by RA? What is lost by taking this optimization?
Infinite scroll would be the most obvious example (I'm sure we discussed this somewhere before but I can't find it in your repo right now). You'd also lose some bandwidth saving tricks that could be beneficial when working with non-rectangular output. You can see the sort of thing I mean in the raster dimension tests.
> @hackerb9 how do trailing GNLs interact with last transparent rows being clipped?
Before I get into the weeds about a trailing graphic newline, I do want to say that I think GNL is not as important as getting the text newline behaviour consistent across modern terminals.
> One way of looking at a final, trailing GNL is that it is a completely transparent sixel row (and thus that it should be removed). But perhaps it's more correct to say that a GNL should be treated as a fully opaque row until you start printing sixels; then you start tracking the bottom-most opaque pixel.
@j4james is most knowledgeable of precise VT340 behaviour and may even know the exact algorithm for 20 pixel tall fonts off-hand.
For modern terminals, I think perhaps a better question would be _why_ did DEC choose the algorithm they did for the VT340? We've already seen that sometimes they developed fast but inexact algorithms to overcome hardware limitations, so what benefits did the algorithm they chose for the VT340's GNL provide to programmers and users at that time?
With the caveat that I haven't thought this out as deeply as I have text newlines, here's my current take on GNL:
**EFFECT OF A TRAILING GRAPHIC NEW LINE ON TEXT CURSOR POSITION**
| Previous image height | Behaviour |
|-|-|
| Exact multiple of text height | Cursor is moved to the blank line under the image |
| Anything else | Has no effect (usually) |
It seems that a trailing GNL is practically useless to current programmers, as the following text will [almost always overlap](https://github.com/hackerb9/vt340test/blob/main/sixeltests/trailing-gnl.sh). The one case in which it is sure to give a fresh line is not terribly useful, since a text newline works the same way and is more general.
I don't know the design parameters DEC was constrained by, but it looks an awful lot like an attempt at backwards compatibility. Historically, sixels were designed for printers and [teletypewriters](https://github.com/hackerb9/vt340test/raw/main/docs/kindred/EK-LA34S-TM-001_DECwriter_IV_Series_Technical_Manual.pdf) in which GNL represented advancing the paper by a fraction of the usual line height.
**Excerpt from [LJ250](https://github.com/hackerb9/vt340test/blob/main/docs/kindred/EK-LJ250-RM_LJ250_Printer_256-color_Sixel_Sep87.pdf) Printer Programmer's Reference Manual**
> **6.3.2.4 Graphic New Line (`-`)** The graphic new line (GNL) control code (2/13) sets the active column to the [graphic] left margin and advances the paper by the current sixel height.
Since the fractions can add up, it makes sense that some programmers may have relied on printing images at a multiple of the line height and sending a final GNL to move the printhead to the next (whole) text line instead of using an explicit **LF**. Perhaps this was a common programming idiom and DEC wanted to make sure it still worked on video terminals.
One problem with this theory is that printer-terminals, unlike video-terminals, might have been able to print a fraction of a line down, so sizing images to a multiple of the line height might not matter. Response: it's also possible that being aligned to whole lines was important, if not 100% necessary. For example, the manual for the DEC LA100 printer has a caution about using Partial Line Down:
> The PLD sequence does not modify the active line. To avoid losing the top of form reference send an equal number of PLU sequences to the terminal.
Another possible reason aligning to whole lines may have been important back in those days was that [green bar paper](https://www.pdp8online.com/images/greenbar.shtml) was common, but that seems weak to me.
A possible critique and response
Even if my above theory is correct, one thing I don't get is why not always advance the text cursor? What, if any, benefit is there to have a trailing GNL stay on the same line?
Alright, I now have three open PRs for foot, addressing the following:

1. Place the cursor on the last character row touched by the sixel: this is the one I started this ticket with.
2. Limit image size to the one specified in the raster attributes: stops foot from allowing images to grow beyond the dimensions in the raster attributes. This is mostly to sync with @jerch. In the end, it didn't really offer any major benefits, and I would be just as happy to continue supporting dynamically growing the image beyond the raster attributes' dimensions. If I were to decide all on my own, I wouldn't merge this PR, and would instead continue supporting dynamic resizes. Note that even with this PR, dynamically sized images are still supported, as long as they omit the raster attributes.
3. Trim trailing, fully transparent sixel rows: does what it says. We haven't really discussed the nitty-gritty details of this one, so I chose to do this for all sixels, regardless of the background color mode (i.e. the P2 parameter), and regardless of whether any raster attributes are present.
Is this something you all (though I guess it's pretty clear where @j4james stands on this) would consider implementing in your TEs?
Just to be clear: I don't intend to merge any of the above (1640 being the exception) unless we can reach at least some level of consensus here.
@hackerb9 thanks again for your detailed explanation. What I ended up doing (in 1640) is to let trailing GNLs move the text cursor as if you had at least one fully opaque 6-pixel sixel on that row, but as soon as you start printing sixels, I switch to tracking whatever the actual bottom pixel is. In other words, a trailing GNL will not be trimmed out when we remove trailing, transparent sixel rows.
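Here is a minimal sketch of one reading of that rule (not foot's code): scan the sixel data, track the bottom-most opaque pixel so that trailing transparent rows are effectively trimmed, and treat the band after a trailing GNL as fully opaque until any sixel character is printed on it. Parsing is simplified (no validation, no colour handling) and the function names are hypothetical.

```python
from typing import Optional


def bottom_pixel(data: str) -> Optional[int]:
    """0-based pixel row that should determine the text cursor row."""
    band = 0             # current sixel band (6 pixels tall)
    bottom = -1          # bottom-most opaque pixel seen so far (-1 = none)
    gnl_pending = False  # True after a GNL until a sixel character follows
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c == "-":                      # graphic new line: next band
            band += 1
            gnl_pending = True
            i += 1
        elif c in '#"':                   # colour introducer / raster attributes
            i += 1
            while i < n and (data[i].isdigit() or data[i] == ";"):
                i += 1                    # skip numeric parameters
        elif c == "!":                    # repeat introducer: skip the count,
            i += 1                        # the repeated sixel is handled next
            while i < n and data[i].isdigit():
                i += 1
        elif "?" <= c <= "~":             # sixel data character
            gnl_pending = False
            bits = ord(c) - 63            # bit 0 = top pixel, bit 5 = bottom
            if bits:
                bottom = max(bottom, band * 6 + bits.bit_length() - 1)
            i += 1
        else:                             # "$" (graphic CR) and others
            i += 1
    if gnl_pending:                       # trailing GNL counts as opaque band
        bottom = max(bottom, band * 6 + 5)
    return bottom if bottom >= 0 else None


def cursor_row(data: str, cell_height: int) -> Optional[int]:
    """Text row holding that pixel, assuming the image starts at row 0."""
    b = bottom_pixel(data)
    return None if b is None else b // cell_height
```

With this reading, `"~-?"` ends at pixel row 5 (the transparent band is trimmed), while `"~-"` ends at pixel row 11 (the trailing GNL is kept).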
@dankamongmen
> maybe i'm misunderstanding the need, but in notcurses i handle what i believe to be your problem by getting the terminal size in pixels, dividing that out by the number of rows and cols, and using those as the cell pixel dimensions. doesn't this provide you enough?
@dnkl
Considering images that actually have trailing transparent rows, I have a couple of concerns/questions regarding trimming those rows:
P2=0
over another?
Sixel-capable terminal emulators have gotten cursor placement (after emitting the sixel) wrong since the beginning. They usually put the cursor on a new line under the sixel. This means the terminal content may scroll if a sixel is printed on the last row.
However, that's not how the VT340 did it. The simplified explanation is that it places the cursor on the last line of the sixel. Thus, if you want to print text under the sixel, you first have to print a newline.
The real algorithm is slightly more complex than that. A sixel is 6 pixels tall. This means it can cover two text rows. The DEC cursor placement algorithm puts the text cursor where the top pixel is. This means there are times when two newlines are required to print text under the sixel.
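The arithmetic behind "sometimes two newlines" can be sketched with a simplified model (not a reproduction of the VT340 firmware): assume the image starts at the top of text row 0, its height is a whole number of 6-pixel bands, and its last band is fully opaque. The cursor goes either to the row of the band's top pixel (DEC-style) or to the row of its bottom pixel.

```python
def band_rows(band_top: int, cell_height: int) -> tuple:
    """Text rows containing the top and bottom pixel of a 6-pixel band."""
    return band_top // cell_height, (band_top + 5) // cell_height


def newlines_needed(image_height: int, cell_height: int, placement: str) -> int:
    """Newlines needed to reach the first text row entirely below the image."""
    last_band_top = image_height - 6
    top_row, bottom_row = band_rows(last_band_top, cell_height)
    if placement == "top":          # cursor on the band's top pixel
        cursor = top_row
    elif placement == "bottom":     # cursor on the band's bottom pixel
        cursor = bottom_row
    else:
        raise ValueError(placement)
    return bottom_row - cursor + 1
```

With 20-pixel cells, `[h for h in range(6, 61, 6) if newlines_needed(h, 20, "top") == 2]` gives `[24, 42]`: those are the heights whose last band straddles a row boundary. With `"bottom"` placement the answer is always one newline, regardless of cell height.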
A number of terminals have started to implement the correct behavior. Terminals that implement the DEC placement algorithm are foot, contour, DomTerm and WezTerm. There may be more that I'm not aware of. XTerm is close to correct, but last time I checked, it placed the cursor on the bottom pixel (i.e. you always need a single newline).
Right now, running `chafa <image> && echo "XXXXXX"` will look something like this in, e.g., foot: (picture shows a part of my dog's paw...)
A bit more information here: