hackerb9 / vt340test

Tests of VT340 compatibility
Creative Commons Zero v1.0 Universal

Bill The Cat #30

Open j4james opened 1 year ago

j4james commented 1 year ago

I noticed that you recently added another version of the "Bill The Cat" image (ef477b215253fa293e10598f5c43888f6674b684), so I thought it's probably worth discussing. Because you've now got three different versions in this repo (plus an additional two duplicates), and every one of them is incorrect!

The oldest instance I've found was posted to the comp.os.vms newsgroup back in 1989. It's possibly not the original source, but it's the only "correct" version I've seen. There's a copy of the message archived in Google Groups here: https://groups.google.com/g/comp.os.vms/c/7ns4E74ahK8/m/RUeKjxaGR9gJ

Note that it's in two parts: a sixel file with no palette definition, and a ReGIS file which defines the palette entries. In the sixel part, there are three colors in the order #1, #2, and #3, and since the palette isn't included, those map directly to color table entries 1, 2, and 3 (color 0 isn't referenced directly, but that is the background color). The actual palette values are defined in the ReGIS file as 0 = green, 1 = black, 2 = red, and 3 = white.

So that gives you an image made up of black, red, and white pixels, on a green background. You can see below what it should look like (more or less). The mix of red and green renders the cat in a shade of orange, and you can clearly see the black whiskers against the green background (which isn't the case in most other variants).

image

Now let's look at the versions that you've got in this repo:

The bottom line is that if you want to get this working correctly in a single sixel image on the VT340, you need a sixel palette definition that covers all sixteen entries, so you can set the last one (the background) to green.

But the easiest solution would be to use the ReGIS palette definition from the original. That should (I think) work on both the VT240 and VT340. I'd completely forgotten about that as an option when we were discussing portability in https://github.com/jerch/xterm-addon-image/pull/41#issuecomment-1380192461.

hackerb9 commented 1 year ago

Thank you for looking into the proliferation of Bills.

Ah! A green background makes much more sense with the odd green outline I was seeing.

Version 1 I got from the MS Kermit disks. Version 2 was my attempt to fix the background with transparency. Version 3 is just a test image I use when I want to show sixels but not mess with the current colormap.

hackerb9 commented 1 year ago

By the way, I've been thinking about ways I can embed a short comment in files with ReGIS/sixel. Have you got any suggestions? I suppose I could use OSC 8 to provide a link to a URI, but I'd prefer something that allowed an arbitrary string.

j4james commented 1 year ago

> Version 1 I got from the MS Kermit disks. Version 2 was my attempt to fix the background with transparency. Version 3 is just a test image I use when I want to show sixels but not mess with the current colormap.

OK. That all makes sense. But if I were you, I'd just keep the Kermit version that you already have in the kermitdemos directory as an historical reference, and keep one instance of the palette-free version for testing with, but then maybe dump the others in favor of a VT340-compatible version with the background color fixed.

I think a palette like the one below should work on both the VT340 and VT240. The first four entries define the colors as expected for the VT240, entries 5 to 15 are just the default palette values for a VT340, and the last entry is a repeat of green again for the VT340 background color.

#1;1;0;0;0
#2;1;120;50;100
#3;1;0;99;0
#4;1;280;35;60
#5;1;300;49;59
#6;1;180;49;59
#7;1;0;46;0
#8;1;0;26;0
#9;1;0;46;28
#10;1;120;42;38
#11;1;240;46;28
#12;1;60;46;28
#13;1;300;46;28
#14;1;180;46;28
#15;1;0;79;0
#0;1;280;35;60
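
As a rough sketch of how the listing above could be prepended to the palette-free sixel data, the following Python builds the color definitions in the same order (registers 1 to 15 first, register 0 last). The names `PALETTE`, `sixel_palette`, and `wrap_sixel` are made up for illustration, and the DCS wrapper assumes the plain 7-bit `ESC P q ... ESC \` form with no parameters.

```python
# Palette entries in definition order: (register, hue, lightness, saturation).
# The values are copied from the listing above; the fixed "1" in each
# definition selects HLS coordinates.
PALETTE = [
    (1, 0, 0, 0), (2, 120, 50, 100), (3, 0, 99, 0), (4, 280, 35, 60),
    (5, 300, 49, 59), (6, 180, 49, 59), (7, 0, 46, 0), (8, 0, 26, 0),
    (9, 0, 46, 28), (10, 120, 42, 38), (11, 240, 46, 28), (12, 60, 46, 28),
    (13, 300, 46, 28), (14, 180, 46, 28), (15, 0, 79, 0), (0, 280, 35, 60),
]


def sixel_palette(entries):
    """Return the sixel color definitions, preserving the given order."""
    return "".join(f"#{reg};1;{hue};{lig};{sat}" for reg, hue, lig, sat in entries)


def wrap_sixel(body, entries=PALETTE):
    """Wrap palette-free sixel data in a DCS q ... ST string with the palette first."""
    return "\x1bPq" + sixel_palette(entries) + body + "\x1b\\"
```

The order matters here: the final definition is the one intended to land on the VT340 background entry, so the list should not be sorted by register number.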

> By the way, I've been thinking about ways I can embed a short comment in files with ReGIS/sixel. Have you got any suggestions?

That's actually something I've tried to do in the past. I thought it'd be clever to use an SOS (Start Of String) sequence to embed comment strings, but I soon discovered that most terminals don't handle that correctly, and just print the content out to the screen.

You might have better luck with a custom OSC sequence, but even that's not guaranteed to work. In fact the DEC terminals are particularly bad for stuff like this. The only string sequences that work on a VT240 are the DCS sequences, and nothing worked on the VT10x devices.

Actually now I'm curious how well your VT340 copes with string sequences other than DCS, but I wouldn't be surprised if it's just as bad as the VT240.

Edit: I've since checked my notes, and it looks like the VT320 and later terminals should all support OSC, PM, and APC, but there is no mention of SOS until the VT382.

j4james commented 1 year ago

I just had another horrible idea regarding comments in Sixel. Any bytes that aren't valid sixel values will be ignored, so if you're happy to embed your comments inside the DCS data sequence, you can use Unicode characters that look like ASCII, but obviously aren't in the ASCII range. The trick is that the UTF-8 encoding must also not use bytes from the C1 range.

The first example I found that worked was the fullwidth capital letter set from U+FF21 to U+FF3A. For example, ＣＯＭＭＥＮＴ. I've tested this on the VT240, and it seemed to work perfectly. I wouldn't necessarily expect it to work on all modern terminal emulators, though.
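
To make the byte-level constraint concrete, here is a small Python sketch that maps `A`..`Z` to the fullwidth range and checks the resulting UTF-8 bytes. The helper names are hypothetical, and the safety test (every byte at or above 0xA0, so nothing falls in ASCII, C0, or C1) is one conservative reading of the rule described above rather than anything taken from a DEC manual.

```python
FULLWIDTH_A = 0xFF21  # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A


def to_fullwidth(text):
    """Map ASCII capitals A..Z to their fullwidth forms (U+FF21..U+FF3A)."""
    out = []
    for ch in text:
        if not "A" <= ch <= "Z":
            raise ValueError(f"only A-Z can be mapped, got {ch!r}")
        out.append(chr(ord(ch) - ord("A") + FULLWIDTH_A))
    return "".join(out)


def avoids_ascii_and_c1(data):
    """True if no byte is in the ASCII, C0, or C1 ranges (all bytes >= 0xA0)."""
    return all(b >= 0xA0 for b in data)


encoded = to_fullwidth("COMMENT").encode("utf-8")
print(encoded.hex(" "))
print(avoids_ascii_and_c1(encoded))
```

Each capital encodes as `EF BC xx` with the last byte between 0xA1 and 0xBA, which is why this range passes. The fullwidth lowercase letters (U+FF41 onwards) encode as `EF BD 81` and up, so their final bytes do fall in the C1 range and would fail the same check.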

hackerb9 commented 1 year ago

> I just had another horrible idea regarding comments in Sixel.

😆 Wow, I can't believe that actually works. I just tested it on the VT340 and it does render correctly. Or rather, the graphics render correctly, but of course the comment text looks like

 ABCDEFGHIJKLMNOPQRSTUVWXYZ.

As cool as that is, I think it might be better to stick to ASCII, which, as you implied, means it has to be outside the sixel bitmap DCS sequence.

How about this for an idea: stick the comments in another DCS sequence. We've already seen that sixel images can be made up of multiple device control strings, as in this cat example. Please correct me if I'm wrong, but I believe using a previously unknown DCS sequence would work everywhere -- from a VT125 to the latest terminal emulator -- because the correct thing to do with an unknown DCS sequence is to ignore it.

The VT125 also supports ESC P, which is DCS and ESC \, which is ST. (Refer to Communication and Graphic Protocol Controls in the VT125 User Guide.) The intermediate and final characters are taken together to define the function of the sequence. Then the device performs the action and accepts more data. If the action defined by the escape sequence does not apply to the device, the device ignores the complete sequence and accepts more data.
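
The "ignore the complete sequence" behavior can be sketched in Python as a filter over a byte stream. This is only an illustration of the idea, not how any real terminal is implemented: it handles the 7-bit `ESC P` / `ESC \` forms only, and the `KNOWN` set (ReGIS `p` and sixel `q` with no intermediates) is an assumption chosen for the example.

```python
import re

# ESC P, parameters (0x30..0x3F), intermediates (0x20..0x2F),
# a final (0x40..0x7E), then the data string up to ESC \ (ST).
DCS_RE = re.compile(rb"\x1bP([0-?]*)([ -/]*)([@-~])(.*?)\x1b\\", re.DOTALL)

# (intermediates, final) pairs this hypothetical device understands.
KNOWN = {(b"", b"p"), (b"", b"q")}


def drop_unknown_dcs(stream):
    """Remove every complete DCS string whose function is not in KNOWN."""
    def keep_or_drop(match):
        key = (match.group(2), match.group(3))
        return match.group(0) if key in KNOWN else b""
    return DCS_RE.sub(keep_or_drop, stream)


data = b"A\x1bP//~hello\x1b\\B\x1bPq#1~\x1b\\C"
print(drop_unknown_dcs(data))
```

In this sketch the unknown string disappears along with its data, while the sixel string passes through untouched. A device that lacks this behavior would instead show `hello` on the screen.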

@j4james, do you know of any sixel capable devices that cannot handle unknown device control strings?

There are over a thousand possible DCS strings.
It looks like ANSI's rules for DCS strings were pretty lax, so DEC added some structure by adding the notion of Parameters, Intermediates, and a Final, just like normal control sequences.

| DCS | Parameters (optional) | Intermediates (optional) | Final | Data String | ST |
|-------|----------------------------------|-----------------------------------|-------------|-------------------------------|--------|
| 0x90 | 0x30..0x3F | 0x20..0x2F | 0x40..0x7E | Arbitrary data separated by ; | 0x9C |
| Esc P | 0 1 2 3 4 5 6 7 8 9 : ; \< = > ? | Sp ! " # $ % & ' ( ) \* + , - . / | @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \\ ] ^ \_ \` a b c d e f g h i j k l m n o p q r s t u v w x y z { \| } ~ | \*\*\*\*\*\*\*\* | Esc \\ |

Since the Final comes from the set 0x40..0x7E (`@` to `~`), there are 63 completely distinct DCS strings. And, in fact, DEC uses the Intermediate to further distinguish the DCS strings. There are 17 different Intermediates 0x20..0x2F (` ` to `/` plus none at all), which multiplies out to over a thousand different DCS strings.[^1]

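
The arithmetic behind "over a thousand" can be checked with a few lines of Python. This is just a counting sketch under the assumptions stated above (at most one Intermediate, Parameters not counted); the function name is made up.

```python
FINALS = range(0x40, 0x7F)         # '@' .. '~' inclusive
INTERMEDIATES = range(0x20, 0x30)  # ' ' .. '/' inclusive


def count_dcs_functions(finals=FINALS, intermediates=INTERMEDIATES):
    """Count Final choices times (one Intermediate, or none at all)."""
    return len(finals) * (len(intermediates) + 1)


# prints: 63 17 1071
print(len(FINALS), len(INTERMEDIATES) + 1, count_dcs_functions())
```
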
I don't know how many DCS strings DEC defined, but it was not close to a thousand.
I've started doing some research on which DCS strings are already assigned, and so far I've seen this:

| Mnemonic | Intermediate | Final | Meaning |
|----------|-------------:|-------|--------------------------------------------------|
| | | p | ReGIS graphics |
| DECRSTS | $ | p | Restore Terminal State |
| | | q | Sixel graphics |
| DECRQSS | $ | q | Request Selection or Setting |
| DECLBAN | | r | Load Banner Message |
| DECRPSS | $ | r | Report Selection or Setting |
| DECTSR | $ | s | Response to Terminal State Report |
| | | t | VT105 waveform graphics (for VT125) |
| DECRSPS | $ | t | Restore Presentation State |
| DECAUPSS | ! | u | Assign User Preferred Supplemental Character Set |
| DECCIR | 1$ | u | Cursor Information Report |
| DECTABSR | 2$ | u | Tabulation Stop Report |
| DECLANS | | v | Load Answerback Message |
| DECPFK | " | x | Program Function Key |
| DECPAK | " | y | Program Alphanumeric Key |
| DECDMAC | ! | z | Define Macro |
| DECCKD | " | z | Copy Key Default |
| DECRPTUI | ! | \| | Report Terminal Unit ID (DA3) |
| DECUDK | | \| | Download User Defined Keys |
| DECSTUI | ! | { | Set Terminal Unit ID |
| DECDLD | | { | Down-line Load Font |
| DECRPFK | " | } | Report Function Key Definition |
| DECRPAK | " | ~ | Report All Modifiers/Key State |
| DECCKSR | ! | ~ | Response to Checksum Request |

I suggest using DCS / \ because the backslash visually echoes the comment terminator ST.

| DCS | Parameters (optional) | Intermediates | Final | Data String | ST |
|-------|----------------------------------|-----------------------------------|-------------|-----------------------|------|
| 0x90 | 0x30..0x3F | 0x2F | 0x5C | 0x9..0xD, 0x20..0x7E | 0x9C |
| Esc P | | / | \\ | Comment: This is a picture of a cat | Esc \\ |

(I'll post an example when I get a chance.)

[^1]: Technically, there can be multiple Intermediates which gives an unlimited number of possible DCS strings. However, since I've never seen DEC actually do that, I wouldn't be surprised if terminals took the short cut of expecting at most one.

j4james commented 1 year ago

> do you know of any sixel capable devices that cannot handle unknown device control strings?

I'm almost certain there is at least one DEC terminal that supported some DCS operations, but other DCS sequences that it didn't recognise would have their data string output to the screen. I've checked the VT240, though, and that seems fine, so I'm not sure where I had that issue. I'll have to do some more digging.

> Since the Final comes from the set 0x40..0x7E (@ to ~), there are 63 completely distinct DCS strings.

According to DEC STD 070, the DCS Final should really be in the range 0x70..0x7E, same as for private control sequences (see section 3.5.4.1). I think one of the other DEC manuals may have given the range as 0x40..0x7E, but I suspect that was a mistake. I'm fairly certain that all the DEC DCS sequences use a final from the private range.

With multiple intermediates, you'd still have thousands of options, but as you say, some terminals are not likely to cope with that. In fact, I think you'll find a number of terminals won't even cope with a single intermediate. I know that was a problem for some of the early DEC terminals too (at least for CSI sequences).

This is one of the reasons why I try to avoid inventing new sequences. It's almost impossible to come up with something new without breaking somebody's terminal.

I do like your idea of a / intermediate, though. Maybe even go with // if it's no worse than a single intermediate in terms of breaking things. But you'll still need a final char from the private range - maybe something like ~.

hackerb9 commented 1 year ago

> According to DEC STD 070, the DCS Final should really be in the range 0x70..0x7E, same as for private control sequences (see section 3.5.4.1).

I had noticed that DEC Finals started at the letter p. My guess was that they saw 0x70..E as belonging to DEC and were trying to encourage standardization of their extension to ANSI's DCS by leaving plenty of space for other vendors.

I was basing the range 0x40 to 0x7E on the VT520 Reference Manual but the VT340 Text Programming Manual states the same thing:

> Device control strings (DCS), like control sequences, use two or more bytes to define specific control functions. However, a DCS also includes a data string. ... F is the final character in the 4/0 to 7/14 range.

Is it possible DEC 070's recommendation to only use 0x70 to 0x7E was based on the presumption that only DEC employees would be reading it?

j4james commented 1 year ago

> My guess was that they saw 0x70..E as belonging to DEC and were trying to encourage standardization of their extension to ANSI's DCS by leaving plenty of space for other vendors.

I think it's more likely they were trying to leave the 0x40..0x6F range as reserved for ANSI standardization, and it's expected that all vendors would share the 0x70..0x7E finals for their private sequences. This is the exact same pattern that ANSI defined for the CSI sequences.

And note that the VT340 and VT520 manuals also list the CSI final range as 0x40..0x7E, but DEC only uses 0x70..0x7E (as required by ANSI).
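
The split between the two final ranges is easy to express in code. The sketch below is illustrative only (the function name and the labels "standard" and "private" are made up), and the string of DEC finals is simply the set of Final characters that appear in the table of assigned DCS strings earlier in this thread.

```python
def classify_final(byte):
    """Classify a final byte using the ranges discussed above."""
    if 0x40 <= byte <= 0x6F:
        return "standard"  # left for ANSI standardization
    if 0x70 <= byte <= 0x7E:
        return "private"   # shared by vendors for private sequences
    raise ValueError(f"0x{byte:02X} is not a valid final byte")


# Finals used by the DEC DCS strings listed earlier in the thread.
DEC_FINALS = "pqrstuvxyz|{}~"

print(all(classify_final(ord(c)) == "private" for c in DEC_FINALS))
print(classify_final(ord("\\")), classify_final(ord("~")))
```

By this classification, a backslash final (0x5C) sits in the range left for standardization, while `~` (0x7E) is in the private range.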

> Is it possible DEC 070's recommendation to only use 0x70 to 0x7E was based on the presumption that only DEC employees would be reading it?

That makes sense. One way of looking at it is that the VT340 and VT520 manuals are telling users what finals they can expect to encounter in general, but the STD 070 manual is telling DEC terminal developers what range they should restrict themselves to so as to avoid conflict with future ANSI standards.

> do you know of any sixel capable devices that cannot handle unknown device control strings?

Btw, I eventually found out which terminal I was thinking of - it's the VK100. It supports ReGIS, so a DCS p sequence is parsed correctly, but any other DCS sequences are just output to the screen. And although it doesn't support sixel, I wouldn't be surprised if the VT125 had a similar limitation.

hackerb9 commented 1 year ago

> I think it's more likely they were trying to leave the 0x40..0x6F range as reserved for ANSI standardization, and it's expected that all vendors would share the 0x70..0x7E finals for their private sequences. This is the exact same pattern that ANSI defined for the CSI sequences.

See, this is why I'm glad you're helping me figure this out. I had no idea about the ANSI requirement that vendors stick to 0x70..0x7E for CSI. Now that I know that, I think your suggestion of //~ is probably the best sequence for a comment. It follows the guidelines to allow for future ANSI standardization, it uses two intermediates to reduce the chance of collisions, and it is readable as part of a textual comment.
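
For illustration, a comment string using the `//~` form could be generated like this in Python. Note that this sequence is only a proposal from this thread, not a function defined by DEC or ANSI, and the helper name `comment_dcs` is hypothetical. The allowed data bytes (0x09..0x0D and 0x20..0x7E) are taken from the data string column suggested earlier.

```python
ESC = "\x1b"
DCS = ESC + "P"   # 7-bit form of DCS
ST = ESC + "\\"   # 7-bit form of ST


def comment_dcs(text):
    """Wrap an ASCII comment in the proposed DCS / / ~ ... ST string."""
    for ch in text:
        code = ord(ch)
        if not (0x09 <= code <= 0x0D or 0x20 <= code <= 0x7E):
            raise ValueError(f"character {ch!r} is not allowed in a comment")
    return f"{DCS}//~{text}{ST}"


print(repr(comment_dcs("This is a picture of a cat")))
```

Rejecting ESC and other control characters keeps a comment from terminating the string early or smuggling in another sequence. Terminals that print unknown DCS data would still show the text on screen.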

> Btw, I eventually found out which terminal I was thinking of - it's the VK100. It supports ReGIS, so a DCS p sequence is parsed correctly, but any other DCS sequences are just output to the screen.

You have a VK100 GIGI? Those sound pretty neat, but the closest I've ever come to one is when I noticed that someone was trying to sell an empty GIGI box on eBay for way too much money.

GIGI

> And although it doesn't support sixel, I wouldn't be surprised if the VT125 had a similar limitation.

I would have thought so, too, but I was reading the VT125 User Guide recently for my font upline-loading program. In the section on DCS it says, "if the [...] action does not apply to the device, [...] the device ignores the escape sequence entirely".

Interestingly, it also seems to say that the VT125 can display sixel bitmaps. Although, of course, it doesn't name them as such. Instead it refers to the "DECwriter graphics protocol", "DECwriter control commands", "DECwriter descriptor data", "DECwriter graphics hardcopy descriptor", or "Hardcopy image data".

Excerpt from User Guide showing VT125 sixel support

ESC P Pn q -- Delimit image format. Accept the text that follows from the same data path as this sequence as DECwriter graphics hard copy descriptor and display it. Pn is ignored. Refer to the Media Copy control function description for information about generating DECwriter descriptor.

Table from Appendix F.3 comparing VT125 to VK100

| VT125 | VK100 |
|-------|-------|
| Hardcopy output can be directed to the host port using the "Media Copy" control sequence (ESC [ ? 2 i). | Hardcopy output can be directed to the host port by swapping the cables. |
| Hardcopy output is enclosed in a single "DCS q"/"ST" sequence, using the DECwriter control commands. | Hardcopy output is enclosed in "DCS q" on a line-by-line basis. |
| Hardcopy image data sent to the VT125 will be displayed as if the VT125 was a hardcopy device. | Hardcopy data sent to the VK100 is ignored. |

j4james commented 1 year ago

> You have a VK100 GIGI?

Sadly not, but MAME has a partially working version that I've been experimenting with. The serial port doesn't seem to work, but you can still play around in local mode to see what sequences are supported and how they behave. It's got some really neat features.

> the VT125 can display sixel bitmaps.

Sorry, I wasn't very clear, but what I was trying to say was that the VK100 doesn't support Sixel, so it didn't meet the requirements of your original question, i.e. a sixel capable device that couldn't handle an unknown DCS.

> "if the [...] action does not apply to the device, [...] the device ignores the escape sequence entirely".

Well that's a good sign. So then maybe it can be considered safe to use an unknown DCS on all the DEC sixel devices. Just not all the ReGIS devices.

> Hardcopy output can be directed to the host port by swapping the cables.

It's a shame the VK100 couldn't also force the printer redirection with a media copy sequence, though, because it has a regular escape sequence for triggering a hard copy (DECHCP), so you don't need to use ReGIS like you do on the VT240 and VT340. It would have been nice to have at least one DEC terminal that had both printer redirection and a simple hardcopy sequence.