Open j4james opened 1 year ago
By the way, if you want to reproduce my symbol test pattern on your VT340, this is the script I used: https://gist.github.com/j4james/0983acc4f2d5286240100182736f161c
Although be aware that I tweak the color table somewhat to try and make the contrast clearer on the checkerboard pattern. I wanted to make it easy to see where the divisions were between the individual characters.
Thanks! That's interesting that people who tried to get the characters into Unicode may not have known what they meant. I wonder if, now that there's a huge area above the BMP open for miscellaneous symbols (and emoji), if the Unicode Consortium would be more willing to reconsider adding in the proper summation characters.
Thanks also for the test pattern. Is that from a genuine or emulated VT340? The "grottiness" of the summation I mentioned earlier referred to the overly pixelated diagonal lines (compared to any other TCS symbol).
How did you create the test pattern? I had been using a text editor after sending a locking shift on the command line, but that doesn't work when trying to show characters from more than one character set, as one would in a real mathematical equation. Do you have any notion of how people back in the day created files with embedded escapes (ISO 2022)? Did their text editors just pass embedded escapes directly to the terminal?
I wonder if, now that there's a huge area above the BMP open for miscellaneous symbols (and emoji), if the Unicode Consortium would be more willing to reconsider adding in the proper summation characters.
Maybe, but I don't think they really like the idea of component characters like this, unless there's a use case for something like data interchange, i.e. you've got documents stored in the DEC Technical character set that you want to convert to Unicode, and I don't think there's evidence of that. Frankly I'm surprised they even allowed the existing Math component characters.
I wouldn't say it's out of the question, but it'd probably require a lot of effort to work through the standard process. The Terminal Graphics proposal went through something like five different drafts, and took a year and a half to make it into the standard. Working with standards organizations can be a painful, thankless task, and the results are often disappointing.
Is that from a genuine or emulated VT340? The "grottiness" of the summation I mentioned earlier referred to the overly pixelated diagonal lines
I was using Windows Terminal, which doesn't actually support the TCS set, but I generated an equivalent soft font from your VT340 screenshot. Actually that's possibly something worth including in the repo here, because it's a nice way to add TCS support to terminals that don't have that charset, but which do support soft fonts (see dectech.fnt).
And I saw what you meant about the "grottiness" when I was playing around with the font in my font editor. I was actually somewhat tempted to try smoothing those characters, and also fix some other minor issues, but in the end I thought it best to make it exactly match the VT340 for now, and maybe work on a higher resolution version at a later point in time.
How did you create the test pattern?
Not sure why I didn't just give you the original python script I used to start with, but here it is: dectsc.py. Initially I was just loading the TCS set into G0 with an SCS sequence, writing out a chunk of hardcoded text, and then loading the ASCII set back afterwards (I realise now I'm just assuming G0 is mapped to GL, but that is typically the case).
It became a bit more complicated once I decided to add a couple of ASCII characters to the pattern (( ) [ ] { }), so I have to switch back to ASCII temporarily at one point, and it's a bit hacky.
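For anyone following along without the script, here is a minimal Python sketch of the approach described above. It is not the contents of dectsc.py; the function name `render` and the `(charset, text)` segment format are made up for illustration. It assumes G0 is mapped to GL, and uses the SCS sequences `ESC ( >` (DEC Technical into G0) and `ESC ( B` (ASCII into G0).

```python
import sys

ESC = "\x1b"
DESIGNATE_G0_TCS = ESC + "(>"    # SCS: designate DEC Technical as G0
DESIGNATE_G0_ASCII = ESC + "(B"  # SCS: designate ASCII as G0


def render(segments):
    """Join (charset, text) pairs, re-designating G0 only when the set changes.

    charset is "tcs" or "ascii" (hypothetical labels). G0 is always left
    as ASCII at the end so the shell prompt is not garbled.
    """
    out = []
    current = "ascii"
    for charset, text in segments:
        if charset != current:
            out.append(DESIGNATE_G0_TCS if charset == "tcs" else DESIGNATE_G0_ASCII)
            current = charset
        out.append(text)
    if current != "ascii":
        out.append(DESIGNATE_G0_ASCII)
    return "".join(out)


if __name__ == "__main__":
    # TCS glyphs, then a few plain ASCII brackets, then more TCS glyphs.
    pattern = [("tcs", "abc"), ("ascii", "( ) [ ] { }"), ("tcs", "xyz")]
    sys.stdout.write(render(pattern) + "\n")
```

Tracking the current set in one place avoids the ad hoc "switch back temporarily" step, at the cost of having to tag every run of text.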
Do you have any notion of how people back in the day created files with embedded escapes (ISO 2022)?
I'm not sure actually. You might find some examples in the DECUS archives, but I suspect a lot of the legacy software from that era would be proprietary commercial stuff so you may not find a lot of open source.
Edit: I should add that the ANSI standard originally intended the escape sequences to be usable by word processors, so in theory you could have a document format that could be dumped directly to a terminal or printer. Even if the document has an additional encapsulating format, using ANSI escape sequences for the basic markup would make it easier to render and print.
I'm not sure to what extent anyone took advantage of that though. I know there was the Open Document Architecture (ODA) standard, which uses ISO-6429/ECMA-48/ANSI-X3.64 internally, but apparently that wasn't widely adopted. It's probably most notable nowadays for inspiring the 24-bit SGR color sequence that many modern terminals support in some form or another.
I wonder if, now that there's a huge area above the BMP open for miscellaneous symbols (and emoji), if the Unicode Consortium would be more willing to reconsider adding in the proper summation characters.
Maybe, but I don't think they really like the idea of component characters like this, unless there's a use case for something like data interchange, i.e. you've got documents stored in the DEC Technical character set that you want to convert to Unicode, and I don't think there's evidence of that.
I bet one could find some PhD theses that were printed using TCS, but electronic documents, yeah, those are as scarce as hen's teeth. And, I'll be honest, I only would want the DEC summation sign in Unicode because character cell terminals are fun, not because there is any practical use. I think the Unicode Consortium was probably right to balk at adding the characters if the claim was that it would help with information interchange. If the DEC summation components ever do make it into Unicode, I think it will be based on the same reasoning as Klingon and most Emojis: the Unicode Consortium found the idea amusing.
Edit: I should add that the ANSI standard originally intended the escape sequences to be usable by word processors, so in theory you could have a document format that could be dumped directly to a terminal or printer. Even if the document has an additional encapsulating format, using ANSI escape sequences for the basic markup would make it easier to render and print.
I didn't know that, but that fits well with DEC's claims of "ANSI compatible printing" and the documentation I've seen of printers that could swap character sets, just like a terminal.
As I think about this, I realize I may have been asking the wrong question. Back then, it might not have been a matter of finding a text editor that "allows" embedding escape sequences as much as whether editors prevented it like they do nowadays. After running,
stty -ctlecho
so that escape sequences I typed wouldn't be shown using caret notation, I was able to use ed to embed (and see) escape sequences and it worked better than expected. (Admittedly, a pretty low bar.)
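An alternative to typing the escapes by hand is to generate the file with a script. The sketch below is only an illustration, not something anyone in this thread has described using: it designates DEC Technical as G1 once (`ESC ) >`) and then uses the locking shifts SO and SI to move between G1 and G0 within a line, which is one way to mix two character sets in a single equation. The function names and the segment format are hypothetical.

```python
ESC = "\x1b"
SO = "\x0e"  # LS1: lock G1 into GL
SI = "\x0f"  # LS0: lock G0 back into GL
DESIGNATE_G1_TCS = ESC + ")>"  # SCS: designate DEC Technical as G1


def build_document(lines):
    """Build text with embedded ISO 2022 shifts.

    lines is a list of lines, each a list of (charset, text) pairs where
    charset is "tcs" or "ascii" (hypothetical labels).
    """
    body = []
    for line in lines:
        parts = []
        for charset, text in line:
            # Wrap TCS runs in SO ... SI; ASCII runs pass through untouched.
            parts.append(SO + text + SI if charset == "tcs" else text)
        body.append("".join(parts))
    return DESIGNATE_G1_TCS + "\r\n".join(body) + "\r\n"


def write_document(path, lines):
    """Write the document so that it can be sent to the terminal as-is."""
    with open(path, "w", encoding="ascii", newline="") as handle:
        handle.write(build_document(lines))
```

Calling `write_document("equation.txt", [[("ascii", "x = "), ("tcs", "abc")]])` and then displaying the file with `cat` should show the second run in the technical set on a terminal that supports it.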
I just wanted to note here that the upcoming version 16 of Unicode includes a couple more characters that could be useful in mapping the DEC Technical character set.
U+1FBDB (BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO MIDDLE CENTRE TO LOWER LEFT). Although this glyph was introduced to support a character set from the Ohio Scientific Superboard II computer, it looks like exactly what we need for the missing right middle summation glyph (03/07).
U+1CC1B (BOX DRAWINGS LIGHT HORIZONTAL AND UPPER RIGHT). This was introduced to support the European alternate character set for the Sharp MZ-700 computer, but it's fairly close to what we need for the bottom right summation glyph (03/06). The right hand side goes all the way to the top corner, whereas the DEC glyph only goes part of the way, but it's orders of magnitude better than the current U+230B mapping.
U+1CC1C (BOX DRAWINGS LIGHT HORIZONTAL AND LOWER RIGHT). As above, this was intended for the Sharp MZ-700, but is fairly close to what we need for the top right summation glyph (03/05). Again, while not perfect, it's orders of magnitude better than the current U+2309 mapping.
I didn't see anything that improves on the existing top left and bottom left summation glyphs, and we still have issues with the brackets/parentheses not aligning with the vertical connector, but we at least now have Unicode code points that vaguely resemble all the required glyphs. And considering how much junk they've been willing to add so far, maybe there is still a chance we'll get some dedicated code points for these glyphs one day.
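As a rough illustration, the three candidate mappings above could be written as a lookup table keyed by the column/row position. The helper name `position_to_byte` and the table name are made up for this sketch; the code points and positions are the ones listed above.

```python
def position_to_byte(position):
    """Convert column/row notation such as "03/07" into a 7-bit code."""
    column, row = (int(part) for part in position.split("/"))
    return column * 16 + row


# Candidate Unicode 16 mappings for the right-hand summation components.
UNICODE_16_CANDIDATES = {
    "03/05": 0x1CC1C,  # top right summation
    "03/06": 0x1CC1B,  # bottom right summation
    "03/07": 0x1FBDB,  # right middle summation
}

for position, code_point in sorted(UNICODE_16_CANDIDATES.items()):
    byte = position_to_byte(position)
    print(f"{position} (0x{byte:02X}) -> U+{code_point:04X}")
```

Whether the glyphs actually render depends on the font in use having Unicode 16 coverage, which is still uncommon.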
Awesome research, James. It constantly tickles me that the Unicode consortium, in attempting to put their foot down and say, “Unicode is not for that”, has only made people resort to homoglyphs to get their meaning across. (Mathematicians and YouTubers will write things like 10⁴⁰⸴⁰⁰⁰, superscript comma or no.)
By the way, @j4james, I finally got around to adding your downlineloadable soft font for TCS that you based on my VT340 screenshots. Since you seem pretty handy at manipulating fonts from abstruse formats into something usable in modern times, you may be interested to check out the bitmap font that DEC included on a VMS "freeware" CD and suggested for use with DECTerm, their VT340-ish terminal emulator. It includes TCS in double-wide and double-high as well as "narrow" and "wide" (not sure what those mean) plus bold variants of everything: vwsvt0
I don't know if Unicode cares, but it does seem that the missing TCS characters were used by a number of programs for both display and printing in the early to mid 1980s. There was a program called MEC MASS-11 that was reviewed in the Notices of the American Mathematical Society (“Mass-11 is for the person who wants a WYSIWYG processor for equations ... Large math symbols are built up out of smaller graphics pieces.”). Mass-11 could run on VAX/VMS, IBM PCs, and the DEC Rainbow.
There was another product called Spellbinder Scientific which the newsletter of the Lawrence-Berkeley National Laboratory gave rave reviews. The American Institute of Physics featured a picture of Spellbinder Scientific's TCS abilities in their inaugural issue of Computers in Physics.
you may be interested to check out the bitmap font that DEC included on a VMS "freeware" CD and suggested for use with DECTerm
Yes, I'm very much interested in that. I'm working on a soft font editor at the moment, and one of the features I was considering was the ability to import other bitmap font formats. I haven't looked at the pcf file format yet, so I don't know how difficult that would be to support, but I'm keen to give it a try at some point.
The American Institute of Physics featured a picture of Spellbinder Scientific's TCS abilities in their inaugural issue of Computers in Physics.
If anyone does want to submit a proposal to Unicode for the addition of the necessary DEC TCS characters, I think references like this can definitely help. It's actually worth having a look at the proposal for the latest set of legacy terminal glyphs that were added, because you can see the kind of thing they're expecting.
https://www.unicode.org/L2/L2021/21235r-terminals-supplement-noattach.pdf
You'll also notice that they included quite a large number of component characters in that proposal - things like chess pieces divided up into four quarters, and game sprites split into top and bottom parts - so it doesn't seem like the Unicode Consortium has a problem with that anymore.
But considering this was the work of the "Terminals Working Group", it's a little disappointing that nobody thought to suggest the DEC TCS characters, especially given all of the other obscure hardware they did include. But I suppose I can't complain when I haven't been bothered to make the suggestion myself.
A soft font editor that runs on sixel terminals? I had actually just been looking around for something like that! Apparently the GIGI/VK100 had one included (for ReGIS).
Perhaps the Terminals Working Group just didn't know that TCS support is still lacking since there are glyphs that look vaguely similar. I'm in the same non-complaining boat with you, but I'll do what I can to document the need. The TCS page by Paul Flo Williams is a good start, though it appears to have been written before the first proposal by Frank da Cruz.
A soft font editor that runs on sixel terminals?
I'm afraid not. It can edit soft fonts from any of the DEC terminals, but it'll only run on a VT525. It's heavily dependent on level 4 functionality like macros and rectangular area operations, and it also requires color. It's possible I might be able to adapt it to work on monochrome terminals like the VT510 and VT420, but the VT340 would be a step too far. It was originally only intended for personal use.
If you don't mind sharing, I'd love to see your notes on each terminal and its unique font quirks. I'm not going to even attempt to do that, but I'm curious.
Maybe you will inspire me to write up my own, more limited version. I could make it doable by focusing on the VT3xx and VT2xx. I'd only handle frills if they are easy (rectangular pixel aspect ratio for 5x10, 6x10, 7x10; making a matching 132-column font) and skip things that might be tricky (fonts larger than the terminal's character cell size; specialized fonts for different Psgr values; handling non-DRCS fonts, like ReGIS and ROM).
I've got a script that massages the sixels that exist in soft fonts to sixel bitmaps for viewing on any VT340 emulator. Maybe I could quickly magnify the current character by setting a long aspect ratio and inserting ! to repeat each sixel on the screen. (In later terminals did DEC ever allow soft fonts to have repeats in them?) Modifying a bit in a sixel is just a boolean operation on the byte value, so I could... Oh, heck, seems like you have already inspired me. ☺
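The two operations mentioned above can be sketched in a few lines of Python. This is only an illustration with made-up function names: a sixel character encodes six vertical pixels as `0x3F` plus a 6-bit value (least significant bit at the top), and `!<count><char>` is the sixel repeat introducer. Setting the pixel aspect ratio for the vertical stretch is left out, since how that behaves depends on the terminal.

```python
SIXEL_OFFSET = 0x3F  # '?' encodes an empty column of six pixels


def sixel_to_bits(ch):
    """Return the 6-bit pixel pattern encoded by one sixel character."""
    value = ord(ch) - SIXEL_OFFSET
    if not 0 <= value <= 0x3F:
        raise ValueError(f"not a sixel data character: {ch!r}")
    return value


def bits_to_sixel(value):
    """Encode a 6-bit pixel pattern as a sixel character."""
    return chr(SIXEL_OFFSET + (value & 0x3F))


def toggle_pixel(ch, row):
    """Flip one pixel (row 0 is the top) with a single XOR."""
    return bits_to_sixel(sixel_to_bits(ch) ^ (1 << row))


def magnify_row(sixels, factor):
    """Stretch a row horizontally by repeating each sixel column."""
    return "".join(f"!{factor}{ch}" for ch in sixels)
```

For example, `magnify_row("?@", 4)` gives `!4?!4@`, and `toggle_pixel("?", 0)` gives `@`.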
In addition to different terminals having different cell sizes, there are multiple variants for each terminal - usually 4, but potentially up to 12 (6 screen sizes, and full-cell and text-font variants for each of those). However, those variants should all have the same pixel aspect ratio, so that's at least easier to deal with.
| | 80x24 Full | 80x24 Text | 132x24 Full | 132x24 Text | 80x36 Full | 80x36 Text | 132x36 Full | 132x36 Text | 80x48 Full | 80x48 Text | 132x48 Full | 132x48 Text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VT2x0 | | 7x10 | 6x10 | 5x10 | | | | | | | | |
| VT320 | 15x12 | 12x12 | 9x12 | 7x12 | | | | | | | | |
| VT340 | 10x20 | 8x20 | 6x20 | 5x20 | | | | | | | | |
| VT382 | 12x30 | 10x30 | 7x30 | 6x30 | | | | | | | | |
| VT420+ | 10x16 | 8x16 | 6x16 | 5x16 | 10x10 | 8x10 | 6x10 | 5x10 | 10x8 | 8x8 | 6x8 | 5x8 |
The VT2x0 devices are a bit different, in that they don't include the height (Pcmh) and screen size (Pcss) parameters. The width parameter (Pcmw/Pcms) is a kind of index that covers width, height, and screen size: 2 = 5x10 (132-column text font), 3 = 6x10 (132-column full-cell font), and 4 = 7x10 (80-column text font).
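In code, the VT2x0 index described above amounts to a small lookup table. This sketch only encodes the three values listed in the previous paragraph; the names are hypothetical, and anything else (including 0, the device default) is deliberately left unresolved.

```python
# VT2x0 width parameter -> (width, height, columns, font kind)
VT2X0_MATRIX_SIZES = {
    2: (5, 10, 132, "text"),
    3: (6, 10, 132, "full-cell"),
    4: (7, 10, 80, "text"),
}


def describe_vt2x0_font(width_parameter):
    """Return a readable description, or None for values not listed above."""
    entry = VT2X0_MATRIX_SIZES.get(width_parameter)
    if entry is None:
        return None
    width, height, columns, kind = entry
    return f"{width}x{height} ({columns}-column {kind} font)"
```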
The VT2x0 devices also don't officially support full-cell fonts in 80-column mode, but if you include a pixel in the 8th column of a 7x10 font, that would apparently be duplicated across the padding columns to cover the full cell. That provided a way to generate simple full-cell glyphs like blocks and box characters. However, that only worked on the VT220 - the VT240 just treated the 8th column the same as any other (at least on MAME). Windows Terminal only supports the VT240 interpretation.
The VT382 terminals also have an additional complication. The documentation says that it supports heights of 10, 20, and 30, but it doesn't have different screen heights like later terminals, so that implies it might somehow stretch those smaller heights to fill the cell. And in general, I think all level 3+ terminals are supposed to support VT2x0 fonts, which also implies some kind of stretching would be required. But perhaps they're just centered in the cell (you can at least test how the VT340 handles this).
When it comes to loading existing fonts, though, the biggest complication is that many of them don't actually include the width/height parameters - they just set them to 0, which is supposed to imply the default size for the device. If you're wanting to support multiple devices, that means you have to try and guess the size based on the actual pixel content (amongst other things).
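A very rough sketch of that kind of guessing is shown below. It is not how Windows Terminal does it; it is just one plausible heuristic, with hypothetical function names. It assumes the glyph data has already been extracted from the DECDLD string, with `;` separating glyphs and `/` separating each band of six pixel rows. Trailing blank columns or rows may be omitted from the data, so the measured size is only a lower bound.

```python
def measure_glyphs(data):
    """Return the (width, height) of the largest area any glyph touches."""
    max_width = 0
    max_height = 0
    for glyph in data.split(";"):
        for band, row in enumerate(glyph.split("/")):
            max_width = max(max_width, len(row))
            for ch in row:
                bits = ord(ch) - 0x3F  # assumes valid sixel data characters
                if bits > 0:
                    # bit_length() gives the lowest set pixel within the band
                    max_height = max(max_height, band * 6 + bits.bit_length())
    return max_width, max_height


def guess_cell_size(data, candidates):
    """Pick the smallest candidate (width, height) that fits the pixels."""
    width, height = measure_glyphs(data)
    fitting = [c for c in candidates if c[0] >= width and c[1] >= height]
    if not fitting:
        return None
    return min(fitting, key=lambda c: (c[1], c[0]))
```

With candidates such as `[(5, 10), (6, 10), (7, 10), (10, 20)]`, a glyph that is seven sixels wide and uses ten pixel rows resolves to `(7, 10)`. Distinguishing a text font from a full-cell font of a similar size needs more information than the pixels alone.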
I should also be clear that the above info is to the best of my knowledge, and may not be perfect. The only device I've really tested on is the MAME VT240, but I tested the Windows Terminal implementation with loads of fonts found on the internet, targeting different devices and different screen sizes, and that's assured me that my understanding is likely correct for the most part.
Maybe I could quickly magnify the current character by setting a long aspect ratio and inserting ! to repeat each sixel on the screen.
This is genius btw. I didn't think a sixel-based editor would be practical on the VT340, but this seems like it could work quite efficiently.
In later terminals did DEC ever allow soft fonts to have repeats in them?
Not that I'm aware of, no. And for a typical text font, there's probably not a lot of use cases where there would be any benefit in having a repeat: maybe -, _, and =? So I don't think that would justify the additional complication to the protocol.
I don't know if this is of any use to you, but these are my notes on the DEC Technical character set, and the mappings (or lack of mappings) to Unicode. You're welcome to use any bits that you think might be helpful in your documentation.
Component Characters
The first 23 glyphs of the DEC Technical character set are known as component characters, intended to be used in the construction of larger mathematical symbols, such as integral and summation signs. This is explained in the Digital ANSI-Compliant Printing Protocol level 2 reference manual (appendix A.4), but can also be inferred from the character names referenced in the DEC STD 070 manual (section 7.5.5).
In the image below, you can see how the various glyphs are intended to connect in order to produce symbols of varying sizes.
Over the years 1998 to 2000, proposals were made to add some of these characters to the Unicode standard as part of the Terminal Graphics for Unicode set. However, what we eventually got from that effort was unfortunately not adequate to satisfy the needs of the component symbol structures.
Initially there were code points proposed for the extended square brackets, parentheses, and braces (02/07 to 03/00), but those were ultimately withdrawn in favor of similar characters in the STIX Math set. But the DEC characters were intended to share a single connecting vertical line, whereas the STIX proposal had separate connectors for left and right (and unique for each bracket type), so they don't align correctly in the DEC use case.
Then there are the summation characters, which the Terminal Graphics proposal never covered very well to start with. It appears they weren't aware of all the parts that were intended to combine, so only proposed the top left and bottom left glyphs (03/01 and 03/02), and again these were withdrawn in favor of STIX code points (U+23B2 and U+23B3). Unfortunately those glyphs can't really be used to construct the larger summation symbols that require connectors.
And even if they could, many of the other summation parts weren't defined anyway. The diagonal connectors (03/03 and 03/04) could potentially be mapped to existing diagonal code points in Unicode (U+2572 and U+2571), but there is nothing for the center connector. And the top right and bottom right ends of the symbol were mistakenly thought to be the right half of the ceiling and floor functions, for which there were already code points in Unicode (U+2309 and U+230B). Those glyphs are nothing like what is needed for the summation symbol, though.
On the positive side, there were already reasonably suitable code points for the integral sign (U+2320 and U+2321), the horizontal and vertical connectors (U+2500 and U+2502), and the top left of the radical symbol (U+250C). And the bottom left of the radical symbol was actually included in the Terminal Graphics proposal, and ultimately accepted as U+23B7 (unfortunately many font faces don't render it correctly, but that's still better than nothing).
To summarize, the table below lists the placeholder code points from the original Terminal Graphics proposal, the existing Unicode code points for those characters that were already deemed to be included in the standard, the code points assigned in the STIX proposal, and finally the one code point from the Terminal Graphics proposal that actually made it into the standard.
† These glyphs could work for a simple two-character summation, but are inadequate for building larger symbols with connector elements.
‡ Although these code points were originally proposed for 03/05 and 03/06, the glyphs are wholly inappropriate for use in a summation symbol.
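The code points mentioned in these notes can be checked against Python's `unicodedata` module, which makes the mismatch on the right-hand summation parts easy to see from the character names alone. The list below is just a transcription of the code points named above, grouped by the descriptions used in the text; the variable name is made up for this sketch.

```python
import unicodedata

# Code points named in the notes above, grouped by intended use.
NOTES_MAPPINGS = [
    ("integral top / bottom", [0x2320, 0x2321]),
    ("horizontal / vertical connector", [0x2500, 0x2502]),
    ("radical top left / bottom left", [0x250C, 0x23B7]),
    ("summation top left / bottom left (03/01, 03/02)", [0x23B2, 0x23B3]),
    ("summation diagonal connectors (03/03, 03/04)", [0x2572, 0x2571]),
    ("summation top right / bottom right (03/05, 03/06)", [0x2309, 0x230B]),
]

for description, code_points in NOTES_MAPPINGS:
    print(description)
    for code_point in code_points:
        name = unicodedata.name(chr(code_point), "<unnamed>")
        print(f"  U+{code_point:04X} {name}")
```

The last group prints the ceiling and floor names rather than anything related to summation, which matches the ‡ footnote.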
Greek Characters and Mathematical Operators
The remaining characters in the DEC Technical set are a mix of Greek characters and mathematical operators. All of these have existing code points in the Unicode standard - in some cases there are even multiple code points to choose from.