microsoft / cascadia-code

This is a fun, new monospaced font that includes programming ligatures and is designed to enhance the modern look and feel of the Windows Terminal.

Feature Request: Add the Control Pictures Unicode block #219

Closed PhMajerus closed 3 years ago

PhMajerus commented 4 years ago

We talked about control characters before, and how they are interpreted by the console or the terminal instead of being printed as characters. When working in a terminal, it is sometimes helpful to visualize these in-band control sequences. Visual Studio Code even does this when showing files, by rendering small "ESC", "SUB", ... labels when the "Render Control Characters" option is enabled.

Unicode had the same idea and includes a block of Control Pictures (U+2400 to U+2426 as of Unicode 12.0). These are designed to represent the control characters on a terminal screen: ␀␁␂␃␄␅␆␇␈␉␊␋␌␍␎␏␐␑␒␓␔␕␖␗␘␙␚␛␜␝␞␟␠␡␢␣␤␥␦ (https://en.wikipedia.org/wiki/Control_Pictures) When available, these can be used by CUI apps to provide a visual representation of these special characters, exactly like Visual Studio Code does.

Adding these 39 glyphs would make it possible for utilities such as hexdump to show them in text representations, which is much more helpful than having 34 of the 256 values show up as generic dots. It would even make it possible for a CUI text editor to provide a "Render Control Characters" option.
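As a rough illustration of the hexdump idea above, here is a minimal Python sketch. The names `to_visible` and `hexdump` and the column layout are made up for this example and are not taken from any existing tool. It assumes C0 controls map to U+2400 plus the byte value and DEL maps to U+2421, and it leaves high bytes as dots because their meaning depends on the encoding.

```python
# Sketch: a hexdump whose text column uses Control Pictures instead of dots.
# Function names and layout are illustrative only.

def to_visible(byte: int) -> str:
    """Return a printable stand-in for a single byte value."""
    if byte < 0x20:           # C0 control characters -> U+2400..U+241F
        return chr(0x2400 + byte)
    if byte == 0x7F:          # DEL -> U+2421
        return "\u2421"
    if byte < 0x7F:           # space and printable ASCII are shown as-is
        return chr(byte)
    return "."                # high bytes are encoding-dependent, kept as dots


def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as offset, hex bytes, and a text column, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Pad the hex column so the text column stays aligned on short rows.
        hex_part = " ".join(f"{b:02x}" for b in chunk).ljust(width * 3 - 1)
        text_part = "".join(to_visible(b) for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}  {text_part}")
    return "\n".join(lines)
```

Calling `print(hexdump(b"MZ\x90\x00\x1b[0m\r\n"))` in a terminal whose font has these glyphs should show the escape, carriage return, and line feed bytes as distinct pictures rather than identical dots.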

Windows Terminal currently falls back to another font to render these, but they are tiny and impractical for use in a terminal.

Below is a sample of a hexdump function showing the contents of cmd.exe with high-ASCII and control characters (using the font fallback):

hexdump with Control Pictures

And here is the Ubuntu hexdump command showing the same file, with dots for high-ASCII and control characters, so only 96 of the 256 values get a character representation. (This one is not using Cascadia, but it shows the limitation when these characters are not available.)

hexdump without Control Pictures

aaronbell commented 3 years ago

Been a while! I thought I would take a crack at these, and, well, they're hard to fit into that little box! I used a similar descending set of letters and thought I'd get your opinion. They're still a bit rough :x. Thanks!

Screen Shot 2020-12-01 at 8 59 32 PM
aaronbell commented 3 years ago

Forgot these!

Screen Shot 2020-12-01 at 9 16 05 PM
PhMajerus commented 3 years ago

@aaronbell Thanks for looking into the control pictures.

I think you'll have an artistic and practical decision to make with those, balancing your font visual identity, readability, and familiarity.

The diagonal sets of letters seem to be the most common representation and the one used by the Unicode Consortium (https://unicode.org/charts/PDF/U2400.pdf) and Segoe UI Symbol.

Visual Studio Code, on the other hand, uses horizontal sets of letters:

image

I'm not sure which would be easiest to read in a Terminal app at small sizes.

Alternatively, it seems square graphic symbols were once standardized for all of these, but they are not commonly used anymore (probably because the C0 control characters were originally designed with punched cards and serial terminals in mind). See the rightmost column in the C0 table at https://www.aivosto.com/articles/control-characters.html#list_C0 and notice how the SUB and DEL graphic symbols seem related to the modern ␦ and ␥. While not immediately familiar to today's users, they are all pretty representative of their intents and would be recognizable at small sizes. Maybe a case of "old enough to be new and fancy again". This would be similar to the "Show/Hide ¶" symbols in Word: they might not make sense at first, but users working with them would get used to them, and having simple, easily recognizable pictures at small sizes might be a benefit in the long run over the more explicit but hard-to-read sets of letters.

I think if Windows Terminal allowed me to set different fonts for specific Unicode ranges, I would try to get these legacy graphic symbols in a font by themselves and use those. They seem more readable at my typical Terminal font size, and even though I never encountered these graphic representations before, I could get used to them easily enough.

aaronbell commented 3 years ago

Thanks for the info! I had actually originally designed the control characters to be full height and 1/3 width, but found that they were really difficult to read, especially when you have a bunch in a row (like your original sample image). As such, the diagonal form actually makes a lot of sense. I think, though, that I’ll continue to experiment with making them a bit bigger (wider maybe?) to see what kind of results I can get.

The graphical versions are a good idea too! Interestingly, many of those forms are already present in the Geometric Shapes block: http://www.unicode.org/charts/PDF/U25A0.pdf so I think it would be relatively straightforward to bring them in. I think it would make the most sense to add them as a stylistic set of the base versions so folks like you who want to give them a try have that option available :)

aaronbell commented 3 years ago
Screen Shot 2020-12-02 at 12 59 20 PM

Did a fun little mockup of the square graphic symbols. While I suspect maybe you and one other person would even use these intentionally, they're certainly more recognizable than the text versions. The only one that didn't have an alternate form is the NL (which appears to be NEL in the standard form / NL in the Unicode chart, and doesn't appear to be used as frequently as LF).

As for the textual variant, I found that increasing the width of the forms allowed for a more open, recognizable shape. Of course, some extra tweaking is necessary, but overall I think it is rendering well at smaller sizes (here at 14pt), even at the heavier end of the range:

Screen Shot 2020-12-02 at 1 21 45 PM

(do note this is unhinted, so it'll be a bit blurrier than an actually hinted version)

PhMajerus commented 3 years ago

Here's the background story on LF, NL and NEL.

U+2400 to U+241F are pictures of all the C0 control characters from ASCII, mapped directly from their corresponding U+0000...U+001F control characters. Pure ASCII contains only those and the extra SP (space) and DEL (delete). These are the ones that also have legacy standardized square graphical representations, and could be considered a group if you take artistic liberties with them.

C1 control characters, which NEL is from, are a whole other set of codes only available in ANSI (extended ASCII / high-ASCII) and Unicode, and as far as I know those have no representation in the control pictures range.

My understanding is that all the remaining pictures ␢ ␣ ␤ ␥ ␦ in U+2422 to U+2426 are general-purpose representations of hidden characters for CUI apps that wish to show simpler symbols, for example for a text editor aimed at less technical people. The Unicode chart at https://unicode.org/charts/PDF/U2400.pdf seems to confirm that, as these extras are not labelled as "control codes". They should therefore probably be kept similar to their original design: an app using them has made the explicit choice of these alternate versions for style, and might use them for app-specific things unrelated to C0 control characters.
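The layout described above can be checked against the character names that ship with Python's `unicodedata` module. The sketch below is only an illustration. The `describe` helper and the wording of its `target` strings are made up for this example, and it covers only the U+2400 to U+2426 range discussed in this thread.

```python
import unicodedata

# Range of Control Pictures discussed in this thread.
FIRST, LAST = 0x2400, 0x2426


def describe(code_point: int) -> tuple[str, str, str]:
    """Return (picture, Unicode name, what the picture stands for)."""
    picture = chr(code_point)
    name = unicodedata.name(picture)
    offset = code_point - FIRST
    if offset <= 0x1F:
        # U+2400..U+241F mirror the C0 controls U+0000..U+001F one-to-one.
        target = f"C0 control U+{offset:04X}"
    elif offset == 0x20:
        target = "SP U+0020"
    elif offset == 0x21:
        target = "DEL U+007F"
    else:
        # U+2422..U+2426: general-purpose symbols, not tied to one control code.
        target = "no single control character"
    return picture, name, target


rows = [describe(cp) for cp in range(FIRST, LAST + 1)]

if __name__ == "__main__":
    for picture, name, target in rows:
        print(f"{picture}  {name:<32} {target}")
```

The list has 39 entries, matching the glyph count in the original request. The last five names (for example `SYMBOL FOR NEWLINE` at U+2424) do not correspond to a single C0 code, which is consistent with the reading above.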

"NL" is documented by Unicode as being a symbol for New Line, which isn't the same as the NEL (Next Line) from C1 control characters. The origin of the term "New Line" is that operating systems didn't agree on what constitutes a proper sequence of control characters to start a new line. CR is originally a carriage return, as in returning the typewriter to the left position without changing the line, while LF is a line feed, shifting the page to the next line without moving the horizontal position. CP/M, MS-DOS and Windows got it right and use the combination CR+LF to start a new line. Unix figured it could save a byte by using LF alone, disregarding the standard, while Apple decided to Think Different and therefore settled on using CR alone (in classic Mac OS; they have since moved to a Unix-style LF in OS X). So New Line isn't a control character, but a generic term for the different combinations of control characters that produce a newline on different platforms. This means an app such as a text editor might choose to show the "NL" picture for whichever sequence is appropriate for the underlying platform, to provide consistency to the user across different operating systems instead of revealing the exact C0 control sequences. If you want to change it into a symbol, something generic like Word's new line symbol ↵ is probably fine.
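To make the "one picture for whichever sequence the platform uses" idea concrete, here is a small Python sketch. The function name `show_newlines` is hypothetical, and keeping a real line break after each picture is a design choice for this example, not something specified in the thread.

```python
import re

NEWLINE_PICTURE = "\u2424"  # SYMBOL FOR NEWLINE, the "NL" picture

# CR+LF is listed first so that it collapses to one picture instead of two.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def show_newlines(text: str) -> str:
    """Mark every newline sequence (CR+LF, CR, or LF) with a single NL picture.

    A plain LF is kept after each picture so the text still wraps as before.
    """
    return _NEWLINE.sub(NEWLINE_PICTURE + "\n", text)
```

An editor that wanted to reveal the exact bytes instead would map CR and LF separately to U+240D and U+240A, which is the trade-off described above.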

NEL (Next Line) on the other hand is a C1 control character for compatibility with IBM mainframe's EBCDIC encoding, which had something similar to LF but not exactly, and therefore needed a separate control character for text exchange. None of the C1 control characters have graphical representation in the U+24xx range, so I'm pretty confident NL stands for New Line and is unrelated to Next Line.

DHowett commented 3 years ago

Wow, I'm really liking how these are shaping up here in https://github.com/microsoft/cascadia-code/issues/219#issuecomment-737504103. You're right, they felt a little cramped in the first pass you made :smile:

aaronbell commented 3 years ago

Thanks @DHowett! Do we dare put the graphical versions as default? :D

DHowett commented 3 years ago

Aw, alas. I use them for debugging Terminal, so I would need to learn a whole new language if we do that! 😁

I'm not against it, for sure. haha

aaronbell commented 3 years ago

I’ll give you a sample version to mull over ;)

PhMajerus commented 3 years ago

@aaronbell @DHowett Hey! no fair! why the microsofties-only version?! 😭

Looking at https://github.com/microsoft/cascadia-code/issues/219#issuecomment-737504103, it seems 2-letter symbols are going to be more readable than 3-letter ones. (I still like the graphic symbols, but understand many users might not want to have to learn another set of symbols.)

Since the 2-letter abbreviations for C0 codes are less common but nonetheless standardized (https://en.wikipedia.org/wiki/ISO_2047), what about a variant that uses the 2-letter versions for the whole set? This would make their size more consistent than a mix of 3-letter and 2-letter, and probably would help readability at small sizes. Users used to the more common 3-letter versions can probably figure out the shortened ones without too much trouble.

aaronbell commented 3 years ago

@PhMajerus Don't worry, I won't exclude you :).

aaronbell commented 3 years ago

@PhMajerus @DHowett Alright, sorry for the delay :)

Here's a demo version of the font, named Cascadia CTRL: CascadiaCNTRL.zip

A couple of notes:

IIRC, Windows Terminal (and I think VSCode) let you set stylistic sets. Give it a try and see what you think!

PhMajerus commented 3 years ago

@aaronbell Thanks for doing all 3 variants, they all look great and I feel each has its benefits. I cannot find how to set the variants in either Code or Terminal in their JSON settings documentation, but judging from the default 2-3 letter variant I could try and the pictures you posted, I really like all 3 designs. Stacking the letters more vertically, so that they overlap more than in the common 45° diagonal representations, makes them both very readable and more distinct when several control characters follow each other. This really works well.

I'm really curious to try the all 2 letters variant in Terminal as the 3 letter ones are a bit too small for me at my usual font size. @DHowett did I miss something in the Terminal documentation for font variants or is that something that's not yet implemented?

mdtauk commented 3 years ago

Implementing them would be a tricky prospect, at least in an easily discoverable way.

Not every font will offer alternate Stylistic Sets - so unless Cascadia Code is treated as a special case, and extra settings show up when it's the chosen font - the only way to implement it would be to allow setting a stylistic set for all fonts that include them.

And then, these sets don't include names, and showing the user what these stylistic sets are used for, would be impossible.

aaronbell commented 3 years ago

@mdtauk My putting them in stylistic sets is purely for testing purposes—so that they are somewhat accessible for y'all to take a look at. I expect that for the final version, we'll lock to one of the three approaches.

aaronbell commented 3 years ago

@PhMajerus - Here's the setup for VSCode at least: https://github.com/microsoft/vscode/issues/80577

mdtauk commented 3 years ago

> @mdtauk My putting them in stylistic sets is purely for testing purposes, so that they are somewhat accessible for y'all to take a look at. I expect that for the final version, we'll lock to one of the three approaches.

That is fair enough, but there is little reason not to include stylistic sets for the sake of a more complete typeface - even if Terminal doesn't provide a user facing way to change it

PhMajerus commented 3 years ago

@aaronbell Ah, sorry, I didn't realize stylistic sets were selected through the ligatures option. Thanks for the info. So VS Code supports them in the editor but not in the built-in terminal for performance reasons, but that was enough to see what each looks like by copy/pasting from terminal to editor.

After trying all 3: first, they all look great!

I really wanted to try to get used to the graphical symbols (ss20), hoping they would be the most readable and faster to scan through once used to them, but testing them mixed with other characters it quickly becomes apparent that they probably only work well when shown in ASCII-only strings, as then the set of other characters present is very limited and does not include any graphical character that could be confused with them. Once a string contains a larger set of graphical characters from Unicode, it becomes more difficult to differentiate them from other graphics, and they lose their benefits. I think they look great and, if it doesn't have any negative impact like file size or performance when not used, they should be kept as an option, but they probably will only be practical in specific scenarios like debugging ASCII-based communications. I could see them used in a hex+ascii file editor or a serial monitor for example.

image

I find the 2 letters (ss19) variant really good for readability, and while it will require some time to feel natural as we're more used to the 2-3 letters, I would probably use the 2 letters one for Terminal. This is probably very dependent on font size and DPI, but testing on both a 1920x1080 monitor at 100% and a Surface Book 2 at 200%, the 3 letters ones are slower to read because they end up less well-defined. This could still change with hinting though.

image image

image image

I think using the 2 letters variant as the default could provide better readability and discoverability. By this I mean someone used to the 2-3 letters and confused by the 2 letters-only is more likely to investigate and find information online about the variants and how to select the other one. On the other hand, someone finding the 3 letters hard to read is more likely to just increase their font size or change to another font than to ever learn about the variants that could have improved their use of Cascadia.

aaronbell commented 3 years ago

Thanks for the review @PhMajerus!

Your experience aligns pretty well with what I suspected might be the case. The graphical variants are fun, but are difficult to parse in real life scenarios. I would be tempted to leave them there, but I think I have to be honest with myself that the likelihood of anyone using them when they're hidden behind OpenType is quite low—even modern coding / terminal environments don't necessarily support stylistic sets, let alone anything older.

Between the 2-letter and mixed settings, it makes sense that the 2 letter variant would render more clearly. With proper hinting they'll perform markedly better with clearer differentiation between the letters, whereas the mixed setting will likely only perform similarly, or slightly better. The problem is that there just aren't sufficient pixels to create definition in the three-stacked form—as you said, folks are likely to switch fonts or increase point size to make them out. For similar reasons as the graphical variants, I think I'd skip providing the mixed setting (or wasting a stylistic set slot on them), and just provide the 2 letter abbreviations. I think folks will be able to get used to it pretty quick.

@DHowett What do you think? Would you be open to using the 2 letter variants as default?

DHowett commented 3 years ago

Based on @PhMajerus' screenshots above, I would absolutely be open to using the 2-letter variants as a default.

I wish I'd given terminal the ability to choose stylistic sets. I like them all. :smile:

I'll kick the tires myself, as well. Thanks for putting this together.

schuelermine commented 3 years ago

> Thanks for the review @PhMajerus!
>
> Your experience aligns pretty well with what I suspected might be the case. The graphical variants are fun, but are difficult to parse in real life scenarios. I would be tempted to leave them there, but I think I have to be honest with myself that the likelihood of anyone using them when they're hidden behind OpenType is quite low; even modern coding / terminal environments don't necessarily support stylistic sets, let alone anything older.
>
> Between the 2-letter and mixed settings, it makes sense that the 2 letter variant would render more clearly. With proper hinting they'll perform markedly better with clearer differentiation between the letters, whereas the mixed setting will likely only perform similarly, or slightly better. The problem is that there just aren't sufficient pixels to create definition in the three-stacked form; as you said, folks are likely to switch fonts or increase point size to make them out. For similar reasons as the graphical variants, I think I'd skip providing the mixed setting (or wasting a stylistic set slot on them), and just provide the 2 letter abbreviations. I think folks will be able to get used to it pretty quick.
>
> @DHowett What do you think? Would you be open to using the 2 letter variants as default?

I love the graphical variants

PhMajerus commented 9 months ago

Hey @aaronbell and @DHowett, sorry for posting in a closed issue, but I thought you might enjoy this, and it provides even more validation for the choice made if anyone happens to read through this thread.

Looking at some documentation on the HP 264x terminal series (from the 1970s), I found out they also used the 2-character representation for control pictures:

Roman Uppercase

Roman Lowercase

(more details at https://www.curiousmarc.com/computing/hp-264x-terminals)

BTW, after two years with the 2-character variant, I'm really happy with the readability, and I keep seeing legacy systems where they made the same choice back in the 1970s and 1980s. Thanks again for making this happen!