kent-karlsson / control

ECMA-48 update proposals + math expression representation proposals

Earlier conversation archived here #1

Closed egmontkob closed 1 year ago

egmontkob commented 1 year ago

A few years ago, Kent Karlsson reached out to a few of us in private mail.

This was apparently a follow-up of some earlier discussions on the Unicode mailing list, although I can't recall for sure, and I'm not looking it up now.

With permission from the active participants, hereby I'm publishing that conversation.

I have no particular goal with this; I just believe that we put so much effort into evaluating the proposal (as it looked back then) and sending valuable feedback that it shouldn't be buried in our mailboxes forever. Others who evaluate the proposal, or who are considering whether to adopt it, might find it useful.

Each email will take one comment with some trivial formatting changes (removal of quotes that weren't responded to, removal of email addresses, removal of company or URL from signatures, removal of colors due to lack of support in Markdown). I'm trying my best to keep bold/italic emphasis, but I'm doing this manually so tiny details like these might be lost.

The conversation took place between May 22 and Jul 18, 2019. I'm omitting the exact dates as I don't think they're useful.

egmontkob commented 1 year ago

Kent Karlsson wrote:

Subject: Formatting (text styling) command sequences in ECMA-48, proposed update

Hi!

I don't know if you care, or even want to care, nor do I know if anyone else might care, but...

This goes back to the discussions on the Unicode mailing list, both about terminal emulators and "primitive" text styling. I do think there are some contexts where text styling à la ECMA-48 still does have a place. Not only terminal emulators, though that is of course an important target for ECMA-48 in general (and then not just its text styling). But I think the text styling part of ECMA-48 can also be used in text editors, as a simpler alternative to HTML/CSS or RTF.

I have tried to write down some proposals for updating the ECMA-48 text styling commands. It's not a (minimal or otherwise) profile of ECMA-48 text styling, but instead making more precise existing (standard or, IIUC, common extension) formatting codes plus an extension adding more formatting codes. Profiles (implementation profiles) can be defined on top of that, if desired.

Due to the age of ECMA-48 and "derivatives", I imagine that none of the involved standards organisations are too keen on assigning a working group and actually going through the work of doing an update. (Besides, I don't want to open a can of worms trying to update all of ECMA-48.) And it does not fit well as a Unicode technical note either, I think. (In case you think otherwise, great!)

If you think a suggestion along the lines of the content of the attached document, in some venue, has a chance, you are welcome to comment on the content (and suggest a venue).

Kind regards /Kent Karlsson

SGR-update-A.pdf

egmontkob commented 1 year ago

Asmus Freytag wrote:

On 5/21/2019 4:06 PM, Kent Karlsson wrote:

Hi!

I don't know if you care, or even want to care, nor do I know if anyone else might care, but...

This goes back to the discussions on the Unicode mailing list, both about terminal emulators and "primitive" text styling. I do think there are some contexts where text styling à la ECMA-48 still does have a place. Not only terminal emulators, though that is of course an important target for ECMA-48 in general (and then not just its text styling). But I think the text styling part of ECMA-48 can also be used in text editors, as a simpler alternative to HTML/CSS or RTF.

The top-level bit for any of this is whether there are actual implementations/implementers for that functionality. The "here is an interesting writeup/standard" approach will lead to eventual implementers simply re-inventing the wheel out of ignorance. The reason the ECMA standards were successful in their time was that ECMA represented what Unicode is today: a collaboration of implementers. Even HTML has been taken over by the WhatWG, because of that group's focus on not getting ahead of actual implementations.

In your scenarios, who would issue the command sequences? If it is the text editor, then that assumes it runs on some sort of "platform" that can interpret these codes. Who would create such a platform? If it is users, putting fancy annotations in files so they look good in text editors, then I'd like to understand more about the use case. But in that case the "platform" would be the editor. In either case, do you have interest from platform vendors (or perhaps non-commercial suppliers of such platforms)?

The next level bit is to explain what ECMA 48 delivers out of the box, and why their approach has shortcomings (or shortcomings in a world where text is Unicode); it looked like you leap directly into a full-blown scheme. Assume most readers aren't familiar with the history.

I looked at your draft very quickly (as I'm headed out shortly), so apologies if I skipped over something. In that case, perhaps figure out how to help impatient readers to locate that.

I have tried to write down some proposals for updating the ECMA-48 text styling commands. It's not a (minimal or otherwise) profile of ECMA-48 text styling, but instead making more precise existing (standard or, IIUC, common extension) formatting codes plus an extension adding more formatting codes. Profiles (implementation profiles) can be defined on top of that, if desired.

Due to the age of ECMA-48 and "derivatives", I imagine that none of the involved standards organisations are too keen on assigning a working group and actually going through the work of doing an update. (Besides, I don't want to open a can of worms trying to update all of ECMA-48.) And it does not fit well as a Unicode technical note either, I think. (In case you think otherwise, great!)

I am not sure that ECMA is an ongoing concern, or if it is, that they are working on that technology - you may have to assemble a new community of active users/implementers to make any progress here.

If you think a suggestion along the lines of the content of the attached document, in some venue, has a chance, you are welcome to comment on the content (and suggest a venue).

Also, I think any document / proposal would have to start with clear suggestions of specific use cases and demonstrate the drawbacks of existing solutions (which might include an HTML/CSS light - i.e. one that doesn't assume a full document but works well with fragments and uses "implied" style sheets).

That analysis seems more important than the details.

A./

PS: an analysis of the problem of styling short runs of texts, and exploring different possible solutions given existing technology, might be a suitable topic for a UTN. Especially, if it stops at the requirement / problem / analysis level.

egmontkob commented 1 year ago

Doug Ewell wrote:

Kent,

Thank you for preparing and sending this. I need to set aside ample time to go through it in depth, but in general I like the idea.

My only comment at this stage is that any suggestions for changing or augmenting the ECMA-48 standard need to be indicated very clearly, and made separate from suggestions on how to implement the existing standard. You do not (I trust) want this to look like you are reinterpreting ECMA-48 on your own, making up extensions as you go. It should be clear how a sender or receiver that follows the existing standard should behave.

-- Doug Ewell

egmontkob commented 1 year ago

Kent Karlsson wrote:

Den 2019-05-22 01:47, skrev "Asmus Freytag":

On 5/21/2019 4:06 PM, Kent Karlsson wrote:

Formatting (text styling) command sequences in ECMA-48, proposed update

Hi!

I don't know if you care, or even want to care, nor do I know if anyone else might care, but...

This goes back to the discussions on the Unicode mailing list, both about terminal emulators and "primitive" text styling. I do think there are some contexts where text styling à la ECMA-48 still does have a place. Not only terminal emulators, though that is of course an important target for ECMA-48 in general (and then not just its text styling). But I think the text styling part of ECMA-48 can also be used in text editors, as a simpler alternative to HTML/CSS or RTF.

The top-level bit for any of this is whether there are actual implementations/implementers for that functionality.

For the colours, yes, there are multiple implementations (all in terminal emulators or similar systems). See https://en.wikipedia.org/wiki/ANSI_escape_code#Colors for a list that I am sure is partial. But... all of them interpret the colours a bit differently. Which, I would say, is a sign of insufficient standardisation. In addition, some implementations use the wrong syntax for full colour command sequences; some use the correct (standard) one, which is specified not in ECMA-48 but only in ITU-REC-T.416. So one of the things I'd like to "fix" (to the extent possible) is the syntax and exact colours for the colour specs.

Some other styling commands were (due to technical limitations of the time) ambiguous, e.g. "bold or increased intensity". In a modern setting, that ambiguity should be removed, and it should be just "bold", no change in colour. Change of colour is by other command sequences.

For some other things:

a) I only have a hint that there are implementations (I haven't done deep diving in terminal emulator implementation; terminal emulators are de-facto the prime implementations of such command sequences), namely:

Den 2019-02-08 22:29, skrev "Egmont Koblinger via Unicode" unicode@unicode.org:

Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m for curly underline.

Scant information (Egmont, do you have a reference?), but CSS does provide a number of variants for such lines.

b) No direct hint, but there are conventional styles that are covered by (e.g.) CSS, but not by ECMA-48, like small-caps, and (first level) superscript/subscript (MS Word has that as a style, and even a "low grade styling" editor like that in Jira (editing via a web page) has this as a style).

The "here is an interesting writeup/standard" approach will lead to eventual implementers simply re-inventing the wheel out of ignorance. The reason the ECMA standards were successful in their time was that ECMA represented what Unicode is today: a collaboration of implementers.

And it seems to be still going on among terminal emulator implementers (IIUC), but without a common writeup (IIUC).

Even HTML has been taken over by the WhatWG, because of that group's focus on not getting ahead of actual implementations.

In your scenarios, who would issue the command sequences?

For terminal emulators, as now: other programs that do the styling. Run the "man" command (program), and see the (minimally) styled output. Several other commands, like "grep" (on Linux), do colouring. Many other "special" programs (for commercial or internal systems) also do styling (for output directed to a terminal emulator). Yes, in some cases HTML would be/is useful, esp. for graphics (SVG, PNG), but a terminal emulator is not (and will not become, IMHO) a browser. (To have a "browser overlay" on a terminal emulator would be an idea, but it is not in scope for an ECMA-48 SGR fixup..., and I am not entirely sure it would be a good idea.)

Terminal emulators are the de-facto primary implementation platforms for ECMA-48+extensions. The styling part /can/ be a "save format", among "txt", "html", "rtf", "pdf" and others; though I don't know of any such implementations.

If it is the text editor, then that assumes it runs on some sort of "platform" that can interpret these codes. Who would create such a platform? If it is users, putting fancy annotations in files so they look good on text editors, then I'd like to understand more about the use case. But in that case the "platform" would be the editor. In either case, do you have interest from platform vendors (or perhaps non-commercial suppliers of such platforms)?

The next level bit is to explain what ECMA 48 delivers out of the box, and why their approach has shortcomings (or shortcomings in a world where text is Unicode); it looked like you leap directly into a full-blown scheme. Assume most readers aren't familiar with the history.

True, for now it is kind of formulated like an amendment. It may need an auxiliary separate document with motivations.

I looked at your draft very quickly (as I'm headed out shortly), so apologies if I skipped over something. In that case, perhaps figure out how to help impatient readers to locate that.

I have tried to write down some proposals for updating the ECMA-48 text styling commands. It's not a (minimal or otherwise) profile of ECMA-48 text styling, but instead making more precise existing (standard or, IIUC, common extension) formatting codes plus an extension adding more formatting codes. Profiles (implementation profiles) can be defined on top of that, if desired.

Due to the age of ECMA-48 and "derivatives", I imagine that none of the involved standards organisations are too keen on assigning a working group and actually going through the work of doing an update. (Besides, I don't want to open a can of worms trying to update all of ECMA-48.) And it does not fit well as a Unicode technical note either, I think. (In case you think otherwise, great!)

I am not sure that ECMA is an ongoing concern, or if it is, that they are working on that technology - you may have to assemble a new community of active users/implementers to make any progress here.

I think you are right about ECMA. Egmont, is there a terminal emulator developers' cooperation of some kind, where this might be of interest?

If you think a suggestion along the lines of the content of the attached document, in some venue, has a chance, you are welcome to comment on the content (and suggest a venue).

Also, I think any document / proposal would have to start with clear suggestions of specific use cases and demonstrate the drawbacks of existing solutions (which might include an HTML/CSS light - i.e. one that doesn't assume a full document but works well with fragments and uses "implied" style sheets).

For terminal emulators (the primary target for this proposal), HTML/CSS is out of the question. For text editors, there are already several "save formats"; using ECMA-48 style SGR command sequences would be just one more. And remember that there was quite a call for "plain text styling" on the Unicode e-mail list a few months ago. Not that that reached a conclusion, but my conclusion was that ECMA-48 style styling (only SGR commands, nothing else from ECMA-48) would be the only viable option for that... Other options presented then were dead-beat for one reason or another, especially in the face of an already existing (though old and needing some brush-up) still successful, and in some parts surprisingly powerful, standard for that.

Kind regards /Kent K

egmontkob commented 1 year ago

Asmus Freytag wrote:

On 5/22/2019 4:25 PM, Kent Karlsson wrote:

For terminal emulators (the primary target for this proposal), HTML/CSS is out of the question.

But it may be a natural for the likes of Twitter, Tumblr, FB and other social media that are composed of short posts.

Forum software, in its time, invented BBCode for the same purpose.

For text editors, there are already several "save formats"; using ECMA-48 style SGR command sequences would be just one more. And remember that there was quite a call for "plain text styling" on the Unicode e-mail list a few months ago.

A lot of people who are quite vocal on that list have no connections to delivered product.

To get traction, you cannot do without commitment by those that "ship" solutions; by all means that category includes various forms of free-ware, but not vaporware or "it would be nice to have"-ware.

Not that that reached a conclusion, but my conclusion was that ECMA-48 style styling (only SGR commands, nothing else from ECMA-48) would be the only viable option for that...

I am not familiar with text editors other than graphical ones. The one that does the most styling is Source Insight, but its styling is syntax driven, so there is no markup (except some special commenting for different visual style comments). All the other ones seem to use colored text only, but they also do not use "terminal" platforms (or save the color information).

If a new "document fragment" form of coloring/font styling were adopted, I would most likely encounter it in social media, not pseudo-terminals. And in my view, the ECMA format is not a natural for fragments that are ultimately viewed in a browser. So there's a large market that could use limited styling that won't be addressed by your proposal. (However, you might think of a translation table to a CSS style, if only to help browsers convert any saved text files.)
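The translation table to CSS suggested above could look roughly like the following minimal sketch. The table `SGR_TO_CSS` and the helper `sgr_to_css` are hypothetical names, and only a handful of long-established SGR parameters are covered; it is not taken from the proposal or from any browser.

```python
# Hypothetical, deliberately tiny table: SGR parameter -> CSS declaration.
SGR_TO_CSS = {
    1: "font-weight: bold",           # bold
    3: "font-style: italic",          # italic
    4: "text-decoration: underline",  # singly underlined
    9: "text-decoration: line-through",  # crossed out
    22: "font-weight: normal",        # normal intensity
    23: "font-style: normal",         # not italic
}


def sgr_to_css(params):
    """Translate a list of SGR parameter numbers into a CSS declaration string.

    Parameters missing from the table are silently skipped.
    """
    return "; ".join(SGR_TO_CSS[p] for p in params if p in SGR_TO_CSS)


if __name__ == "__main__":
    print(sgr_to_css([1, 3]))
```

A real converter would also have to track state across sequences (for example, underline and strike-through being active at the same time), which this sketch ignores.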

Other options presented then were dead-beat for one reason or another, especially in the face of an already existing (though old and needing some brush-up) still successful, and in some parts surprisingly powerful, standard for that.

Windows Command console does colors, but I've only found an API for that, not any in-line codes. I know I used to be able to insert codes in the text in the past, but I don't know when that went away.

Again, hope all of this helps you in refining your efforts,

A./

egmontkob commented 1 year ago

Egmont Koblinger wrote:

Hi,

On Thu, May 23, 2019 at 1:25 AM Kent Karlsson wrote:

For the colours, yes, there are multiple implementations (all in terminal emulators or similar systems). See https://en.wikipedia.org/wiki/ANSI_escape_code#Colors for a list that I am sure is partial. But... all of them interpret the colours a bit differently. Which, I would say, is a sign of insufficient standardisation. In addition, some implementations use the wrong syntax for full colour command sequences; some use the correct (standard) one, which is specified not in ECMA-48 but only in ITU-REC-T.416.

As per https://gist.github.com/XVilka/8346728 user comments dated Dec 2018, it's not clear to us how to interpret the standard ("38;2:..." or "38:2:..."). Some implementations went for the full colon approach as that's the one that we believe makes more sense. A "new standard" could clarify it.
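To make the two syntaxes concrete, here is a small sketch that builds a foreground direct-colour sequence in both forms. The function names are made up for illustration; the colon form assumes the T.416 layout with an empty colour-space field after the `2`, and the semicolon form is the variant many terminals accept in practice.

```python
ESC = "\x1b"


def fg_rgb_semicolon(r, g, b):
    """Widely deployed form: every field separated by ';'."""
    return f"{ESC}[38;2;{r};{g};{b}m"


def fg_rgb_colon(r, g, b):
    """Full-colon form; the empty field after '2' is the colour-space id."""
    return f"{ESC}[38:2::{r}:{g}:{b}m"


if __name__ == "__main__":
    # repr() shows the raw bytes instead of letting the terminal interpret them.
    print(repr(fg_rgb_semicolon(255, 128, 0)))
    print(repr(fg_rgb_colon(255, 128, 0)))
```

With the colon form, the whole colour specification is one parameter with sub-parameters, so a terminal that does not know `38` can skip it without misreading the following numbers as separate attributes.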

So one of the things I'd like to "fix" (to the extent possible) is the syntax and exact colours for the colour specs.

Sounds great.

Some other styling commands were (due to technical limitations of the time) ambiguous, e.g. "bold or increased intensity". In a modern setting, that ambiguity should be removed, and it should be just "bold", no change in colour. Change of colour is by other command sequences.

This change is again highly welcome by some of us, e.g. this is the only behavior implemented in the Kitty terminal, and recently the new default of GNOME Terminal.

Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m for curly underline.

Scant information (Egmont, do you have a reference?), but CSS does provide a number of variants for such lines.

There's no "official" documentation; also, some of the design happened in private e-mail conversations. You can find details in kitty's bug https://github.com/kovidgoyal/kitty/issues/226 and gnome-terminal's bug https://bugzilla.gnome.org/show_bug.cgi?id=721761, and in the feature requests in quite a few other terminals' bug trackers.

Note that we picked 4:0 for off (alias to 24), 4:1 for single (alias for 4), 4:2 for double (alias for 21), 4:3 for curly, 4:4 for dotted and 4:5 for dashed.

We'd like to continue the pattern that ":0" turns a feature off, and ":1" is equivalent to having no colon-modifier, and thus new styles are added from ":2" onwards. See also the italic vs. oblique idea at https://bugzilla.gnome.org/show_bug.cgi?id=789597.
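The underline sub-parameter values listed above can be written down as a small sketch. The enum and function names are hypothetical; the numeric values are the ones given in this mail.

```python
from enum import IntEnum


class Underline(IntEnum):
    OFF = 0      # alias of SGR 24
    SINGLE = 1   # alias of SGR 4
    DOUBLE = 2   # alias of SGR 21
    CURLY = 3
    DOTTED = 4
    DASHED = 5


def underline_sgr(style):
    """Return the SGR sequence selecting the given underline style."""
    return f"\x1b[4:{int(style)}m"


if __name__ == "__main__":
    # Curly underline on, some text, underline off again.
    print(underline_sgr(Underline.CURLY) + "mispeled" + underline_sgr(Underline.OFF))
```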

I think you are right about ECMA. Egmont, is there a terminal emulator developers' cooperation of some kind, where this might be of interest?

Just recently we created a "Terminal WG" hosted on freedesktop, with the intent of having such a collaboration forum.

Note, however, that far from all terminal developers are there. Nor is this a forum where authoritative decisions could be made. Terminal emulator developers more or less all have their individual ideas about what extensions to add, and mostly just more or less hardheadedly go their own ways. The said forum is for those who feel like being more cooperative.

I'm not sure how much the discussion of this proposal is a good fit for that platform.

To be honest, I'm really not sure where your proposal is heading, nor that it would be well received and accepted by terminal emulator developers. You have a use case in mind, you'd like to see some new "rich text" format emerge, but I don't think it's going to happen, nor do I find it a good idea to build on top of ECMA-48, let alone brand new extensions. But you know, time (and the work you put into it) will tell.

I don't think coming up with extensions to ECMA-48 as the first step is a good approach to start it; especially not if its author doesn't study what existing terminals do and comes up with conflicting number choices. It's not going to take us anywhere. I don't see why it would be accepted as standard by terminals, nor how it would attract other use cases. On the other hand, it's a great source of neverending unfixable confusion if some new document format starts using sequences that conflict with the terminal's. I also can't see why terminals would respect the numbers you allocate (plenty of them for really questionable features) and not introduce conflicts in the future.

Terminals evolve based on real life demand, not based on arbitrarily made up new specs. E.g. curly and colored underline (the latter one using SGR 58 and 59, conflicting with your proposal) were introduced to allow a nice spell-checking experience in terminals, something that users were looking for. On the other hand, I don't see the real life need for small caps (shouldn't it be addressed by a generic font selector, though?), superscripts/subscripts, shadows, whatnot.

My suggestion is:

Instead of a new proposal for extending ECMA, study and document the current practices of terminal emulators. Come up with clarifications or recommendations wherever you feel it's desirable. E.g. your clarification of the "bold or increased intensity" attribute is a nice one, and so is the one with the use of ":" vs ";" in color codes (and note that I say they are nice because they happen to match my opinion; presumably if another terminal developer has a different opinion, they wouldn't like your work here; if you were to recommend something that doesn't match my opinion then I would also start arguing, and if we can't convince each other then I'd just ignore your recommendation). Nice to point out that "negative image" is implemented as "reverse" by all terminals; I talked to gnome-terminal's main developer about it in private mail last year, but we decided to stick with whatever we already have. Maybe the specification should be altered here to match all the existing implementations? Maybe it's also time to drop "faint" (https://bugzilla.gnome.org/show_bug.cgi?id=791596) and "blink" from the list of supported attributes for your new rich text format?

Come up with a document on how existing ECMA-48 would be used in a new document format; and come up with a study how and why people would begin using it. Come up with various pieces of implementation; convince developers of various software to add this as a new format.

If this becomes popular, and if there's a real life demand to address small caps or whatever, let's get back to that and address them one by one.

In its current format, apologies if it doesn't sound great, but it's just one random guy with some knowledge about terminals who has an idea and wishes to extend and standardize it, makes up a few new styles that are not backed by real life demand, clarifies a few things that indeed deserve to be clarified, creates a nicely formatted document and... sits back and waits for others to accept it? It doesn't work this way. There isn't a single authority that's accepted by terminal developers other than the ancient ECMA/ITU standards to some extent, and you can't become one by just writing a proposal. At least, this is my personal opinion.

cheers, egmont

egmontkob commented 1 year ago

Egmont Koblinger wrote:

Hi Kent,

A quick, not exhaustive review (without philosophical bits) of the document:

It is reasonable to limit the length of these escape/control sequences to 35 characters.

I don't think it is. And why 35, in particular?

A simple RGB sequence for modifying the foreground, plus its delimiter, e.g. "38:2::255:255:255;" is already 18 characters. This doesn't even include color-space and other parameters. An escape sequence can change the foreground, background, underline and whatever future colors plus all kinds of other attributes at once. Just setting all the properties listed in your document might easily become 100+ characters.
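A quick way to see how fast a single sequence grows is to assemble one and measure it. The sketch below uses a hypothetical helper `build_sgr` and direct-colour parameters in the colon form; the particular attribute mix (foreground, background, underline colour, bold, italic, curly underline) is only an example.

```python
def build_sgr(params):
    """Join SGR parameters into one ESC [ ... m control sequence."""
    return "\x1b[" + ";".join(params) + "m"


# One foreground colour parameter plus its trailing delimiter.
single_colour = "38:2::255:255:255;"

# A combined sequence setting several attributes at once.
combined = build_sgr([
    "38:2::255:255:255",  # foreground colour
    "48:2::0:0:0",        # background colour
    "58:2::255:0:0",      # underline colour (as used by some terminals)
    "1",                  # bold
    "3",                  # italic
    "4:3",                # curly underline
])

if __name__ == "__main__":
    print(len(single_colour), len(combined))
```

Even this modest combination is already longer than the 35-character limit questioned above.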

(\u001B\u005B|\u009B)

xterm intentionally doesn't accept the C1 variant (U+009B) in UTF-8, see rationale at the very beginning of https://invisible-island.net/xterm/ctlseqs/ctlseqs.html. Many other terminals also don't support it.

With your new document format, I think it's better to support C0 (e.g. U+001B U+005B) only, that way the document is more likely to work when cat'ed to a terminal.

In either case, you should probably state the rationale why you decided to allow or disallow C1.
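As an illustration of the C0-only approach, the following sketch recognises SGR sequences introduced by ESC [ (U+001B U+005B) and deliberately ignores the single-character C1 form U+009B. The pattern is a simplification written for this example; it is neither the regex from the document under review nor the full ECMA-48 grammar.

```python
import re

# ESC [ , then parameter characters (digits, ':' and ';'), then final byte 'm'.
SGR_C0 = re.compile(r"\x1b\[([0-9:;]*)m")


def strip_sgr(text):
    """Remove C0-introduced SGR sequences, leaving everything else untouched."""
    return SGR_C0.sub("", text)


if __name__ == "__main__":
    print(strip_sgr("\x1b[1mbold\x1b[22m plain"))
```

Text using the C1 introducer passes through unchanged, which mirrors how terminals that reject U+009B in UTF-8 would treat it.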

[the rest of the regex]

Is the regex really this simple for ECMA-48? Or is this your simplified version? (I don't know.)

(U+009B, preferred)

Oh, no no no no no, C1 asks for all kinds of troubles e.g. when the encoding of the text is unknown at one point. C0 is the one used in practice anywhere.

CSI 2m Lean font variant.

You change "faint" to "lean", which is IMO a nice thing to do (see link in my previous mail). But IMHO you should mention this fact and provide rationale as well. The concrete choice of sub-numbers should be shifted, according to my previous mail.

CSI 56m Raised to first superscript level

Would be nice if the document stated where each such item is taken from (potentially altered or clarified), or if they are newly added.

Cancelled also by LF, CR, VT, NEL, LS, PS.

Why? Having to have different logic for tracking various attributes would be a PITA for implementations.

CSI 59m Not raised/lowered and back to the set size.

Conflicts with underline color in some terminals.

CSI 67m Display uppercase letters as normal.

In order to save on the namespace, I would consider only allowing "66" or "66:1" to enable and "66:0" to disable; similarly for other newly introduced numbers. And in case of superscript/subscript, since they're mutually exclusive, it'll go like "56:1" for superscript, "56:2" for subscript and "56:0" to restore.
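A parser for this sub-parameter scheme might look like the sketch below. Both function names are hypothetical, and the meaning of `56` follows the suggestion in this mail (56 or 56:1 for superscript, 56:2 for subscript, 56:0 to restore), not any shipped terminal.

```python
def parse_sgr_params(body):
    """Split an SGR parameter string into (number, sub-parameters) pairs.

    ';' separates parameters, ':' separates sub-parameters. Empty fields
    become 0 for the main number and None for sub-parameters.
    """
    result = []
    for item in body.split(";"):
        parts = item.split(":")
        number = int(parts[0]) if parts[0] else 0
        subs = [int(p) if p else None for p in parts[1:]]
        result.append((number, subs))
    return result


def script_state(number, subs):
    """Interpret the suggested 56[:n] scheme; returns None for other numbers."""
    if number != 56:
        return None
    # No sub-parameter behaves like ':1', matching the pattern used for underline.
    mode = subs[0] if subs and subs[0] is not None else 1
    return {0: "normal", 1: "superscript", 2: "subscript"}.get(mode)


if __name__ == "__main__":
    for number, subs in parse_sgr_params("56:2;1"):
        print(number, subs, script_state(number, subs))
```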

SMALL CAPS, PROPORTIONAL WIDTH/LOWERCASE DIGITS, FONT SIZE CHANGE, ADVANCE MODIFICATION, LINE SPACING

Are these really needed?

For font size and advance: How are those units (millimeter etc.) to be interpreted in various contexts, e.g. on a monitor? What does "pixel" mean in case of HiDPI scaling?

For line spacing: Typically a paragraph is wrapped to lines by replacing a continuous sequence of whitespace characters (perhaps more than one of them) with a linebreak. Sometimes a linebreak occurs where there's no space character, just because the corresponding Unicode algorithms allow a break. The amount of vertical spacing is taken from the corresponding parameter of which character exactly?

FOR CJK

I never knew what 60-65 were for, thanks for the clarification.

CSI 51m [...] but for a string rather than a single character

Is this taken from somewhere, or your addition?

6) Text “decoration”

With the introduction of color underlines, kitty and gnome-terminal place the underline behind the letter, so that if they intersect it's the letter that is clearly visible. (Even further improved implementations might stop the underline a bit sooner so that it doesn't even touch the letter.) This behavior doesn't match your bottom-to-top order.

t in 0--256 (0 means “inherit transparency of default background colour”, 1 is fully opaque, 256 is fully transparent).

This off-by-one sounds pretty dangerous; implementations will get it wrong, especially since it's for the background only and not for the foreground. Where does it come from?

The colours here are a compromise and should be used for new or updated implementations.

Not sure if your proposal is for the new document format only, or for terminal emulators as well. Many terminals offer various color palettes for the user to choose from, including traditional ones and some "special" ones (e.g. solarized) as well. You shouldn't try to recommend to stop this practice and offer a single fixed palette only.

You might want to firmly disallow the palette colors from the document format.

Overstrike (strike-through) is used to mark for deletion

The feature and escape sequence originates from Kitty, which uses it for underline only (not strikethrough). gnome-terminal follows this behavior.

CSI 58:0m (the 0 can be omitted, CSI 58m) Reset text decoration colour to follow foreground colour.

Why did we pick 59 for resetting the underline color, rather than 58:0? I honestly can't recall. We should read back the threads. Probably for symmetry with the 38-39 and 48-49 ones. Anyway, it's probably too late to change.

Text emphasis mark colour

Do we need this? Does it need to be distinct from the underline color? Is it okay to share the same color for CJK emphasis and framing lines, why don't these two have separate codes?

Have you considered runtime cost in terminal emulators implementing this feature? E.g. gnome-terminal currently has 64 bits for the color: 25 for foreground (true color + other special values), 25 for background and the remaining 14 for the underline (that is, the RGB value is approximated using 4+5+4 bits). Further increasing it could have a noticeable runtime penalty, and as such, a dealbreaker.
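To show what approximating an RGB value with 4+5+4 bits means in practice, here is a sketch of one possible quantisation. The bit order (red high, blue low), the truncation, and the bit-replication expansion are assumptions made for this example; it is not gnome-terminal's actual code, and the mail does not say how the remaining bit of the 14 is used.

```python
def quantize_rgb_454(r, g, b):
    """Pack 8-bit RGB into 13 bits: 4 bits red, 5 bits green, 4 bits blue."""
    return ((r >> 4) << 9) | ((g >> 3) << 4) | (b >> 4)


def expand_rgb_454(packed):
    """Expand a 13-bit value back to approximate 8-bit RGB."""
    r4 = (packed >> 9) & 0xF
    g5 = (packed >> 4) & 0x1F
    b4 = packed & 0xF
    # Replicate the high bits into the low bits so that the maximum
    # quantised value maps back to 255 rather than 240 or 248.
    return ((r4 << 4) | r4, (g5 << 3) | (g5 >> 2), (b4 << 4) | b4)


if __name__ == "__main__":
    packed = quantize_rgb_454(200, 100, 50)
    print(packed, expand_rgb_454(packed))
```

The round trip is lossy by design: nearby input colours collapse to the same stored value, which is the price paid for keeping the per-cell attribute word at 64 bits.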

CSI 89m Cancel all shadows

Or 80:0m?

CSI 7m Negative image.

Maybe you can remove it from the list of supported features for the new document format.

Variants: CSI 5:0m sinusoidal, CSI 5:1m VAVA-shaped, CSI 5:2m trapezoidal, CSI 5:3m rectangular.

I bet no one ever wants to bother implementing such modes. (I just recently implemented blinking in gnome-terminal for feature completeness among the "basic" ANSI/ECMA attributes and for fun, but I have to tell you, I came across more webpages stating that luckily gnome-terminal didn't support it than ones complaining about the lack of it.)

Also I'm wondering what's VAVA-shaped, e.g. when it reaches the top (the upper right of a letter V) does it suddenly jump to the bottom? Sorry, this textual description is not clear to me.

Hope these help,

cheers, egmont

egmontkob commented 1 year ago

Egmont Koblinger wrote:

Hi,

Non-goals: [...] hyperlinks

Are you aware that some terminals support hyperlinks? See https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda . You could easily adopt a simpler version (with the omission of the "id" parameter).
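For reference, a hyperlink in the simpler form mentioned here (no "id" parameter) can be emitted as in the sketch below. The function name is made up; the sequence layout follows the linked gist: OSC 8, an empty parameter field, the URI, a string terminator, then the visible text, then an OSC 8 with an empty URI to close the link.

```python
def hyperlink(url, text):
    """Wrap text in an OSC 8 hyperlink with empty params (no "id")."""
    osc = "\x1b]"   # OSC, in its C0 (ESC ]) form
    st = "\x1b\\"   # ST, string terminator, in its C0 (ESC \) form
    return f"{osc}8;;{url}{st}{text}{osc}8;;{st}"


if __name__ == "__main__":
    print(hyperlink("https://example.com/", "example link"))
```

Terminals without hyperlink support are expected to ignore the OSC sequences and show only the text.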

cheers, egmont

egmontkob commented 1 year ago

Asmus Freytag wrote:

On 5/23/2019 2:54 AM, Egmont Koblinger wrote:

Instead of a new proposal for extending ECMA, study and document the current practices of terminal emulators. Come up with clarifications or recommendations wherever you feel it's desirable. E.g. your clarification of the "bold or increased intensity" attribute is a nice one, and so is the one with the use of ":" vs ";" in color codes ...

... and continues with a warning that the latter may not be universally agreed upon.

I'm excerpting this passage because it seems to me that it echoes the work WhatWG did for HTML - they very carefully considered implemented practice.

Another way of putting this is that many successful standards are either "trailing standards" (codify existing practice) or "implementer driven", that is, there's a prior commitment by leading implementers to support the outcome.

Both approaches ensure that something doesn't become "yet another standard".

A./

egmontkob commented 1 year ago

Doug Ewell wrote:

Kent wrote:

For the colours, yes there are multiple implementations (all in terminal emulators or similar systems). See https://en.wikipedia.org/wiki/ANSI_escape_code#Colors for a list I am sure is partial. But... all of them interpret the colours a bit differently. Which, I would say, is a sign of insufficient standardization.

Actually, it is becoming increasingly common for virtual terminal platforms to allow the traditional palette of 16 colors to be customized. The Monokai color scheme, in particular, is gaining popularity in the software development community.

These serve a user need related to modern display technologies and different lighting environments, as well as personal taste, that is not met by requiring "blue" to be always "0,0,128", for example, or always "0,0,187" or whatever. Implementations that require fixed colors can use the CSI 38 mechanisms.

I think this fits with the overall theme of trying to codify first the existing standard, then the (reasonably) conformant existing implementations, then the existing extensions that are consistent with the intended scope of the standard. De-novo inventions should be identified clearly so that peer review has the option of saying "yes" to everything else but "no" to the inventions.

-- Doug Ewell

egmontkob commented 1 year ago

Kent Karlsson wrote:

Hi!

Thank you for your comments. I will go through them, work on the document(s?) and respond to some of the comments. (You are welcome to post more comments. As well as references to more existing implementations and their documentation.)

It will take a bit of time. This is a spare time project for me for now. (More time for this during my vacation...)

Kind regards /Kent K

PS Asmus refers to the WhatWG project. That was a HUGE project, IIRC. This project should be significantly smaller...

egmontkob commented 1 year ago

Kent Karlsson wrote:

Hi!

Non-goals: [...] hyperlinks

Are you aware that some terminals support hyperlinks? See https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda . You could easily adopt a simpler version (with the omission of the "id" parameter).

To follow the ECMA-48 standard, that should have been like

APC ......... ST

In any case that goes outside of what would be suitable in an SGR command, IMHO. But I can remove the word "hyperlinks" from the list you refer to. Or just say that it is out of scope for that document (which, apart from some suggested deprecations, is all about SGR).

Kind regards /Kent K

egmontkob commented 1 year ago

Asmus Freytag wrote:

On 5/23/2019 3:53 PM, Kent Karlsson wrote:

PS Asmus refers to the WhatWG project. That was a HUGE project, IIRC. This project should be significantly smaller...

Style of approach, not size, was the point of comparison. A./

egmontkob commented 1 year ago

Kent Karlsson wrote:

It is reasonable to limit the length of these escape/control sequences to 35 characters.

I don't think it is. And why 35, in particular?

A simple RGB sequence for modifying the foreground, plus its delimiter, e.g. "38:5::255:255:255;" is already 18 characters. This

I guess you mean 38:3 or 38:4. 38:5 just has one index parameter (and I missed out on that one in the doc.).

doesn't even include color-space and other parameters. An escape sequence can change the foreground, background, underline and whatever future colors plus all kinds of other attributes at once. Just setting all the properties listed in your document might easily become 100+ characters.

Not all ECMA-48 control sequences with multiple parameters are like this, but for SGR, one can just split the list on the ";"s (NOT the ":"s). E.g.:

CSI a;b;c;dm is the same as CSI am CSI bm CSI cm CSI dm

(spaces to be skipped)
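The splitting rule can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes sub-parameters are always separated by ":"; the legacy "38;2;r;g;b" form that some terminals accept would not survive this kind of split.

```python
def split_sgr(params: str) -> list[list[str]]:
    """Split an SGR parameter string into attributes, each a list of sub-parameters."""
    # ";" separates independent attributes, ":" separates sub-parameters.
    # An empty parameter string yields one empty parameter, i.e. the default (0).
    return [attr.split(":") for attr in params.split(";")]


def expand_sgr(params: str) -> list[str]:
    """Rewrite CSI a;b;c m as the equivalent sequences CSI a m, CSI b m, CSI c m."""
    return [f"\x1b[{':'.join(attr)}m" for attr in split_sgr(params)]


if __name__ == "__main__":
    for seq in expand_sgr("1;4:3;38:2::255:0:0"):
        print(repr(seq))
```

Each resulting sequence is short on its own, which is why a length limit on the combined form mostly constrains how the sequences are written, not what they can express.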

But ok, I'm not keen on that limit. It came up because someone (in the earlier discussion on the Unicode list) complained about there not being one.


(\u001B\u005B|\u009B)

xterm intentionally doesn't accept the C1 variant (U+009B) in UTF-8, see rationale at the very beginning of https://invisible-island.net/xterm/ctlseqs/ctlseqs.html. Many other terminals also don't support it.

With your new document format, I think it's better to support C0 (e.g. U+001B U+005B) only, that way the document is more likely to work when cat'ed to a terminal.

In either case, you should probably state the rationale why you decided to allow or disallow C1.

[the rest of the regex]

Is the regex really this simple for ECMA-48? Or is this your simplified version? (I don't know.)

I did skip OSC, APC and such. Otherwise it should cover escape sequences and control sequences. But I might have made some mistake. I'll double-check.

(U+009B, preferred)

Oh, no no no no no, C1 asks for all kinds of troubles e.g. when the encoding of the text is unknown at one point. C0 is the one used in practice anywhere.

Hmm.

CSI 2m Lean font variant.

You change "faint" to "lean", which is IMO a nice thing to do (see link in my previous mail). But IMHO you should mention this fact and provide rationale as well. The concrete choice of sub-numbers should be shifted, according to my previous mail.

I'll get back to that email later (during the weekend or next week.)

CSI 56m Raised to first superscript level

Would be nice if the document stated where each such item is taken from (potentially altered or clarified), or if they are newly added.

Newly added.

Cancelled also by LF, CR, VT, NEL, LS, PS.

Why? Having to have different logic for tracking various attributes would be a PITA for implementations.

There are, already in ECMA-48, some control sequences whose effects are terminated by line ending. I guess those are the ones deemed not meaningful to continue after a line break. It is simpler to skip that complication, agreed.

CSI 59m Not raised/lowered and back to the set size.

Conflicts with underline color in some terminals.

Oh. Reference? Widespread?

CSI 67m Display uppercase letters as normal.

In order to save on the namespace, I would consider only allowing "66" or "66:1" to enable and "66:0" to disable; similarly for other newly introduced numbers. And in case of superscript/subscript, since they're mutually exclusive, it'll go like "56:1" for superscript, "56:2" for subscript and "56:0" to restore.

Hmm.

SMALL CAPS PROPORTIONAL WIDTH/LOWERCASE DIGITS FONT SIZE CHANGE ADVANCE MODIFICATION LINE SPACING

Are these really needed?

Well, is blue really needed? Small caps and digits variants (esp. lowercase digits, a.k.a. old-style digits) are quite common in texts (where possible to easily achieve).

The rest are really moves from other control sequences, into SGR (where I think they really belong, deprecating the old control seqs.).

Digits in otherwise lowercase running text should be typeset in lowercase as well, abbreviations (in uppercase) should be typeset in small caps when occurring in a sentence (written mostly in lowercase).

For font size and advance: How are those units (millimeter etc.) to be interpreted in various contexts, e.g. monitor? What does "pixel" mean in case of HiDPI scaling?

You have the exact same problem for HTML/CSS: pt, mm, px, em are commonly used in CSS. I'm not sure about the exact interpretation there (obviously, zoom will scale everything; and it is not certain that 100% will be particularly exact when it comes to units). For backwards compatibility there are a few more units here, but there is no requirement to support all of them.

For line spacing. Typically a paragraph is wrapped to lines by replacing a continuous sequence of whitespaces (perhaps more than 1 of them) with a linebreak. Sometimes a linebreak occurs where there's no space character, but the corresponding Unicode algorithms allow a break. The amount of vertical spacing is taken from the corresponding parameter of which character exactly?

FOR CJK

I never knew what 60-65 were for, thanks for the clarification.

CSI 51m [...] but for a string rather than a single character

Is this taken from somewhere, or your addition?

They apply until cancelled, not just for the next (printable) character or so. They could apply to each character (or combining sequence) individually between start and end, but that seems strange and cluttered. Making the text string framed seems much more reasonable.

6) Text “decoration”

With the introduction of color underlines, kitty and gnome-terminal place the underline behind the letter, so that if they intersect it's the letter that is clearly visible. (Even further improved implementations might stop the underline a bit sooner so that it doesn't even touch the letter.) This behavior doesn't match your bottom-to-top order.

Ok.

t in 0--256 (0 means “inherit transparency of default background colour”, 1 is fully opaque, 256 is fully transparent).

This off-by-one sounds pretty dangerous, implementations will get it wrong; especially that it's for the background only and not for the foreground. Where does it come from?

I wanted an easy way to (by default) inherit the default (set by preference setting) background colour's transparency. Then, unless one explicitly sets the transparency, the default transparency affects all background colours. Not married to that approach. If you want to skip it or have another better mechanism, I'm all ears. Several terminal emulators and text editors offer to set a "global" background transparency, i.e. affecting "all" background colours (that currently cannot specify any transparency, but would be able to do that in my proposal; the default should still be "inherit").

The colours here are a compromise and should be used for new or updated implementations.

Not sure if your proposal is for the new document format only, or for terminal emulators as well. Many terminals offer various color palettes for the user to choose from, including traditional ones and some "special" ones (e.g. solarized) as well. You shouldn't try to recommend to stop this practice and offer a single fixed palette only.

This goes very much in the opposite direction of what was done for HTML/CSS, where the named colours (now quite a lot of them) are fixed in colour value.

Also when documentation (text or pictures) says "this and that will be orange", and actual implementations show teal, grey, or whatever colour, that invalidates the documentation and confuses users.

You might want to firmly disallow the palette colors from the document format.

Overstrike (strike-through) is used to mark for deletion

The feature and escape sequence originates from Kitty, which uses it for underline only (not strikethrough). gnome-terminal follows this behavior.

?? That seems to be in error... There is another control sequence for underline.

CSI 58:0m (the 0 can be omitted, CSI 58m) Reset text decoration colour to follow foreground colour.

Why did we pick 59 for resetting the underline color, rather than 58:0? I honestly can't recall. We should read back the threads. Probably for symmetry with the 38-39 and 48-49 ones. Anyway, it's probably too late to change.

Reference?

Text emphasis mark colour

Do we need this? Does it need to be distinct from the underline color?

HTML/CSS distinguishes them, IIRC. But I will reread that part of the CSS spec.

Is it okay to share the same color for CJK emphasis and framing lines, why don't these two have separate codes?

As above. I will reread that part of the CSS spec.

Have you considered runtime cost in terminal emulators implementing this feature? E.g. gnome-terminal currently has 64 bits for the color: 25 for foreground (true color + other special values), 25 for background and the remaining 14 for the underline (that is, the RGB value is approximated using 4+5+4 bits). Further increasing it could have a noticeable runtime penalty, and as such, a dealbreaker.

I haven't studied any implementations in detail. Just made some tests of a few.

CSI 89m Cancel all shadows

Or 80:0m?

CSI 7m Negative image.

Maybe you can remove it from the list of supported features for the new document format.

Variants: CSI 5:0m sinusoidal, CSI 5:1m VAVA-shaped, CSI 5:2m trapezoidal, CSI 5:3m rectangular.

I bet no one ever wants to bother implementing such modes. (I just recently implemented blinking in gnome-terminal for feature completeness among the "basic" ANSI/ECMA attributes and for fun, but I have to tell you, I came across more webpages stating that luckily gnome-terminal didn't support it than ones complaining about the lack of it.)

Also I'm wondering what's VAVA-shaped, e.g. when it reaches the top (the upper right of a letter V) does it suddenly jump to the bottom? Sorry, this textual description is not clear to me.

Those were more for fun. While HTML/CSS still specifies blinking, modern browsers don't implement it at all.

Hope these help,

Very much. Thank you!

egmontkob commented 1 year ago

Egmont Koblinger wrote:

Hi,

On Fri, May 24, 2019 at 11:00 PM Kent Karlsson wrote:

A simple RGB sequence for modifying the foreground, plus its delimiter, e.g. "38:5::255:255:255;" is already 18 characters. This

I guess you mean 38:3 or 38:4. 38:5 just has one index parameter (and I missed out on that one in the doc.).

Nice catch, I meant 38:2. (2 and 5 are the ones typically implemented in terminals, and I keep mixing these two.)

While at it, I haven't seen anyone care about :3 and :4.

Not all ECMA-48 control sequences with multiple parameters are like this, but for SGR, one can just split the list on the ";"s (NOT the ":"s). E.g.:

Indeed any sequence can be split at ";", and for that reason, I don't know why ";" exists at all rather than having to restart SGR. (Maybe saving 2 bytes was that important several decades ago?)

Anyway, the question is not whether there's an alternative solution. The question is why would you forbid an otherwise valid, well-established, documented format if the length exceeds a pretty easily reachable limit? Such a restriction will likely result in some applications that don't realize that they sometimes emit an overlong sequence, which causes trouble in some implementations. A rare, hard to spot bug which then app developers will begin to work around by raising this limit, etc. If you really wish to, strictly disallowing semicolons makes much more sense to me. Or just raise the limit to 256 or so.

Cancelled also by LF, CR, VT, NEL, LS, PS.

Why? Having to have different logic for tracking various attributes would be a PITA for implementations.

There are, already in ECMA-48, some control sequences whose effects are terminated by line ending.

You mean some SGR sequences? If so, could you please point me to the particular section of ECMA-48 or whichever other standard.

CSI 59m Not raised/lowered and back to the set size.

Conflicts with underline color in some terminals.

Oh. Reference? Widespread?

Kitty's homepage, https://sw.kovidgoyal.net/kitty/protocol-extensions.html in particular. Implemented in at least Kitty and gnome-terminal (in fact VTE, and thus plenty of other terminals using VTE), see https://bugzilla.gnome.org/show_bug.cgi?id=721761 for the latter.

Feature requests for the same feature in various terminal emulators' bugtrackers, as well as probably vim/neovim bugreports. Random forum entries here and there, e.g. https://askubuntu.com/a/985386/398785.

There's no "official" or "officially looking" reference.

Not sure what you mean by widespread. Widespread use: probably not really yet. Widespread availability: yes. According to various polls on the web, and Debian's popcon, it seems to me that VTE's usage share is somewhere in the ballpark of 50% among users of some terminal emulator on Linux, so whatever VTE implements soon becomes available for many users :)

Let me throw back the question. How about your proposal? Is there any reference for that, or is it your brand new recommendation (accidentally matching the existing practice to some extent)?

SMALL CAPS [...]

Well, is blue really needed? Small caps and digits variants (esp. lowercase digits, a.k.a. old-style digits) are quite common in texts (where possible to easily achieve).

I guess it all depends on where you draw the line. Are small caps and digit variants really a common demand in a context with such limited capabilities (by its nature) as your new document format? Look at markdown for example, there are tons of logical formatting stuff like lists, and way less visual stuff than in the current SGR codes. And many people love markdown (I don't, but that's another story :)), it's spreading like cancer, apparently this is what people need. I'm really not convinced that there's a need for a format that's unable to have any kind of semantical logical structuring (lists, tables, quoted text) and is probably not extendable in this direction by its nature, but supports this many visual elements.

The rest are really moves from other control sequences, into SGR (where I think they really belong, deprecating the old control seqs.).

I don't find it a wise idea to introduce new control sequences if an old one already exists. Or, at least, it has to be strongly justified.

Digits in otherwise lowercase running text should be typeset in lowercase as well, abbreviations (in uppercase) should be typeset in small caps when occurring in a sentence (written mostly in lowercase).

Again, I believe if someone cares about such minor typographical details, your format is way too limiting for them and they'll pick some more heavyweight, already existing format. (I'm slightly into typography, read some articles about it, e.g. (La)TeX typesetting tutorials, but I have to admit I can't recall ever hearing that it's desirable to switch to a different font for digits or abbreviations, nor seeing anyone do so.)

[framed, encircled] They could apply to each character (or combining sequence) individually between start and end, but that seems strange and cluttered. Making the text string framed seems much more reasonable.

I'm not questioning whether it looks nicer. (They'd look as mouse highlight in the "terminology" terminal, without the shading, which indeed looks nice.) I'm curious if it's your opinion/interpretation, or a common opinion/interpretation (reference?), or a de jure statement (reference?).

t in 0--256 (0 means “inherit transparency of default background colour”, 1 is fully opaque, 256 is fully transparent).

This off-by-one sounds pretty dangerous, implementations will get it wrong; especially that it's for the background only and not for the foreground. Where does it come from?

I wanted an easy way to (by default) inherit the default (set by preference setting) background colour's transparency. Then, unless one explicitly sets the transparency, the default transparency affects all background colours.

I'm not against having an "inherit background" at all. I'm against encoding the values with an off-by-one. Plenty of implementations will get it wrong. Also PITA to manually "debug" or interpret a file that's in this format.
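To show where the off-by-one bites, here is one possible reading of the quoted 0--256 encoding as a Python sketch. The function name and the linear mapping between 1 and 256 are assumptions; the proposal text quoted above only fixes the three anchor values (0, 1 and 256).

```python
def decode_background_opacity(t: int, default_opacity: float) -> float:
    """Map the proposed transparency value t (0..256) to an opacity in [0.0, 1.0].

    0   -> inherit the opacity of the default background colour
    1   -> fully opaque
    256 -> fully transparent
    """
    if not 0 <= t <= 256:
        raise ValueError("t must be in the range 0..256")
    if t == 0:
        return default_opacity
    # The shift by one is the easy part to get wrong: the usable range is
    # 1..256, not 0..255, so t - 1 (not t) is what gets scaled.
    return 1.0 - (t - 1) / 255


if __name__ == "__main__":
    for t in (0, 1, 128, 256):
        print(t, decode_background_opacity(t, default_opacity=0.9))
```

An implementation that forgets the `- 1` would treat 1 as slightly transparent and would fail on 256, which is the kind of mistake the objection is about.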

This goes very much in the opposite direction of what was done for HTML/CSS, where the named colours (now quite a lot of them) are fixed in colour value.

In HTML, indeed, the named colors are fixed values. In terminal implementations and in ECMA-48 they are not.

There's a fundamental difference: In HTML the names are just convenience aliases, and were never configurable, as far as I know. In terminals 256-color support is relatively new, and truecolor support is really new and limited to certain terminals only (even though a spec has existed for long). Many libraries and apps don't support truecolors, or not even 256 colors. And actually many people love being restricted to 16 colors and having one central place where their exact shades can be configured.

If you really dislike the palette approach, I think you should just forbid using these codes in your new file format.

Overstrike (strike-through) is used to mark for deletion

The feature and escape sequence originates from Kitty, which uses it for underline only (not strikethrough). gnome-terminal follows this behavior.

?? That seems to be in error... There is another control sequence for underline.

I probably didn't quote enough context, hence the misunderstanding.

Kitty designed and introduced 58/59 for the underline color, but deliberately chose not to apply this color for overline and strikethrough (https://github.com/kovidgoyal/kitty/issues/71). VTE followed this behavior.

(This is independent from the code for turning on/off underline/overline/strikethrough.)

CSI 58:0m (the 0 can be omitted, CSI 58m) Reset text decoration colour to follow foreground colour.

Why did we pick 59 for resetting the underline color, rather than 58:0? I honestly can't recall. We should read back the threads. Probably for symmetry with the 38-39 and 48-49 ones. Anyway, it's probably too late to change.

Reference?

See above. (By "threads" I meant public issues in Kitty's or VTE's bugtracker, or maybe some private conversations.)

cheers, egmont

egmontkob commented 1 year ago

Asmus Freytag wrote:

On 5/24/2019 3:28 PM, Egmont Koblinger wrote:

Well, is blue really needed? Small caps and digits variants (esp. lowercase digits, a.k.a. old-style digits) are quite common in texts (where possible to easily achieve).

I guess it all depends on where you draw the line. Are small caps and digit variants really a common demand in a context with such limited capabilities (by its nature) as your new document format? Look at markdown for example, there are tons of logical formatting stuff like lists, and way less visual stuff than in the current SGR codes. And many people love markdown (I don't, but that's another story :)), it's spreading like cancer, apparently this is what people need. I'm really not convinced that there's a need for a format that's unable to have any kind of semantical logical structuring (lists, tables, quoted text) and is probably not extendable in this direction by its nature, but supports this many visual elements.

I regard this comment as rather important on the meta level, e.g. when it comes to the discussion that people have on the Unicode lists about "rich text in "plain text"". The terminal emulation environment is a (well-understood) subset, but with the rise of platforms from GitHub to Twitter where people type short texts into "edit fields", the demands for some ways of visually structuring this information short of a full document format will only grow.

A./

egmontkob commented 1 year ago

Egmont Koblinger wrote:

Hi Kent,

Back to this particular point:

Cancelled also by LF, CR, VT, NEL, LS, PS.

Why? Having to have different logic for tracking various attributes would be a PITA for implementations.

There are, already in ECMA-48, some control sequences whose effects are terminated by line ending. I guess those are the ones deemed not meaningful to continue after a line break. It is simpler to skip that complication, agreed.

"less -R" preserves SGR attributes, so you can view colored outputs (such as git diff). However, it resets them at every newline.

This is done presumably not only because it's simpler to implement it this way, but also because it allows giant files to be handled efficiently. For example, in order to jump to the end, it only needs to seek to the very end and walk back a little bit. It does not have to read and parse the entire file.

I don't know how "less" actually implements larger jumps (e.g. straight to the end of a giant file), but I know how Midnight Commander does because its viewer was rewritten by me. And it indeed jumps straight to the end and then walks back just a bit. We don't support SGR or any other form of coloring where a colored segment is marked by opening and closing characters (arbitrary coloring is blocked by the screen drawing libraries' capabilities), we only support bold and underlined using the in-place *roff notation (e.g. the "letter, backspace, letter" sequence for bold). However, it was an intentional decision to keep the viewer able to handle giant files, thus not parse the entire file, only look back from the start of the viewport to the preceding newline character in order to display the contents. So with its current design, we could only add SGR support the way it's in "less": resetting the attributes for each line of the input file. See https://midnight-commander.org/ticket/1849.
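The "walk back to the preceding newline" strategy can be sketched as follows. This is an illustrative Python sketch over an in-memory buffer, not the code of "less" or mcview; the function names are hypothetical, and a real viewer would read backwards from the file in chunks or use a memory map instead of holding everything in `data`.

```python
import re

# Matches SGR sequences in their C0 form (ESC [ ... m) only.
SGR = re.compile(rb"\x1b\[([0-9:;]*)m")


def line_start(data: bytes, offset: int) -> int:
    """Return the index just after the newline preceding offset (0 if there is none)."""
    return data.rfind(b"\n", 0, offset) + 1


def attributes_at(data: bytes, offset: int) -> list[bytes]:
    """Collect the SGR parameters in effect at offset, assuming a reset at every newline.

    Only the current line is parsed, so the cost does not depend on file size.
    Simplification: only a bare reset (CSI m or CSI 0m) clears the state.
    """
    active: list[bytes] = []
    for match in SGR.finditer(data, line_start(data, offset), offset):
        params = match.group(1)
        if params in (b"", b"0"):
            active = []
        else:
            active.append(params)
    return active


if __name__ == "__main__":
    sample = b"\x1b[31mred text\nplain \x1b[1mbold\x1b[0m end"
    print(attributes_at(sample, sample.index(b"bold") + 2))
```

Without the per-line reset, `attributes_at` would have to scan from the beginning of the file, which is exactly the cost these viewers avoid.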

Keeping track of the colors across the file is okay as long as the viewer tool is expected to handle reasonably sized files. This is probably the case with web browsers, word processors, gedit-like graphical text editors. If you go to the terminal, you find that it begins to be a requirement to work on extremely large files, including disk devices. These tools were written keeping this goal in mind.

So how does this affect your proposal?

You might stick to what ECMA says: colors and attributes live on for the next line. Tools like "less", "mcview" are probably not going to implement it correctly: they will reset them at every newline character. It's more important for these tools to support giant files, they wouldn't sacrifice or severely degrade (e.g. slow down) this feature for the sake of coloring, nor would they bother implementing two different strategies (e.g. a slower but correct one for files below 10MB, and a faster but slightly approximating [or fully turning off color support] one for larger files).

Or you might want to say that all these attributes are reset on newlines. Mind you, then simply cat'ing your files to a terminal won't work as expected – and what's the point in picking SGR codes instead of something brand new if not this possible use case?

Or, could be a great compromise: If you define a new file format anyway, you might make it mandatory to explicitly reset to the default attributes by the end of each line. Just as you would declare a file "invalid" if there's an overlong SGR sequence or other unrecognized escape sequences etc., you would also declare a file invalid if one of its lines ends in any of the attributes being in nondefault state. This way cat'ing to the terminal and "less -R" will both work as expected (and so will "mcview" without heavy refactoring, if the screen drawing library lets us implement this feature one day).

(But whichever approach you pick, do it consistently for all the supported attributes.)
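The compromise (declare a file invalid if any line ends with attributes in a non-default state) could be checked roughly like this. It is a deliberately conservative Python sketch with hypothetical function names: it only accepts a full reset (CSI 0m or CSI m) as the last SGR attribute on a line, and does not try to work out whether individual cancelling codes such as 22 or 24 happen to restore the defaults.

```python
import re

SGR = re.compile(r"\x1b\[([0-9:;]*)m")


def ends_in_default_state(line: str) -> bool:
    """True if the line has no SGR sequence, or its last SGR attribute is a full reset."""
    last = None
    for match in SGR.finditer(line):
        # Take the last ";"-separated attribute; an empty one means 0 (reset).
        last = match.group(1).split(";")[-1] or "0"
    return last is None or last == "0"


def invalid_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that end with non-default attributes."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if not ends_in_default_state(line)
    ]


if __name__ == "__main__":
    sample = "\x1b[1mbold\x1b[0m\nplain\n\x1b[4munderlined\n"
    print(invalid_lines(sample))
```

A file that passes this check renders the same whether the viewer carries attributes across newlines (a terminal) or resets them on each line ("less -R").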

cheers, egmont

egmontkob commented 1 year ago

Egmont Koblinger wrote:

Hi Kent,

If you want to make it a new standard file format, there's one more important thing to address.

What's the exact meaning of a newline character? How do you define lines, paragraphs of a text, and how are they denoted in the file format? E.g. when such a file is viewed in a designated viewer, should it wrap to the next visual line wherever there's an LF in the file? The other way around: if a WYSIWYG editor exports in this format, where does it need to emit single newlines or double newlines, should it keep a margin of 80-ish characters or not, etc.? Does the format have the concept of "hard line break inside a paragraph" (added by shift+enter in word processors), and if so, how is it denoted?

How is cat'ing such a file supposed to appear, given that the file itself cannot know the width of the terminal? Should it fill the entire width, or be manually pre-formatted to let's say 80 columns? Can you define the concept of newline, paragraph etc. in a way that's compatible with the use case of cat'ing the file (in a terminal whose width is unknown in advance)?

Or can you say that you don't define semantics such as paragraph in your file format, you only define visuals (LF for line break)? That is, kind of a plain text file augmented with text attributes, nothing more. Then what is expected to happen when you import/export such a file to/from a word processor that does have the concept of paragraph? Would this be specified, or left up for these tools to do something (presumably all of them somewhat differently)?

The world of text files is inconsistent. Some text files are written using arbitrarily long lines for semantical paragraphs, some insert a newline after every 80-ish characters. Some denote paragraphs by empty lines, some by indenting. Etc... Is it okay to keep this inconsistency, or is it desired to clean it up? If so, how?

Unlike with HTML where it's reasonable for a user to enter "..." and friends manually into a text editor, users won't be able to edit the "source" of such documents (you can't expect them to find a text editor which clearly shows the ESC character, offers easy means of entering it, and have them type e.g. "ESC[1m...ESC[22m" for bold). This file format will practically only be editable using designated tools. What kinds of tools do you have in mind? Plain text editors (Emacs, Vim, ...), word processors (LibreOffice Writer, ...), or newly written dedicated ones? How will the heavyweight word processors know how to read/save such files, that is, how to map newlines, empty lines etc. of the text file to/from their notion of paragraphs, hard line breaks etc.? How can you guarantee that such tools won't mess up the existing internal formatting, the existing behavior when cat'ing the file, etc.?

To give you an example:

The specification of the Markdown format defines paragraphs as being delimited by empty lines in the source text file. (By the way, it also defines hard line wrap inside a paragraph, denoted by two spaces at the end of a line.) It defines single newlines (without preceding double space) being equivalent to a space, allowing the source file to keep a narrow margin, without the formatted output also becoming narrow.

Then comes along a piece of software that calls itself "The best markdown editor for Linux and Windows", does this wrong (every newline of the input becomes a line break in the output), and ignores this bug for at least 1.5 years. (https://github.com/jamiemcg/Remarkable/issues/207)

Now imagine the mess that could arise for not even having a documented desired behavior...

Back to my previous mail:

The idea of having to restore the default attributes before every newline character might have a problem: the attributes are lost between the two words if they are only separated by newline(s). This includes attributes that are visible in case of a space character, such as background color or underline.

Therefore, this approach is only usable in either of these two cases:

- The new file format strictly defines visuals only, no semantics; to the extent that an editor shouldn't even be able to reformat existing text (e.g. to keep a margin of 80), or would do so by using heuristics for the attributes wherever a newline gets converted to a space. Sounds pretty ugly and fragile.

- The new file format defines the LF character to have the semantics of either a hard line wrap or a paragraph break. In this case it's okay not to have background color, underline attribute etc. for that because formatting won't convert it to a space.

cheers, egmont

egmontkob commented 1 year ago

Kent Karlsson wrote:

Den 2019-06-01 12:03, skrev "Egmont Koblinger":

Hi Kent,

Back to this particular point:

Cancelled also by LF, CR, VT, NEL, LS, PS.

Why? Having to have different logic for tracking various attributes would be a PITA for implementations.

There are, already in ECMA-48, some control sequences whose effects are terminated by line ending. I guess those are the ones deemed not meaningful to continue after a line break. It is simpler to skip that complication, agreed.

"less -R" preserves SGR attributes, so you can view colored outputs (such as git diff). However, it resets them at every newline.

This is done presumably not only because it's simpler to implement it this way, but also because it allows to efficiently handle giant files. For example, in order to jump to the end, it only needs to seek to the very end and walk back a little bit. It does not have to read and parse the entire file.

That may be. And I may agree that it may be preferable to auto-reset all (eq. to CSI 0m) at every hard newline.

Further, while possible to introduce "newline resets" for new SGR codes, it cannot be done for old ones, due to long-standing standard.

I removed the "newline resets" from my draft, as per your previous comment.

I don't know how "less" actually implements larger jumps (e.g. straight to the end of a giant file), but I know how Midnight Commander does because its viewer was rewritten by me. And it indeed jumps straight to the end and then walks back just a bit. We don't support SGR or any other form of coloring where a colored segment is marked by opening and closing characters (arbitrary coloring is blocked by the screen drawing libraries' capabilities), we only support bold and underlined using the in-place *roff notation (e.g. the "letter, backspace, letter" sequence for bold). However, it was an

nroff did that (outputting to a typewriter-like terminal). troff never did, it actually changed font.

(I haven't used them explicitly for a very long time. How they work may have changed since I last used them (explicitly).)

intentional decision to keep the viewer able to handle giant files, thus not parse the entire file, only look back from the start of the viewport to the preceding newline character in order to display the contents. So with its current design, we could only add SGR support the way it's in "less": resetting the attributes for each line of the input file. See https://midnight-commander.org/ticket/1849.

Keeping track of the colors across the file is okay as long as the viewer tool is expected to handle reasonably sized files. This is probably the case with web browsers, word processors, gedit-like graphical text editors. If you go to the terminal, you find that it begins to be a requirement to work on extremely large files, including disk devices. These tools were written keeping this goal in mind.
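The "jump to the end and walk back a little" strategy described above can be sketched as follows. This is a hypothetical Python illustration, not code from "less" or Midnight Commander; the function names and the chunk size are assumptions.

```python
import io

def start_of_line(f, pos, chunk=4096):
    """Return the offset just after the last newline before `pos` (0 if none)."""
    while pos > 0:
        step = min(chunk, pos)
        f.seek(pos - step)
        data = f.read(step)
        idx = data.rfind(b"\n")
        if idx != -1:
            return pos - step + idx + 1
        pos -= step
    return 0

def last_line(f):
    """Read only the final line of a seekable binary file."""
    f.seek(0, io.SEEK_END)
    end = f.tell()
    # Ignore a single trailing newline, if present.
    if end > 0:
        f.seek(end - 1)
        if f.read(1) == b"\n":
            end -= 1
    start = start_of_line(f, end)
    f.seek(start)
    return f.read(end - start)

# The first line turns red on and never resets it.
sample = io.BytesIO(b"\x1b[31mred\nstill red?\nlast\n")
print(last_line(sample))
```

Only the tail of the file is read, so the viewer never sees the `CSI 31 m` on the first line. It has no choice but to start the last line with default attributes, which is the per-line reset behavior discussed here.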

So how does this affect your proposal?

Well, for backwards compatibility reasons (going back decades), we cannot just start generally auto-resetting at newline (or similar). (Assuming that newlines occur fairly regularly in the file.)

Do you have a proposal? One could say something like "do not assume that the graphic state spans over newlines [of some sorts]; some programs may insert (explicitly or implicitly) CSI 0m just before each newline [of some sorts]."

You might stick to what ECMA says: colors and attributes live on for the next line. Tools like "less", "mcview" are probably not going to implement it correctly: they will reset them at every newline character. It's more important for these tools to support giant files; they wouldn't sacrifice or severely degrade (e.g. slow down) this feature for the sake of coloring, nor would they bother implementing two different strategies (e.g. a slower but correct one for files below 10MB, and a faster but slightly approximating [or fully turning off color support] one for larger files).

Or you might want to say that all these attributes are reset on newlines. Mind you, then simply cat'ing your files to a terminal won't work as expected, and what's the point in picking SGR codes instead of something brand new if not this possible use case?

"Something new" (operating on spans) would have the same problem.

Or, could be a great compromise: If you define a new file format anyway, you might make it mandatory to explicitly reset to the default attributes by the end of each line. Just as you would declare a file "invalid" if there's an overlong SGR sequence or other unrecognized

I never said/wrote that.

escape sequences etc., you would also declare a file invalid if one of its lines ends in any of the attributes being in nondefault state.

Declaring an entire file "invalid" is often a very bad idea.

This way cat'ing to the terminal and "less -R" will both work as expected (and so will "mcview" without heavy refactoring, if the screen drawing library lets us implement this feature one day).
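A check for the "every line must end with default attributes" rule suggested above might look like the following Python sketch. It is an illustration under stated assumptions, not part of either proposal: only SGR 0 (or an empty parameter) is treated as a return to defaults, so targeted resets such as 22 or 39 count as leaving the state non-default, and the function names are hypothetical.

```python
import re

# CSI, then parameter digits with ':' or ';' separators, then final byte 'm'.
SGR = re.compile(r"\x1b\[([0-9:;]*)m")

def sgr_dirty_after(line, dirty=False):
    """Return True if some attribute may be non-default at the end of `line`."""
    for match in SGR.finditer(line):
        params = match.group(1).split(";")
        i = 0
        while i < len(params):
            p = params[i]
            if p in ("", "0"):
                dirty = False
            else:
                dirty = True
                # Skip the arguments of semicolon-separated extended colors
                # (38;5;n and 38;2;r;g;b, likewise 48) so that a color
                # index of 0 is not mistaken for a reset.
                if p in ("38", "48") and i + 1 < len(params):
                    i += {"5": 2, "2": 4}.get(params[i + 1], 0)
            i += 1
    return dirty

def offending_lines(text):
    """1-based numbers of the lines that end with attributes still set."""
    bad, dirty = [], False
    for number, line in enumerate(text.split("\n"), start=1):
        dirty = sgr_dirty_after(line, dirty)
        if dirty:
            bad.append(number)
    return bad

sample = "\x1b[1mbold\x1b[0m\nplain\n\x1b[38;5;0mblack\nnext\x1b[m\n"
print(offending_lines(sample))
```

A tool could report the offending line numbers rather than rejecting the whole file, which may be a gentler reading of the "invalid file" idea.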

I'm not familiar with those.

Kind regards /Kent K

egmontkob commented 1 year ago

Kent Karlsson wrote:

Hi!

I have made some developments of the document re. proposed update to ECMA-48 SGR (and a few other) control sequences.

Comments appreciated!

Egmont, you are welcome to forward the document to the email list you mentioned (since it is an email list it may be that only subscribers can send, and even if that limitation is not set, I would not see responses sent to the list...). Maybe there is some interest, even though it's from a "random guy" and in a "fancy" document (not really, it is mostly "default MS word format", with only minor tweaks).

You are welcome to forward the document to others that you think may be interested. (Trying to slowly "widen the circles".) I'd appreciate a cc to me if you do so.

I still think that an actual update (or amendment) to these standards (via ECMA, ITU, ISO) is unlikely...

Kind regards /Kent Karlsson

SGR-update-B_c.pdf

egmontkob commented 1 year ago

Egmont Koblinger wrote:

Hi Kent, others,

My biggest concern hasn't changed. It's still unclear to me what your goal is, what contexts you expect this document to apply to, and most importantly, how/why you expect that some "random guy" who has no prior contributions of working source code or useful bugreports/comments to terminal emulators can come up with such a piece of work and get it accepted by others. Currently the world of terminal emulation and relevant escape sequences is shaped by those who actually create working code, ideally with some discussion and coordination among each other. In particular, this means that ideally each and every new feature should be thoroughly scrutinized and discussed, whereas with a big document like this it's easy to sneak in plenty of things that were not carefully evaluated.

Egmont, you are welcome to forward the document to the email list you mentioned (since it is an email list [...])

No, "Terminal WG" is not an e-mail list but a gitlab project where anyone can open issues. I'm not going to do anything for you, since I disagree with the overall approach you take. You can open one, but then expect a followup comment from me, along the lines of my comments here.

Some new comments, after very quickly skimming through this newest file:

"The following just “move” the display position [...] and should have no glyph lookup" – Some of these, e.g. CR, LF, BS, TAB are control characters that move the cursor, but do not change the characters they "jump over" (yes, TAB is one of these!), while others are space-like characters that do override the character they replace. Yet others are zero-width, that is, don't move the cursor, so it doesn't even make sense to talk about whether a glyph lookup takes place or not. It's really bad to wash up these three different categories into one. Also unclear to me how an unspecified "show invisibles" mode is related, and why it's a business of this specification at all whether a glyph lookup takes place or not. IMO the reasonable behavior for a terminal emulator is to do look up U+0020, U+00A0 and similar glyphs and show them, so that one possibility to get these displayed for debugging purposes is to specify such a font where these aren't empty. IMO the best would be just to take it ouf of this spec, and not handle space-like character in any special way compared to visible glyphs. They'll show up as empty with any sane font, no problem, nothing to fix here.

Bold and lean: I was playing with multiple font weights and variable width fonts recently in VTE, you can see side products of that at fontconfig bug 164 (fixed) and pango bug 370 (pending). Following the spirit of SGR 4:1, 4:2 etc., I find it a pretty bad concept if SGR 1:x and 2:x cancel each other, 1:x stands for the bolder weights and 2:x for the lighter ones. If it's desired to support more weights (which I do support), it should go with one common major number, and sub-modes to that. I haven't filed a proposal for this yet (although it's on my short-term TODO list), my recommendation would be to go for 1:1 .. 1:9 corresponding to CSS3 font weights 100 .. 900 (in steps of 100), respectively. Also, 1 would be an almost-alias to 1:7, 1:0 and 22 would be aliases to 1:4, and 2 could probably be an alias to 1:2 (but I'd leave this one up to terminal emulators). I'd also specify that for legacy reasons a terminal emulator might make certain colors brighter when bold is picked using 1, but not when bold is picked using the new 1:7. Anyway, the bottom line is that even such a tiny fraction of your document deserves thorough thinking, throwing around a couple of ideas, being discussed among developers of terminals, and deciding on one.
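The weight mapping recommended in the paragraph above can be written down as a small Python sketch. It only restates the suggestion from this mail; nothing here is implemented by any terminal as far as this thread says, the function name is hypothetical, and treating plain 2 as an alias of 1:2 is the optional part that was left up to terminal emulators.

```python
def proposed_weight(sgr_param):
    """Map one SGR parameter string to a CSS-style font weight.

    Returns None for parameters unrelated to font weight.
    """
    aliases = {
        "1": 700,    # legacy bold, an "almost-alias" of 1:7
        "1:0": 400,  # alias of 1:4 (regular)
        "22": 400,   # alias of 1:4 (regular)
        "2": 200,    # optional alias of 1:2, left to the emulator
    }
    if sgr_param in aliases:
        return aliases[sgr_param]
    major, _, sub = sgr_param.partition(":")
    # 1:1 .. 1:9 correspond to weights 100 .. 900 in steps of 100.
    if major == "1" and len(sub) == 1 and sub in "123456789":
        return int(sub) * 100
    return None

for param in ("1:1", "1:4", "1:7", "1:9", "1", "22", "2", "4:3"):
    print(param, proposed_weight(param))
```

The sketch does not model the legacy "bold also brightens the color" behavior, which under this suggestion would apply to plain 1 but not to 1:7.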

56:=1m – not sure if the '=' sign can be allowed within SGRs without breaking stuff.

57 – merges two orthogonal features into one, let alone not even allowing all possible combinations.

60..63 – How are these different from the "normal" underline/overline? How to set "vertical writing direction" in terminal emulators? (Using SPD?) Is 60 the same as 62 with negated subparameter, and similarly 61 and 63? How to specify negative subparameter, is the '-' sign allowed in SGR (I don't think so)?

There's sure a lot more, but not for 1am, and not for me alone. IMO every little aspect should be reviewed thoroughly one by one by a wider group of terminal developers if you want to get somewhere.

cheers, egmont

egmontkob commented 1 year ago

Kent Karlsson wrote:

Den 2019-07-17 01:18, skrev "Egmont Koblinger":

Hi Kent, others,

My biggest concern hasn't changed. It's still unclear to me what your goal is, what contexts you expect this document to apply to, and most importantly, how/why you expect that some "random guy" who has no

Maybe that is just me...

prior contributions of working source code or useful

I do work on source code (and have for a long time), but not for terminal emulators... I do use (some) terminal emulators though...

bugreports/comments to terminal emulators can come up with such a piece of work and get it accepted by others. Currently the world of

For this I'm not thinking much about terminal emulators. More from a standards perspective, and that certain control sequences well apply to documents.

Remember that one (well two, but those are the same) of the standards I refer to is called "Open Document Architecture (ODA) and interchange format: Character content architectures." But I'm not aiming at ODA specifically, SGR/HSPA/PTX can be used in a document without the need for ODA as such.

terminal emulation and relevant escape sequences is shaped by those who actually create working code, ideally with some discussion and

The code I write usually works... ;-) (At least after a bit of testing...)

coordination among each other. In particular, this means that ideally each and every new feature should be thoroughly scrutinized and discussed, whereas with a big document like this it's easy to sneak in plenty of things that were not carefully evaluated.

Yes, ... but that comment isn't particularly helpful...

Egmont, you are welcome to forward the document to the email list you mentioned (since it is an email list [...])

No, "Terminal WG" is not an e-mail list but a gitlab project where anyone can open issues. I'm not going to do anything for you, since I disagree with the overall approach you take. You can open one, but then expect a followup comment from me, along the lines of my comments here.

Some new comments, after very quickly skimming through this newest file:

"The following just “move” the display position [...] and should have no glyph lookup" – Some of these, e.g. CR, LF, BS, TAB are control characters that move the cursor, but do not change the characters they "jump over" (yes, TAB is one of these!), while others are space-like

(BS often erases; or can (or rather could) be used in type-writer like terminals for overtyping; but I don't mention BS in my document.)

Well, regarding TAB (HT) in particular, HT is a common character in documents, and then is a (somewhat special) space. Typing an HT normally inserts that character in the document (and I'm sure you also know how it is displayed). CHT (and CBT), if at all interpreted, on the other hand move the "active presentation position" (and do NOT work in a space-like manner), and those should not occur in a (modern) document. ("Documents" aimed at typewriter-like terminals, (yes, I have used those) not counted.) I know, ECMA-48 isn't the most clearly formulated standard... A lot is left to the reader to interpret.

Yes, HT, CR and others, are often "specially interpreted" in a terminal context, and can depend on the kind of terminal one has (or had...), or which program is "taking in" the characters, and interpreting them as, say, commands. These characters are often dealt with differently in a terminal context than in a document context. (And I am thinking more about documents than terminals.)

ECMA-48 itself does not make a distinction of what control sequences are suitable in a document and which are not, which suit which kind of terminal etc. Indeed, it is aimed at "character-imaging devices" but what the authors had in mind was very much the technology of the day; not later inventions like "GUI"s, or "windows". Trying to fix, or modernise, THAT aspect of ECMA-48 would be a quite large project (basically revising the entire standard), and I am not willing to take that on. I don't really see it would be worth the effort. Just the small part that I have done, I find worthwhile.

characters that do override the character they replace. Yet others are zero-width, that is, don't move the cursor, so it doesn't even make sense to talk about whether a glyph lookup takes place or not. It's

Zero-width glyphs are not uncommon. Either for combining characters or control/format codes (zero-width spaces are formally format characters). In an (unspecified) "show invisibles" mode, zero-width spaces should get a zero-width glyph (usually not at all from the current font; and may have to be forced to be zero-width). I'm not sure it would be a problem for terminal emulators, but isn't for text editors (not terminal-based) that I can see.

really bad to wash up these three different categories into one. Also

Not sure it would improve anything to split that section up.

unclear to me how an unspecified "show invisibles" mode is related, and why it's a business of this specification at all whether a glyph lookup takes place or not. IMO the reasonable behavior for a terminal emulator is to do look up U+0020, U+00A0 and similar glyphs and show

(There are several more than those two.) When I worked a bit more with fonts, I found that too many fonts either had unwanted glyphs there, or one got the .notdef glyph, which is also unwanted. Hence the advice not to look up spaces in the font. (Likewise for just about all C0, C1, and format control characters, "non-characters" and "default ignorable" (if ignored) characters.)

them, so that one possibility to get these displayed for debugging purposes is to specify such a font where these aren't empty. IMO the

"Show invisibles" should take glyphs from a special symbols font anyway, NOT from the current ("normal") font.

best would be just to take it out of this spec, and not handle space-like characters in any special way compared to visible glyphs. They'll show up as empty with any sane font, no problem, nothing to fix here.

I still think there is. See above.

Bold and lean: I was playing with multiple font weights and variable width fonts recently in VTE, you can see side products of that at fontconfig bug 164 (fixed) and pango bug 370 (pending). Following the spirit of SGR 4:1, 4:2 etc.,

I agree with that one since it is apparently already implemented that way, AND it is a little bit nifty... (0 for nothing, 1 for single, 2 for double, 3 for squiggly ;-)

I find it a pretty bad concept if SGR 1:x and 2:x cancel each other, 1:x stands for the bolder weights and 2:x for the lighter ones. If it's desired to support more weights (which I do support), it should go with one common major number, and sub-modes to that. I haven't filed a proposal for this yet (although it's on my short-term TODO list), my recommendation would be to go for 1:1 .. 1:9 corresponding to CSS3 font weights 100 .. 900 (in steps of 100), respectively. Also, 1 would be an almost-alias to 1:7, 1:0 and 22 would be aliases to 1:4, and 2 could probably be an alias to 1:2 (but

But that is not a good idea. The numbers here are tied to Adobe's (OpenType's, or Multiple Masters) numbering scheme for this which I think is more qualitative than actually quantitative (though still usable for interpolation), AND not even typographers refer to this in that way. AND it would require "users" to know about this arbitrary numbering scheme.

CLDR has a, somewhat flawed (but that is a separate issue), naming for this that names discrete points in this (qualitative) axis as different degrees of leanness, and different degrees of boldness. And they do not go in even steps of 100 either. So no, I do not agree with your proposal here.

(B.t.w. 200 is extra-lean, 300 is lean, 350 is semilean, 400 is "regular", 600 is semibold, 700 is bold, 800 is extra-bold; but I still think these are more qualitative than quantitative.)

In addition, for terminal emulators, I have a hard time seeing that the extreme values along this axis are useful. So this would depart from your stand that extension to terminal emulators need come from a stated NEED...

I'd leave this one up to terminal emulators). I'd also specify that for legacy reasons a terminal emulator might make certain colors brighter when bold is picked using 1, but not when bold is picked

That particular one is something I'd like to see very much gone. It does not even make any sense when having a full colour scheme. I.e. "intensity" should be read as the "lean-bold" axis, NOT related to colour per se.

using the new 1:7. Anyway, the bottom line is that even such a tiny fraction of your document deserves thorough thinking, throwing around a couple of ideas, being discussed among developers of terminals, and deciding on one.

56:=1m – not sure if the '=' sign can be allowed within SGRs without breaking stuff.

It should be ok, if you follow the ECMA-48 standard... = is reserved for future use in the fifth edition. "Bit combinations 03/12 to 03/15 are reserved for future standardization". (And the future is now... ;-)) If an implementation has missed that, I cannot help it.
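For readers unfamiliar with the column/row notation used in the quote above, the following Python sketch converts characters to that notation and checks which ECMA-48 byte range they fall into. The helper names are hypothetical; the ranges are those of ECMA-48 (parameter bytes 03/00 to 03/15, of which 03/12 to 03/15 are the reserved ones).

```python
def col_row(ch):
    """Return the ECMA-48 column/row notation of a 7-bit character."""
    code = ord(ch)
    return f"{(code >> 4):02d}/{(code & 0x0F):02d}"

def is_parameter_byte(ch):
    """Parameter bytes of a control sequence: 03/00 to 03/15."""
    return 0x30 <= ord(ch) <= 0x3F

def is_reserved_parameter_byte(ch):
    """The subrange 03/12 to 03/15, i.e. the characters < = > ?"""
    return 0x3C <= ord(ch) <= 0x3F

for ch in "0:;<=>?-":
    print(ch, col_row(ch), is_parameter_byte(ch), is_reserved_parameter_byte(ch))
```

So '=' is 03/13, a parameter byte in the reserved range, whereas '-' is 02/13, which lies outside the parameter byte range altogether. Whether existing parsers tolerate '=' in the middle of an SGR parameter string is a separate, practical question that this sketch does not answer.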

57 – merges two orthogonal features into one, let alone not even allowing all possible combinations.

True, but not everything need be covered. But I will take that under consideration. (There is another axis where not all possibilities are available according to my proposal...)

60..63 – How are these different from the "normal" underline/overline?

Apparently different enough for those who wrote ECMA-48 to differentiate them. I don't see why "we" should try to force them together now.

How to set "vertical writing direction" in terminal emulators? (Using

Out of scope for my proposal. But could be SPD. Or perhaps better: a preference setting.

SPD?) Is 60 the same as 62 with negated subparameter, and similarly 61 and 63? How to specify negative subparameter, is the '-' sign allowed in SGR (I don't think so)?

There's sure a lot more, but not for 1am, and not for me alone. IMO every little aspect should be reviewed thoroughly one by one by a wider group of terminal developers if you want to get somewhere.

It is not aimed primarily at terminals, even though those are just about the only active implementations at this time; but several things should work well also for terminal emulators (and make them look more modern...). But some things are not well suited for terminal emulators, like HTSA, PTX and line indents (prop.: via SGR).

Kind regards /Kent K

egmontkob commented 1 year ago

...

and so this is where the conversation ended ~3.5 years ago.

Kent, your response to my request of publishing this thread also elaborated a bit about the context of this proposal. I'd leave it up for you to add those notes here as well, if you wish to.

Thanks everybody!

kent-karlsson commented 1 year ago

Please leave this as an archive and close the issue.