gwsw / less

Less - text pager
http://greenwoodsoftware.com/less
Other
553 stars 87 forks source link

docs: Windows/DOS inaccuracies #481

Closed avih closed 6 months ago

avih commented 6 months ago

I checked the last commit 608c8e51 which touched the docs, and noticed some inaccuracies (not mecessarily regressions of this commit):

On MS-DOS and Windows, you don't need the quotes, but you should replace any percent signs in the options string by double percent signs

I don't know about DOS, but on Windows this doesn't seem to be true, but I also don't know how to escape percent. Here's what I tried (args is a C program which prints argv[1]..., each inside [...]):

d:\>set X=1

d:\>args %
[%]

d:\>args %%
[%%]

d:\>args %%X%
[%1]

d:\>args \%X%
[\1]

d:\>args ^%X%
[1]

All the above with added quotes, like args "%%" or args "\%X%" seem to be unaffected by the quotes and produce the results as if the quotes were not there.

On MS-DOS and Windows versions of 8-bit color is not supported; instead, decimal values are interpreted as 4-bit

Generally not true for Windows, as -Da exists, and works on win10 and later, which is probably what most windows users use these days.

It's probably true that it's unsupported by DOS, and not enabled by default on Windows.

On MS-DOS and Windows only, the -Da option may be used to specify strict parsing of ANSI color (SGR) sequences when the -R option is used.

I don't think -Da is supported by DOS?

And on Windows, I'd argue that it's unrelated to "strict parsing" as far as "less" goes. It simply doesn't try to interpret or apply the sequences internally, and instead lets the terminal parse and render escape sequences.

So it's actually less parsing and less strict.

Without this option (I presume -Da? - avih), sequences that change text attributes (bold, underline, etc.) may clear the text color.

That's no longer true since commit 8aec576 .

However, it does end up as 4 bits fg/bg, so some (stateless) mapping of the current attributes state has to take place.

Additionally, when the state is exactly default fg+bg and one attribute (only bold, only inverse, etc, and regardless of the steps which led to that state), then the relevant -Dx value is applied instead of the generic attributes mapping.

gwsw commented 6 months ago

Hm, the percent processing is confusing. Apparently the way percent handling works is different on the cmd line vs. in a bat file. %VAR% works in both contexts but double percent only works in a bat file:

C:\Users\markn>type t.bat
@echo one: %1
@echo two: %%1

C:\Users\markn>t.bat foo
one: foo
two: %1

I'll reword the man page to be more accurate.

I'm a bit confused about -Da and numeric colors on Windows right now. Testing seems to show that numeric colors work identically in Linux and in a Windows cmd window, even without -Da, which isn't what I expected. I need to look into this more. I'm not sure what happens in MS-DOS and don't know how to test MS-DOS anymore.

avih commented 6 months ago

Apparently the way percent handling works is different on the cmd line vs. in a bat file

Indeed, and I forgot about it. I would not too be surprised if it can't be escaped, but I just don't know.

I'm a bit confused about -Da and numeric colors on Windows right now

I might have looked at too small part of the text (I looked at that commit online at github and expanded a bit of the contexts).

I didn't realize 4-bit and 8-bit refer to "less" notation. I thought it refers to the ability of the terminal (passed through "less") to display "xterm-256" colors as part of the content, with the -R option, e.g. in:

printf "\033[38;5;123mfoo\033[m\n" | less -R

Anyway, indeed by default with -R, (without -Da, on windows earlier-than-10), such sequence seems to passthrough unmodified to the output without going through the SGR processor (which would have ignored 38, apply 5 as blink, and ignore 123).

It needs further examination, but it seems that once some sequences are encountered, the rest of the line doesn't go through the SGR processor. E.g.:

printf "\033[1m bold \033[46m cyan-bg \033[38;5;123m 256 \033[m xxx-normal \n normal \033[1m xxx-bold \033[m normal \033[1m bold \n" | less -R

results, on-screen, (windows XP), as:

 bold  cyan-bg ←[38;5;123m 256 ←[m xxx-normal ←[m←]8;;←\
 normal  xxx-bold  normal  bold ←]8;;←\

less-sgr

where bold/cyan-bg/normal are printed correctly, the xterm-256 color is passed-through (and prints with cyan background) and the last \033[m at the first line is passed-trhough without being applied ("xxx-normal" also has cyan background).

The next line processes SGR correctly again.

Almost, because the 2nd line bold seem to remember the cyan background??! ("xxx-bold").

But after a reset and another bold, it does show correctly.

Note also the final \033]8;;\033\ at the end of both lines, which shouldn't be there. A regression of the OSC 8 thingy?

Anyway, when it does go through the SGR processor, then except xterm 256/true colors, the attributes state is maintained correctly like in any normal terminal, and applied to windows colors in a stateless mapping pass, except default fg+bg and one attribute, where one of the -Dx values is applied instead of the default mapping.

Specifically, setting an attribute should never reset the color.

avih commented 6 months ago

It needs further examination, but it seems that once some sequences are encountered, the rest of the line doesn't go through the SGR processor. E.g.:

OK, luckily, I keep my own SGR parser (win_flush()) from when I worked on it, and in my version it looks correctly:

less-sgr-avih

You can check my parser here: https://github.com/avih/less/commit/5e1e1b7bf77226cd6ddaccce9b6e70d608361fcd

It has two differences to current master:

The 256 color support is orthogonal to the incorrect passthrough in master.

So to me this suggests that the current parser at the SGR processor is broken, mainly that when it encounters some sequences then it seems to stop processing SGR till the end of the line.

avih commented 6 months ago

So to me this suggests that the current parser at the SGR processor is broken, mainly that when it encounters some sequences then it seems to stop processing SGR till the end of the line.

And as far as I can tell, it's not a regression of 8aec5764, because the commit prior to that (52b8e35d) is similarly broken and renders like this: less-sgr-old

So as far as I can tell it's an sgr processor/parser issue which existed for a long time, before the sgr processor improvements (and which remained after the improvements), and it can be fixed by rewriting win_flush, for instance based on my implementation (though it's probably a small fix regardless in master which doesn't actually need a full rewrite).

avih commented 6 months ago

So as far as I can tell it's an sgr processor/parser issue which existed for a long time, before the sgr processor improvements

I think the issue is that for values bigger than 49, like 123 at the example above, it aborts the SGR processing till the end of the line, instead of till the end of the sequence.

This diff (to master) makes it not abort:

diff --git a/output.c b/output.c
index 99280c0..b3b0f24 100644
--- a/output.c
+++ b/output.c
@@ -299,7 +299,6 @@ static void win_flush(void)
                    }

                    if (q == p ||
-                       code > 49 || code < 0 ||
                        (!is_ansi_end(*q) && *q != ';'))
                    {
                        p_next = q;

and renders like this: less-sgr-fixed

IMO either take this patch which doesn't abort at all (and the SGR applicator ignores unknown values anyway), or abort till the end of sequence rather than till the end of the line.

However, this will not fix the issue where \033[38;2;100;102;102m still ignores 38, and applies 2 as "dim", and then abort the sequence at 100,

What could fix the issue, however, is that update_sgr indicates whether the value is known or not, and win_flush would abort the sequence on the first unknown value, which would correctly ignore the whole true-color sequence above.

But the best fix would be to accept 256/true colors, like my version does. It doesn't need the fancy mapping to 4-bit which my version has, and can probably be done in a much simpler way, though my current implementation does work rather nicely.

EDIT: that line which checks > 49 is more than 20 years old... - c2709ff8 so it's possible that it's been this way ever since.

avih commented 6 months ago

What could fix the issue, however, is that update_sgr indicates whether the value is known or not, and win_flush would abort the sequence on the first unknown value, which would correctly ignore the whole true-color sequence above.

I've implemented this approach. Here's my patch (git am should work):

process but ignore 256/true colors sequences ```patch From 243dac5310875be55cc1cd29d48df2f83a7cd87f Mon Sep 17 00:00:00 2001 From: "Avi Halachmi (:avih)" Date: Tue, 20 Feb 2024 12:48:07 +0200 Subject: [PATCH] windows: SGR processor: stop sequence on unknown code When not using -Da (and always on Windows before 10), we process SGR sequences internally and apply them as Windows/DOS colors. Previously, for SGR codes of 50 or more (e.g. ^[[38;2;150;200;250m) the SGR processor considered it invalid sequence, and passed-through this sequence and the rest of the current buffer (typically line). This resulted in 256 or true color sequences passed to the terminal, and on terminals which don't support sequences natively (i.e. before Windows 10), this resulted in incorrect output - the sequence prints literally. This commit does two things: 1. Identify unknown codes exactly, rather than "code >= 50" (e.g. 38 is unknown, while 39 or 40 are known and supported). 2. On unknown code, don't apply more codes at this sequence. This fixes incorrect junk output for 256/true colors on Windows earlier than 10, e.g. as part of the content (with the -R option), or as part of 8-bit color transformation with the -Dx options. 256/true colors still don't apply (i.e. they're ignored), but now such sequences don't print junk. Note that there are still sequences which the SGR processor considers invalid, such as various OSC sequences, and more, and on these cases the old behavior of "passthrough till the end of buffer" still apply. --- output.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/output.c b/output.c index 99280c0..148b380 100644 --- a/output.c +++ b/output.c @@ -113,7 +113,8 @@ typedef struct t_sgr { static constant t_sgr SGR_DEFAULT; /* = {0} */ -static void update_sgr(t_sgr *sgr, long code) +/* returns 0 on success, non-0 on unknown SGR code */ +static int update_sgr(t_sgr *sgr, long code) { switch (code) { @@ -150,7 +151,11 @@ static void update_sgr(t_sgr *sgr, long code) case 44: case 45: case 46: case 47: sgr->bg = C_ANSI(code - 40); break; + default: + return 1; } + + return 0; } static void set_win_colors(t_sgr *sgr) @@ -250,6 +255,14 @@ static void win_flush(void) p = p_next; if (p[1] == '[') /* "ESC-[" sequence */ { + /* + * unknown SGR code ignores the rest of the seq, + * and allows ignoring sequences such as + * ^[[38;5;123m or ^[[38;2;5;6;7m + * (prior known codes at the same seq do apply) + */ + int bad_code = 0; + if (p > anchor) { /* @@ -299,16 +312,20 @@ static void win_flush(void) } if (q == p || - code > 49 || code < 0 || (!is_ansi_end(*q) && *q != ';')) { + /* + * can't parse. passthrough + * till the end of the buffer + */ p_next = q; break; } if (*q == ';') q++; - update_sgr(&sgr, code); + if (!bad_code) + bad_code = update_sgr(&sgr, code); p = q; } if (!is_ansi_end(*p) || p == p_next) -- 2.41.0.windows.2 ```
gwsw commented 6 months ago

I have integrated this patch in f9f2498f20fa2f8f6126006eb19e7e6fc157739b. What are the differences between this and your rewritten parser? I think the current parser was mostly written by you anyway, in 8aec5764, so if you have an improvement I would be open to integrating it.

avih commented 6 months ago

I have integrated this patch in f9f2498

Thanks. I see the commit title dissappeared (it was "windows: SGR processor: stop sequence on unknown code"). Next time I'll open a PR ;)

I'm a bit confused about -Da and numeric colors on Windows right now. Testing seems to show that numeric colors work identically in Linux and in a Windows cmd window, even without -Da, which isn't what I expected

So does this commit fix it for you, i.e. that without -Da numeric colors now no longer work on windows? Any other remaining issues regarding -Da?

Note also the final \033]8;;\033\ at the end of both lines, which shouldn't be there. A regression of the OSC 8 thingy?

Can you confirm this issue? It still exists on master for me (observable on windows prior to 10).

I think the current parser was mostly written by you anyway, in 8aec576,

The "flush" functionality does two things:

  1. "parser": Handle the input buffer, identify SGR sequences, extract the SGR values, handle sequence errors (currently win_flush).
  2. "applicator": Do something with the SGR values (i.e. apply them as windows colors) (currently sgr_update updates the state, and set_win_colors applies it as console colors).

Both were added together 20 years ago here c2709ff87b23b1da99ae46b785543fa085bbcbda .

Commit 8aec57642de8f13ebe3f5e09eec5cfecd7f35d91 refactored and reimplemented the "applicator" (sgrt_update and set_win_colors) outside of win_flush, while the parser remained effectively unmodified. The new code at this commit maintains an SGR state as if it was a virtual terminal, and then map this state into windows colors, while the older code maintained the state as the windows colors, which resulted in clashing and other issues which hopefully improved.

It was the parser part which considered codes > 49 as invalid sequence and resulted in passthrough till the end of the input buffer. This existed already 20 years ago.

What are the differences between this and your rewritten parser?

  1. Mine ignores CSI sequences which are not SGR, and assumes that ESC which is non-CSI ends one char later (not always correct, e.g. with OSC sequences), and never aborts parsing, while the existing code gives up parsing and passthrough till the end of the buffer in various cases (till today that also included SGR values bigger than 49).
  2. Mine extracts all the values of a single SGR sequence into an array, and applies the whole array at once, while the existing code applies each SGR value as it's encountered, so it's harder for the existing code to implement dependency between the numbers (e.g. that "38;5;124" is a 256 colors sequence and not 3 unrelated values).
  3. Mine always consumes the whole input buffer, and knows to resume at the next call from the exact place the buffer ended previously, even if it was in the middle of an SGR value inside a sequence, while the existing code leaves incomplete sequences at the buffer, hoping the next call would complete the sequence, and hoping that there's enough room left at the buffer for the rest of the sequence. This theoretically can cause the current code to get into infinite loop, if a sequence is longer than the buffer (untested).

And my version is smaller, and subjectively I think also more readable, but I'm biased...

Existing `win_flush` ```c static void win_flush(void) { if (ctldisp != OPT_ONPLUS || (vt_enabled && sgr_mode)) WIN32textout(obuf, ptr_diff(ob, obuf)); else { /* * Digest text, apply embedded SGR sequences as Windows-colors. * By default - when -Da ("SGR mode") is unset - also apply * translation of -D command-line options (at set_win_colors) */ char *anchor, *p, *p_next; static t_sgr sgr; for (anchor = p_next = obuf; (p_next = memchr(p_next, ESC, ob - p_next)) != NULL; ) { p = p_next; if (p[1] == '[') /* "ESC-[" sequence */ { /* * unknown SGR code ignores the rest of the seq, * and allows ignoring sequences such as * ^[[38;5;123m or ^[[38;2;5;6;7m * (prior known codes at the same seq do apply) */ int bad_code = 0; if (p > anchor) { /* * If some chars seen since * the last escape sequence, * write them out to the screen. */ WIN32textout(anchor, ptr_diff(p, anchor)); anchor = p; } p += 2; /* Skip the "ESC-[" */ if (is_ansi_end(*p)) { /* * Handle null escape sequence * "ESC[m" as if it was "ESC[0m" */ p++; anchor = p_next = p; update_sgr(&sgr, 0); set_win_colors(&sgr); continue; } p_next = p; /* * Parse and apply SGR values to the SGR state * based on the escape sequence. */ while (!is_ansi_end(*p)) { char *q; long code = strtol(p, &q, 10); if (*q == '\0') { /* * Incomplete sequence. * Leave it unprocessed * in the buffer. */ size_t slop = ptr_diff(q, anchor); /* {{ strcpy args overlap! }} */ strcpy(obuf, anchor); ob = &obuf[slop]; return; } if (q == p || (!is_ansi_end(*q) && *q != ';')) { /* * can't parse. passthrough * till the end of the buffer */ p_next = q; break; } if (*q == ';') q++; if (!bad_code) bad_code = update_sgr(&sgr, code); p = q; } if (!is_ansi_end(*p) || p == p_next) break; set_win_colors(&sgr); p_next = anchor = p + 1; } else p_next++; } /* Output what's left in the buffer. */ WIN32textout(anchor, ptr_diff(ob, anchor)); } ob = obuf; } ```
My `win_flush` ```c /* we maintain a parsing state, and the sgr state of our virtual terminal. * specifically, we don't maintain a state of the current windows colors. * once an SGR sequence completes, we apply the sgr state as windows olors. * if the buffer ends inside a sequence, we'll continue from the exact place */ static void win_flush(void) { static sgr_state sgr; static int parse_state; /* 0: outside of esc 1: ESC detected 2: in CSI */ static unsigned v; /* currently parsed sgr value */ static unsigned char sgr_vals[64]; static int sgr_len; char *p = obuf, *oend = ob; ob = obuf; /* we always consume the whole buffer */ if (ctldisp != OPT_ONPLUS || (vt_enabled && sgr_mode)) { WIN32textout(obuf, oend - obuf); return; } while (p != oend) { switch (parse_state) { case 0: { char *pesc = memchr(p, ESC, oend - p); if (!pesc) { WIN32textout(p, oend - p); return; } if (pesc > p) WIN32textout(p, pesc - p); p = pesc + 1; parse_state = 1; continue; } case 1: /* if not CSI ('['), assume it's ESC- */ parse_state = *p == '[' ? 2 : 0; ++p; continue; case 2: /* we parse the CSI as if it was an SGR sequence, eventhough * we don't know that yet. we'll discard if it ends up not SGR. */ for (; p != oend; ++p) { if (*p >= '0' && *p <= '9') { v = v * 10 + *p - '0'; continue; } /* separator or end of CSI */ if (sgr_len < sizeof(sgr_vals)) /* ignore overflow vals */ sgr_vals[sgr_len++] = v; v = 0; if ((unsigned char)*p < 0x40) continue; /* the CSI loop */ /* end of CSI */ if (is_ansi_end(*p)) { /* which was SGR */ /* len >= 1 always, with value 0 if got nothing */ update_sgr(&sgr, sgr_vals, sgr_len); set_win_colors(&sgr); } sgr_len = 0; ++p; parse_state = 0; break; /* the CSI loop */ } } /* switch */ } } ```

so if you have an improvement I would be open to integrating it.

I don't mind opening a PR to to use my version of win_flush. So now that you hopefully have a bit more info, shall I?

gwsw commented 6 months ago

The extraneous OSC8 sequence at the end of a line containing SGR sequences should be fixed in 46f637c35eda990e4342043af130ad4e1c3d90e7.

Yes, go ahead and submit a PR with your win_flush changes.

Regarding windows color setting, I was apparently misremembering how win_set_color works. It currently just sets the color with SGR sequences when vt_enabled is set. I thought it was looking at sgr_mode too. And on a non-vt_enabled terminal it does nothing; I thought in that case it was interpreting the color number as a parameter to _settextcolor. It doesn't seem great to just ignore the color on non-vt terminals, but I'm not sure how much effort to put into handling non-vt terminals anymore; I guess it's no longer a common case?

avih commented 6 months ago

Yes, go ahead and submit a PR with your win_flush changes.

OK.

Regarding windows color setting

OK, I don't think I follow. You describe it in terms of how specific C functions do or don't behave, but I'm not sure how that should/expected-to translate to behavior of "less", or bugs thereof.

E.g. commit f9f2498f addresses an issue where in a non-vt terminal, 256 and true-color sequences were printed literally, which is wrong. The solution there is to ignore such sequences.

Another solution could be to translate 256/true color to 4 bit color. I already have such solution, and I can add it as an additional commit (you can observe it at same commit where I have my version of win_flush - which also has an enhanced update_sgr and friends which supports 256 and true colors - https://github.com/avih/less/commit/5e1e1b7bf77226cd6ddaccce9b6e70d608361fcd).

So:

  1. Can you describe your observation in terms of behavior of "less", which you initially thought is an issue? Preferably with specific invocation options so that I can reproduce it.
  2. Does this behavior change following f9f2498f ? Specifically, effectively 8-bit colors were working on win10 without -Da (because the SGR parser gave up on transslation and switched to passthrough mode), but now 8-bit colors sequences are explicitly ignored and have no effect on the output.
  3. Do you want also the 256/true color support translated to 4-bit on non-vt? (I can add it as a second commit after the win_flush commit).

I'm not sure how much effort to put into handling non-vt terminals anymore; I guess it's no longer a common case?

I think most MS users these days do use win10/11, but "less" runs fine on XP and later, and even supports DOS, and I happen to already have code to support translation of 256/true color to 4 bit non-vt.

Nevertheless, it is more code, even if not much more code. So up to you. I'm fine either way, but FYI such code exists already and I've been using it for more than 6 months - since I opened #383 .

gwsw commented 6 months ago

What I'm saying is, I thought that in some circumstances an option like -Dd31 would interpret the "31" as a CHAR_INFO.Attributes value (BACKGROUND_GREEN+FOREGROUND_BLUE). That is how the DOS version used to work years ago before all the color changes over the last 2 or 3 years. But that doesn't seem to be the case. I'm not sure when that changed or if it introduced an incompatibility in behavior (no one has complained about it). Currently, as far as I can see, numeric colors are interpreted as CSI 38;5 values in all cases, which is probably a good thing, but not what I had remembered.

I don't think what I'm seeing completely agrees with your point #2. If I run less -R, without -Da, I still see CSI 38;5 colors, running in a cmd window on Windows 10. For example, -DP32.227 shows the prompt in foreground blue / background yellow. Toggling -Da doesn't seem to change anything.

Re #3, I'd like to at least look at your changes to support 256 color on non-vt. The more we can make all environments act as similar as possible, the better.

avih commented 6 months ago

OK, there appears to be some issue with -Da - I don't think works at all.

As far as I can tell, -Da does set sgr_mode to 1 at opt_D, however, right after that, it's set back to 0 unconditionally at get_term.

So basically, at win_flush (current master or my version), sgr_mode is always 0, so even if vt_enabled is 1 - 256/true escape sequences never get to the terminal.

So this can either be a regression, or it's possible that it's never worked, but only seemed working due to the issue which was fixed recently where the parser switched to passthrough mode.

For me, this code at win_flush is never entered, even when using -Da and vt_enabled is true (win10):

    if (ctldisp != OPT_ONPLUS || (vt_enabled && sgr_mode))
        WIN32textout(obuf, ptr_diff(ob, obuf));
    ...

But now that it doesn't switch to passthrough mode, I can't get 256 colors working at all.

avih commented 6 months ago

As far as I can tell, -Da does set sgr_mode to 1 at opt_D, however, right after that, it's set back to 0 unconditionally at get_term.

Can you confirm this?

If yes, I prefer that you fix it, because init and options ordering can be a delicate thing, and I've not touched this code before.

avih commented 6 months ago

I just realized, that once -Da is fixed, it actually does different things depending if the terminal supports VT:

  1. Prior to win10, it controls whether the -D{d,s,k,u} translation are applied (yes by default, no in "sgr mode" - when using -Da).
  2. On win10+, whether to use the internal SGR processor and apply the -D{d,s,k,u} translations (by default), or do full passthrough and let the terminal handle escape sequences (-Da). That's because passthrough happens when vt_enabled && sgr_mode, and -Da enables sgr_mode, while vt_enabled is set unconditionally to 1 if the terminal supports VT.

As a result, on win10+, it's impossible to use the internal SGR processor while disabling the -D{d,s,k,u} translations.

I don't know whether that's by design, but that's what the code did and does (or will do once -Da is fixed).

Note that it's still possible to set -Da interactively, and this will work as described above, because this will change sgr_mode but will not reset it back to 0 because get_term is apparently not called again when setting -Da interactively.

avih commented 6 months ago

As far as I can tell, -Da does set sgr_mode to 1 at opt_D, however, right after that, it's set back to 0 unconditionally at get_term.

So this can either be a regression, or it's possible that -Da never worked

This is a recent regression of 09377f76 which moved get_term from before the options parsing to after it.

So it was ok in 644 and broken since v645.

For what it's worth, I don't think get_term should touch it. This is an option and state of "less" - not a state of the terminal.

gwsw commented 6 months ago

Sigh. The initialization ordering has always been very fragile. Every time something is changed there, something else breaks.

As far as I can see, there's no reason that get_term needs to initialize sgr_mode. I think we can just remove the line that sets sgr_mode = 0 in get_term.

avih commented 6 months ago

I think we can just remove the line that sets sgr_mode = 0 in get_term.

Same. Maybe set it explicitly to 0 where it's defined, but not sure. Just make it init like other options. Because it's an option.

I thought that in some circumstances an option like -Dd31 would interpret the "31" as a CHAR_INFO.Attributes value (BACKGROUND_GREEN+FOREGROUND_BLUE). That is how the DOS version used to work years ago before all the color changes over the last 2 or 3 years. But that doesn't seem to be the case

Right. Apparently on win10+, where vt_enabled is unconditionally true, it uses the 256 colors sequence for 8-bit colors, and as far as I can tell 8-bit color wouldn't work on windows xp/7/8, so I presume it translates to 4 bit someplace earlier:

https://github.com/gwsw/less/blob/46f637c35eda990e4342043af130ad4e1c3d90e7/screen.c#L2666-L2706

Apparently this bypasses the SGR processor at win_flush and writes the sequence directly to the terminal.

However, I'm not sure when it's actually used.

E.g. on Linux, -Ds201 sets "standout" (e.g. the status line) to purple-ish in 256 colors, but on windows (10 with vt enabled) that doesn't work.

However, setting the line number to the same color does work: --use-color -DN201.

I think that unlike linux, on windows, the -D{d,s,k,u} values are applied at win_flush and affect both the content and "less" pintouts, and the implementation is (when -Da is disabled) that it uses only the fg/bg colors, it doesn't use a 256 color number.

But the other types, like N for line numbers are applied elsewhere, before it reaches win_flush.

I think there's a potential here to unify the code paths of windows and elsewhere, but not sure yet. It's not easy to track this behavior, and the docs don't help...

avih commented 6 months ago

As far as I can tell, -Da does set sgr_mode to 1 at opt_D, however, right after that, it's set back to 0 unconditionally at get_term.

So this can either be a regression, or it's possible that -Da never worked

This is a recent regression of 09377f7 which moved get_term from before the options parsing to after it

I thought that in some circumstances an option like -Dd31 would interpret the "31" as a CHAR_INFO.Attributes value (BACKGROUND_GREEN+FOREGROUND_BLUE). That is how the DOS version used to work years ago before all the color changes over the last 2 or 3 years. But that doesn't seem to be the case ... Currently, as far as I can see, numeric colors are interpreted as CSI 38;5 values in all cases

When you say "But that doesn't seem to be the case", what did you mean?

If you mean "I tried -Ds31 and it doesn't have any effect at all", then it's true, and is due to the same commit, because e.g. -Ds31 does set so_fg_color=31 at opt_D, but then get_term is called later and reset all these values, so it has no effect at all, and all of -D{d,s,k,u} are also broken since that commit.

Do note that 31 is invalid as windows color, and is masked with 0x0f, which results in 15, which is bright white.

Before commit 09377f7 (which moved get_term to after the options parsing), -Ds31 indeed applied the standout color as bright white windows color (15).

If you mean "I looked at the code and it seems to be applied as 256 colors and not the windows CHAR_INFO.Attributes value", then that would also be true, but only for all the -D values which are NOT d/s/k/u, and that has no effect unless VT is suported (and unconditionally enabled).

And so here's what "sgr mode" really does:

It disabled the the -D{d,s,k,u} mapping only at the CONTENT (win_flush), because the -D{d,s,k,u} values do get applied also outside of win_flush. And it does so differently depending on vt_enabled (skip the d/s/k/u mapping at win_flush if no VT, skip the whole SGR processor and switch to passthrough with VT).

And that's been true for a long time - both before and after I touched it at commit 8aec5764 .

So, to untangle this mess, here's what I suggest:

  1. You unbreak -D{d,s,k,u} by ensuring that get_term doesn't reset so_fg_color and friends after opt_D is called.
  2. I'll modify win_flush parser as discussed earlier. This has the advantage that it always consumes the whole input buffer, so it doesn't need to keep some stuff for later calls (incomplete sequences). Thanks to this, we can change it from win_flush() (which takes out-of-band pointers obuf and ob, and modifieds the latter) to win_term_write(const char *s), which doesn't know or care about obuf/ob, and uses the SGR processor if needed. That would allow using it also when applying the color outside of the content as escape sequences using this function, and we can remove SETCOLOR and all the other places which set the color outside of (current) win_flush.
  3. Unify the nix/windows -D{d,s,k,u} color options value. Specifically, don't make -Ds15 use the windows color 15, and instead make it 256 colors 15, like on nix, and like e.g. -DN123 already does. This may or may not work without VT (we could look at it later). We're not losing any function by this, because all the windows values 0-15 can be set also with the color letter, including bright/bold, e.g. -Ds15 is the same as -DsW anyway. But we do gain uniformity with the non d/s/k/u value form, and we do allow 256 color replacement for d/s/k/u as much as possible. However, this would break backward compat, so need to decide if we care, and if yes, what to do about that.
  4. Decide what to do about "sgr mode". I think there is value in allowing the -D{d,s,k,u} mapping also at the content, even if it's unsupported on *nix, and it's good to have a way to disable it (-Da), but if that turns out too painful to maintain, IMO we can remove it.

Sounds like a plan?

avih commented 6 months ago

Here's is my suggested docs changes and additions (assuming -Da/d/s/k/u are fixed), which describe the current behavior as best as I know, as convoluted as it might be. I'll include it in a PR, but I'd appreciate preliminary comments, if any.

I think that even if simplify things and make it more like on unix as I suggested in my plan, it would still be good to document the current behavior as best as we can, before possibly replacing it (probably not before the next release, because it will be risky), so that we can have at least one release with correct windows colors docs.

(I care about the essence, I don't feel strongly about the wording at all, and I'm neither English nor groff expert)

Summary:

...

<existing>

    d      Bold text.

    k      Blinking text.

    s      Standout text.

    u      Underlined text.

<suggested addition>

    Note that the keys "d", "k", "s", and "u", just like the others,
    only map attributes set by less itself - not attributes  at  the
    content  when  using  the -R option. However, On MS-DOS and Win‐
    dows, these keys also map attributes at the  content.   See  the
    -Da option below.
...

<existing>

    An 8-bit color string is one or two decimal  integers  separated
    by a dot, where the first integer specifies the foreground color
    and the second specifies the background color.  Each integer  is
    a  value  between 0 and 255 inclusive which selects a "CSI 38;5"
    color value (see
    https://en.wikipedia.org/wiki/ANSI_escape_code#SGR)  If   either
    integer  is  a "-" or is omitted, the corresponding color is set
    to that of normal text.

<current, before modification>

    ... On MS-DOS versions of less, 8-bit color
    is  not  supported;  instead,  decimal values are interpreted as
    4-bit CHAR_INFO.Attributes values (see
    https://docs.microsoft.com/en-us/windows/console/char-info-str).

<suggested replacement>

    On MS-DOS and Windows versions of less, 8-bit color is not  sup‐
    ported for the keys "d", "s", k", and "u"; instead, decimal val‐
    ues are interpreted as 4-bit CHAR_INFO.Attributes values (see
    https://docs.microsoft.com/en-us/windows/console/char-info-str).
    Additionally  When  using 8-bit color with other keys, like "N",
    it only has an effect on Windows 10 and later.

...

<existing>

    On MS-DOS only, the -Da option may be  used  to  specify  strict
    parsing  of  ANSI  color  (SGR)  sequences when the -R option is
    used.  Without this  option,  sequences  that  change  text  at‐
    tributes (bold, underline, etc.) may clear the text color.

<suggested replacement as a new "-Da" section>

-Da
      On  MS-DOS  and  Windows only, when using the -R option, the op‐
      tions -Dd, -Ds, -Dk, and -Du also affect the content.  -Da  dis‐
      ables  this effect on the content (however, these mappings still
      apply to attributes which less itself sets).

      Note that on Windows 10 and later, when using the -R option, -Da
      also  disables  the  internal  processing of ANSI sequences into
      colors, and instead pass the sequences directly to the  terminal
      -  which  can  process them. Because the internal ANSI processor
      produces only 4-bit colors, a side effect of using -Da  together
      with -R is that 8-bit colors and true colors at the content also
      display correctly.
actual diff ```diff diff --git a/less.nro.VER b/less.nro.VER index 3148e79..9b8aae4 100644 --- a/less.nro.VER +++ b/less.nro.VER @@ -733,7 +733,16 @@ Standout text. Underlined text. .RE +.PP .RS +Note that the keys "d", "k", "s", and "u", just like the others, only +map attributes set by +.BR less +itself - not attributes at the content when using the \-R option. However, +On MS-DOS and Windows, these keys also map attributes at the content. +See the \-Da option below. + +.PP The uppercase letters and digits can be used only when the \-\-use-color option is enabled. When text color is specified by both an uppercase letter and a lowercase letter, the uppercase letter takes precedence. @@ -790,14 +799,17 @@ https://en.wikipedia.org/wiki/ANSI_escape_code#SGR) .hy If either integer is a "-" or is omitted, the corresponding color is set to that of normal text. +.PP On MS-DOS and Windows versions of .BR less , -8-bit color is not supported; instead, decimal values are interpreted as 4-bit -CHAR_INFO.Attributes values +8-bit color is not supported for the keys "d", "s", k", and "u"; instead, +decimal values are interpreted as 4-bit CHAR_INFO.Attributes values (see .br .nh -https://docs.microsoft.com/en-us/windows/console/char-info-str). +https://docs.microsoft.com/en-us/windows/console/char-info-str). Additionally +When using 8-bit color with other keys, like "N", it only has an effect +on Windows 10 and later. .hy .PP A 4-bit or 8-bit color string may be followed by one or more of the @@ -812,10 +824,21 @@ Underline .IP "&" Blink .PP -On MS-DOS and Windows only, the \-Da option may be used to specify strict parsing of -ANSI color (SGR) sequences when the \-R option is used. -Without this option, sequences that change text attributes -(bold, underline, etc.) may clear the text color. +.RE +.IP "\-D\fIa\fP" +.RS +On MS-DOS and Windows only, when using the \-R option, the options \-Dd, \-Ds, +\-Dk, and \-Du also affect the content. \-Da disables this effect on the +content (however, these mappings still apply to attributes which +.BR less +itself sets). +.PP +Note that on Windows 10 and later, when using the \-R option, \-Da also +disables the internal processing of ANSI sequences into colors, and instead +pass the sequences directly to the terminal - which can process them. Because +the internal ANSI processor produces only 4-bit colors, a side effect of +using \-Da together with \-R is that 8-bit colors and true colors at the +content also display correctly. .RE .IP "\-e or \-\-quit-at-eof" Causes ```
gwsw commented 6 months ago

When you say "But that doesn't seem to be the case", what did you mean?

I meant both of the things you suggested: I tested it (but only with uppercase letters so I didn't see the different behavior of the lowercase letters), and I looked at the code in win_setcolor (so again missed the fact that lowercase letters are handled differently via the xx[fb]g_color variables).

I think I'm ok with changing the interpretation of color numbers to be SGR 38;5 values in all cases rather than CHAR_INFO.Attribute values for lowercase letters. The current behavior seems confusing and inconsistent (even you and I who have worked on the code recently didn't understand the behavior without studying the code again). It does break compatibility which I'm not happy about (and normally try to avoid at all costs), but I think the gain in consistency is worth the inconvenience that some users may have to change their configuration once.

gwsw commented 6 months ago

I have opened a new issue #482 to track the change in interpreting color numbers since it is not really related to this issue, which began as a discussion about the wording in the man page.

avih commented 6 months ago

began as a discussion about the wording in the man page.

Well, before your comment it also ended with updated wording for the man page :)

Looks generally OK?

gwsw commented 6 months ago

I think the man page changes are generally ok, but I want to get the code changes done first. After the code is finalized we can then document its behavior.

polluks commented 6 months ago

By the way https://github.com/gwsw/less/blob/448962e844d9f875695fa5676aca82a62e731439/less.nro.VER#L2368 should be %HOME%/_lesshst instead?

gwsw commented 6 months ago

I have changed the man page to better describe the --color behavior in MS-DOS/Windows in ad1c9b81c0760f21ced604289e7c2b1941a5c7e7.

avih commented 6 months ago

This fails to mention what -Da does in simple words. This is exactly that it does:

Specifically:

And, if my version of the docs was correct (I think it is, do corect me if I'm wrong), then I don't think these items reflects in your version:

Summary:

  • Clarify: -D{d/s/k/u} don't affect the content, except dos/windows.
  • 8-bit colors with the -D options are unsupported only for d/s/k/u.
  • 8-bit colors with non-d/s/k/u only work on win10+.
  • -Da disables the effect of -D{d/s/k/u} on the content.
  • -Da On win10+ also passthrough SGR, enabling 256/true colors for -R.
gwsw commented 6 months ago

I'll make another pass at describing your first and fourth point more clearly. But it's not accurate to say that "-D{d/s/k/u} don't affect the content" because -Dd and -Du do affect overstruck content text on non-dos. I think your other points are already covered by my changes:

  • 8-bit colors with the -D options are unsupported only for d/s/k/u.
             •      When a numeric color value is given with a lowercase col‐
                     or  selector letter, the number is not interpreted a "CSI
                     38;5" color value as described  above,  but  as  a  4-bit
                     CHAR_INFO.Attributes  value  between  0  and 15 inclusive
  • 8-bit colors with non-d/s/k/u only work on win10+.
             •      When  a  numeric  color  value is given with an uppercase
                     color selector letter, numeric color  values  are  inter‐
                     preted  as  "CSI 38;5" values, but this is supported only
                     on Windows 10 and later.
  • -Da On win10+ also passthrough SGR, enabling 256/true colors for -R.
    •      The -Da option makes the behavior of --color more similar
    to  its behavior on non-MS-DOS/Windows systems by (1) in‐
    terpreting numeric color values with lowercase color  se‐
    lectors  as  "CSI 38;5" color values (but only on Windows
    10 and later),
avih commented 6 months ago

-Dd and -Du do affect overstruck content text on non-dos

Right. Thanks. I wasn't aware of that. For reference:

E.g. green d and red u at the content on both linux and windows:

printf "d\010d_\010u\n" | LESS= less -Ddg -Dur

And this is now documented correctly, maybe except for what "overstrike" bold/underline is.

But what is missing is the difference by default from *nix - that on dos/win, when using -R, -D{d,s,k,u} affect SGR bold/inverse/blink/underline at the content, while on linux it doesn't.

And what -Da does is disable this additional effect on the content.

  • -Da On win10+ also passthrough SGR, enabling 256/true colors for -R.
             •      The -Da option makes the behavior of --color more similar
                     to  its behavior on non-MS-DOS/Windows systems by (1) in‐
                     terpreting numeric color values with lowercase color  se‐
                     lectors  as  "CSI 38;5" color values (but only on Windows
                     10 and later),

This is actually strictly incorrect.

-Da effect has zero relation to using --color and the lower (or upper) case letters. --color and lower case letters mapping. E.g. -Dd123 works by interpreting what would-be 8-bit color on linux (123), as 4-bit windows-color, regardless of -Da. Here that would be 123 % 16 == 11, which is bright-cyan in windows 4-bit color, like in my suggested docs:

On MS-DOS and Windows versions of less, 8-bit color is not sup‐ ported for the keys "d", "s", k", and "u"; instead, decimal val‐ ues are interpreted as 4-bit CHAR_INFO.Attributes values (see https://docs.microsoft.com/en-us/windows/console/char-info-str).

The additional thing that -Da does (beyond disabling the additional SGR attr mapping at the content with -R), and which is also only relevant with -R, is allow {3,4}8;5;N, and {3,4}8;2;R;G;B (256/true colors SGR) at the content to display correctly, because with -Da these are passed-through to the terminal, while without -Da it goes through the internal SGR processor, which currently doesn't render them correctly.

E.g. this:

printf "\033[38;5;100m 256 colors\n" | LESS= less -R

Doesn't work without -Da, because the SGR processor currently doesn't render these correctly (they're ignored). But -Da skips the SGR processor and passthrough the sequence to the terminal, so it displays correctly with -R -Da.

So as far as I can tell, these are still not documented, or partially so:

avih commented 6 months ago

So what do we do abvout this?

Is it a difference in understanding how these things behave? (-Da, 8-bit colors with upper-case -DX letters) or do we understand the same behavior but only differ in how we think it should be documented?

At least as far as -Da goes, on a technical level it's very simple: it disables the application of -D{d/s/k/u} on SGR attributes at win_flush - which seems to affect only the content when using -R. I.e. -D<anything> mapping is applied outside of win_flush for "less" generated output (line numbers, prompt, etc), so that's unaffected by -Da. And because on win10+ it also bypasses the SGR processor at win_flush, it also allows 256/true color at the content with -R.

gwsw commented 6 months ago

I have been busy with some stuff IRL, but I should have an update to the man page in a few days.

avih commented 6 months ago

Docs look good to me now. Thanks.

Two nits:

On MS-DOS and Windows, the --color option behaves differently from what is described above in these ways:

  • The bold (d and *) and blinking (l and &) text attributes are not supported.

I believe the bold/blink comment above refers only to a color suffix to use an additional attribute (e.g. -DsWd)? because it could be also interpreted as if one can't map bold or blink to something else, i.e. that -DdW and -DkW are not supported (but they are).

... 8-bit "CSI 38;5" true color value ...

Every time the manpage mentions "CSI 38;5" (only at the dos/windows section?) it also mentions it's "true color", however, it's not generally referred to as true color.

38;5 is followed by single a 0-255 value, and is typically referred to as either "256 colors" or "xterm 256 colors" (which I believe is where it originated. I don't think the DEC terminals had 256 colors).

When people speak of "true color" at the terminal, they typically refer to 24-bit color RGB, with the CSI sequence 38;2;R;G;B (where R/G/B are 0-255 each, and also with 48 instead of 38 for 24-bit BG color)

In my comments above where I mentioned 256/true color, I meant both 256 colors and true color sequences (only work well at the content with -R if also using -Da).

gwsw commented 6 months ago

I believe the bold/blink comment above refers only to a color suffix to use an additional attribute (e.g. -DsWd)?

Yes that is correct. I tried to make that clear with the parenthetical comments, but I'll reword it a little more clearly.

however, it's not generally referred to as true color.

Thanks for catching that, I was confused about that terminology. I'll fix that too.

avih commented 6 months ago

Change looks good. Thanks.

avih commented 6 months ago

Well, maybe one or two things are still missing, which were also never documented as far as I can tell, and that's the default value of the -D{d/s/k/u} mappings - which are different than on *nix, and maybe when they're applied, because that's not necessarily obvious IMO.

I'm leaving it to your judgment whether these should be documented.

Specifically:

The others (bold, standout) are now like on *nix (before b9318187 (v643 and earlier) the mapping were colors all over the place).

Also, the -D{d/s/k/u} values are applied to content SGR attributes (when using -R without -Da) only when the FG and BG are the default color, and exactly one relevant attribute is set (before 8aec576 (v643 and earlier), they were applied after any SGR sequence of one value which changed an attribute).

gwsw commented 6 months ago

I don't think it's necessary to document the default colors. There are 15 more default colors for the uppercase color selectors which are also not documented, and for that matter, the exact places where bold and standout are used when --use-color is off are also undocumented. I don't see much value for the user to describe all these implementation details in the man page.

I will take a look at your other points and see if anything should be changed.

avih commented 6 months ago

There are 15 more default colors for the uppercase color selectors which are also not documented

That's true, but at least it does mention "colored text in various places" (at the --use-color docs, which is when the upper case letters defaults matter).

But also the upper case letters mappings are not enabled by default. One has to add --use-color in order to see them, while on windows underline and blink have these colorful mapping by default (which. by the way, I think is useful).

However, it's probably also true that most users don't have underline or blink at the content, and only those two have slightly unexpected default mapping. However, the quick help page does have underline, so the default underline mapping does appear there.

But I agree that documenting these defaults is not critical at all, and same for the other part of where the -D d/s/k/u mappings are applied to SGR attributes, because it does work in most cases exactly as expected, I would think.

Observation unrelated to the docs:

It actually quite surprised me that when trying to use these upper case letter for better resolution of the mapping, one has to use --use-color, which suddenly changes various colors by itself, even before adding manual mappings with the upper case letters.

So it's actually hard to use the upper case letters e.g. to only change the line numbers color, because using only -DNG doesn't work, but doing --use-color -DNG works but also changes the prompt color (to whatever the default is for -DP, which is not inverse like when not using --use-color) and likely other elements too, besides the line numbers.

I would think the default value of the upper case letters should have been that it looks as if --use-color was not used, so that it's possible to change only one of the upper case letters without changing all the places "less" can use colors.

Or maybe I have expected the upper case letter to work also without --use-color, and that --use-color would have only changed the default values of the upper case letters to more colorful defaults.

gwsw commented 6 months ago

The basic rule is we don't send escape sequences to the terminal unless they are specified in the termcap entry, or the user explicitly opts-in by giving the --use-color option. There's no guarantee that SGR sequences will work on any arbitrary terminal.

DOS/Windows is of course different, in that colors aren't formed by escape sequences, and ordinary bold/standout/etc isn't supported at all (on non VT). For many decades we've used colors on DOS to replace the missing bold/standout formatting. So the behavior on DOS/Windows is somewhat weird, because colors are being used for two essentially different things: to replace bold/standout formatting, and to implement the --use-color behavior like other systems.

avih commented 6 months ago

Yes, all of that is true, and I was aware of it.

Though I don't see the relation or hindrance to my suggestions:

  1. Document the default (windows) mapping, specifically only underline and blink are non obvious.
  2. Or maybe I have expected the upper case letters to work also without --use-color, and that --use-color would have only changed the default values of the upper case letters to more colorful defaults.