gwsw / less

Less - text pager
http://greenwoodsoftware.com/less
Other
529 stars 85 forks source link

OSC 8 on Windows not working well by default #522

Open avih opened 1 month ago

avih commented 1 month ago

Currently, by default, the internal SGR processor on Windows does not identify OSC 8 correctly, and renders incorrect text (shows both the URL and text, etc) on Windows prior to 10.

On windows 10+ it does work:

Adding OSC-8 support to the internal SGR processor is not impossible, and it might be a good solution, however, since "less" already knows to parse OSC-8, I was wondering if it could be translated to underline the link text on terminals/setups which can't handle OSC -8.

One such case would be Windows if VT is not enabled or if -Da is not used[*]. Other cases could be controlled by some option to indicate that the terminal doesn't support it (I presume there's no terminfo flag for OSC-8 support?).

I didn't yet look at the code, but wanted to hear your thoughts about this.

[*] including the case where -Da is not enabled, because passthrough-to-the-end-of-the-buffer can include SGR sequences after the OSC-8 - which the internal SGR processor would not track during passthrough, which can result in rendering bugs.

avih commented 1 month ago

I just added OSC support in my new SGR parser, which ignores ("swallows") all OSC sequences, except OSC-8 where it can add underline to the link text (and the URI is hidden/ignored as part of the OSC sequence).

It's not too many lines of code (adds 3 new states to the parser), and it doesn't make the parser more complex, so it's probably best to leave it to the SGR processor on windows, and not handle it specially elsewhere in "less" (beyond the existing special handling).

As a side note, I know we said we'll add the new windows SGR parser after the release, but it's been months since then, and we still don't have a release, so if you want, I can open a PR to enhance the SGR processor with the following:

I've been using this new parser (sans the new OSC[-8] states) for probably about a year now, and did not notice issues with it, so let me know if you want me to open a PR.

gwsw commented 1 month ago

Sure, go ahead and file a PR for the new SGR processor. I'm trying to get the beta closed but #515 and #521 have delayed that. I will release 657 as a beta today and if no new issues arise I expect to do the production release in a week or two.

I have thought a little about what to do with terminals that don't support OSC8. You're correct that there's no terminfo flag for this. Previously when only SGR (ESC...m) sequences were supported, the -R flag was sufficient to enable/disable the pass thru. Now that OSC8 is supported, perhaps there should be another flag or an environment variable for finer control. That would go in the next release of course.

avih commented 1 month ago

Sure, go ahead and file a PR for the new SGR processor. I'm trying to get the beta closed but #515 and #521 have delayed that.

You mean you'll try to merge it before the release? or file it and it can be handled soon - after the release?

Now that OSC8 is supported, perhaps there should be another flag or an environment variable for finer control.

Not sure about that. I think even terminals which don't support it still know to ignore OSC sequences which they don't identify?

AFAIK, the generic form is ESC ] NUM; ... and either ESC \ or BEL, which can be detected and ignored even without knowing what OSC-NUM is supposed to do.

At which case, it would look like plain text in such terminals (and in "less"). I think that's by design of how OSC-8 works (maybe not the greatest design, because the URI is lost in such case inside the ignored OSC-sequence, and leaves only the link-text visible).

It might be OK to underline it in such case (or use some other customizable attribute with -D<x>), but I don't think it's necessary. If the source file has OSC-8 links but the terminal doesn't support it, then looking like plain text is fine IMO.

But, if you feel like it, then maybe allow also replacing it with some attribute, but IMO not by default.

Also, in my new OSC handling, I decided to simplify it and do one of two things:

This simplifies the state machine from detecting OSC and also OSC-8 (and whether or not it sets the link or resets it) to only detect OSC sequences.

gwsw commented 1 month ago

You mean you'll try to merge it before the release? or file it and it can be handled soon - after the release?

No, I think that would be too big a change for the current release. I was thinking it would be merged after the release. I'm currently experimenting with some changes that will go in after the release so having a PR with the code to examine would be useful now.

Not sure about that. I think even terminals which don't support it still know to ignore OSC sequences which they don't identify?

Yes, I think most terminals ignore OSC sequences they don't support. I was thinking there may be some terminals which don't do that because they're very old or too rudimentary (like the current SGR support in less!) but perhaps it's not really an issue.

BTW, ECMA-48 (section 8.3.86) says an OSC sequence is simply "ESC ]", terminated by "ESC \". AFAICT, the "NUM ;" part that comes after the ] in many OSCs is an extension of the spec. The use of BEL instead of ESC\ to terminate is also an extension of the spec.

avih commented 1 month ago

I think that would be too big a change for the current release. I was thinking it would be merged after the release. I'm currently experimenting with some changes that will go in after the release so having a PR with the code to examine would be useful now.

OK. You can see a near final version here and the OSC support added here, just replacing the current code instead of disabling it with #if 0.

BTW, ECMA-48 (section 8.3.86) says an OSC sequence is simply "ESC ]", terminated by "ESC "

Right. It's 8.3.89 and the termination is ST, which is ESC \ (I think the markdown swallowed the \). I knew that BEL termination is an extension (which we should support), but I wasn't sure whether NUM; is mandatory or just a common convention.

Either way, in my code it's detected by ESC ] (ignoring the C1 value), and ends in ESC \ or BEL.

avih commented 1 month ago

FYI, v657 is not tagged. I think you typically tag versions?

gwsw commented 1 month ago

Thanks. I did make a tag when I built the release, but sometimes I forget that git push doesn't push tags for some reason that I don't understand. I have to do git push --tag.

avih commented 1 month ago

By the way, I see this in version.c:

v657  5/31/24   Fix buffer overrun when using LESSOPEN

If that refers to #521 then unless you know better, I don't think it's only with LESSOPEN.

It's a buffer overflow in fexpand, which I think doesn't depend on LESSOPEN,

It probably only overflow if the malloc block size is the same as the file name sans the \0. I noticed it with names of 8 chars, like Makefile or output.c.

The implication depends on what gets overwritten, and it can be basically anything, depending on allocation order etc.

I only noticed it on windows with LESSOPEN, but I'm sure it can result in other bugs under different circumstances.

Also, it's a relatively recent regression, so it probably doesn't affect versions more than a month or so old.

gwsw commented 3 weeks ago

I've merged output.c from avih/less to the post659 branch where I'm currently doing development (change 22ac9f0d1dfc9163a2ca94da19265d23dd96d79f). Is that the only file that's needed? I've done minimal testing so far. Let me know if you see any problems.

avih commented 3 weeks ago

I didn't notice any problems, and that's the final logical functionality I intend to have with this change.

But I have locally some refinements and changes that I didn't yet push: mainly using a more scientific RGB to 16 colors conversion, which works better.

In this case I wanted to PR a set of 3-4 incremental commits, each explaining what and why is changed.

So when work resumes at the main branch, can you skip 22ac9f0d, and instead use the PR (which I intent to open in the next few days)?

Once we get that merged, which means win_flush can handle any OSC, CSI or SGR sequences (by ignoring and/or passthrough and/or custom application of the sequence) including 256 and true-colors support on all terminals, the next step would be thinking how to reduce the differences in behavior and possibly code too, between *nix and windows, i.e. reduce the docs section "On MS-DOS and Windows, the --color option behaves differently" as much as possible.

This means first to remove all the difference between win10 and other dos/windows to to a single statement "on DOS and windows before 10, 256 and true colors are converted to 16 colors". This should be trivial.

And then decide which differences between *nix and windows we want to keep - which I think mainly comes down to whether the custom content rendering of underline should stay, which personally I still find useful.

Another useful thing to do later is improving the default behavior on VT terminals, i.e. supporting 256 and true colors with -R by default (currently the default - without -Da, is to convert to 16 colors), which would be trivial (like -Da passthrough), except if we want to still keep the custom content color for underline, which would require some further thinking (because it can't be a full passthrough anymore, and we'd need to track the SGR state and occasionally interfrere even when it's mostly passthrough).