gwsw / less

Less - text pager
http://greenwoodsoftware.com/less
Other
543 stars 86 forks source link

less should support bracketed paste #523

Open vinc17fr opened 3 months ago

vinc17fr commented 3 months ago

less should support bracketed paste, that is, pasted text should not be interpreted as commands (generally a mistake).

gwsw commented 3 months ago

I want to clarify the details of this request.

The simplest enhancement is to simply ignore the brackets. So the sequences "ESC [ 200 ~" and "ESC [ 201 ~" would be ignored. Any text between the brackets would be processed as usual.

But I think you may be asking for something more when you say "pasted text should not be interpreted as commands" but I'm not sure what it is. I think that about the only time it's useful to paste text into less is when entering a search string. Are you asking that line editing commands be ignored in that mode? So typing "/abc" and then pasting the escape sequence for your terminal's left-arrow key would enter the escape sequence literally rather than moving the cursor left?

Also, there is no terminfo entry that defines paste brackets. So the brackets would need to be hardcoded (possibly problematic for some terminals), or specified by environment variables.

vinc17fr commented 3 months ago

Commands are the keys described under the COMMANDS section in the man page. Note that some of them, like ! or # (followed by a shell command) and s (to save the input to a file) are particularly dangerous with random text pasted by mistake. The qcharacter is also dangerous because it quits less and the remaining characters are sent to the shell without using bracketed paste. What to do depends on the context:

Also, there is no terminfo entry that defines paste brackets. So the brackets would need to be hardcoded (possibly problematic for some terminals), or specified by environment variables.

I think that you just need an option that enables or disables the support of bracketed paste. This is what the zsh and bash shells do.

The main terminals support bracketed paste: xterm, GNOME Terminal, rxvt, sakura, mlterm, Termux (under Android), iTerm2 (macOS), and others.

gwsw commented 3 months ago

Ok, I understand now that you are concerned with someone accidentally pasting into less, rather than deliberately pasting and getting unexpected results. Ignoring a bracketed paste has the risk that if an open bracket without a close bracket is somehow entered, all subsequent input will be ignored until a close bracket is entered. But perhaps there could be a timeout so that input would only be ignored for a reasonable period of time.

vinc17fr commented 3 months ago

It could also be deliberate pasting, but with contents that are not what the user expects, for various reasons such as bugs (Firefox does not always copy the selection to the PRIMARY selection) or hidden contents (e.g. from a web page due to CSS - various kinds of attacks).

gwsw commented 2 months ago

The --no-paste option was added in a8ae92e19d363c03d3f7d56e9b3c03d42bf493f6, currently in the post659 branch.

gwsw commented 2 months ago

@avih @adoxa Do either of you know anything about support for bracketed paste in Windows? I haven't been able to find any information about whether this is supported on Windows terminals. Experimentation seems to indicate that it doesn't work, even on a terminal that supports SGR.

avih commented 2 months ago

Support for bracketed paste depends on whether it's implemented at conhost.exe, which is the console backend for console applications (and to some degree also the frontend, e.g. for handling paste of the terminal window, I think).

As far as I know there's no way to detect support on Windows, and after minimal testing I think it's currently unsupported in both the native conhost in windows 10 19045, and in the latest OpenConsole.exe (the future windows conhost) which ships with the latest Windows terminal preview (1.21.240517002-preview).

I tested outside of "less", by printing \033[?2004h and then pasting, and it did not include the bracketed paste wrapper. But I would not 100% conclude that it's unsupported, because there might be some other factors involved.

There do exist 8 open issues and 50 closed ones for the windows terminal about bracketed paste - https://github.com/microsoft/terminal/issues?q=is%3Aissue+is%3Aopen+bracketed

So i would think that if it's not implemented yet, it probably would at some stage in the future.

It's also possible that it's implemented in some 3rd patty terminal emulator for windows.

However, does that matter?

Can you clarify what "support bracketed paste" means in "less" other than possibly special-handling of the content between \e[200~ and \e[201~ (and possibly sending \e[?2004h on startup and \e[?2004l on exit)? I did read the manpage changes at that commit.

Because it it only does the startup/exit thing (to enable/disable it), and doing something special with the wraped content, then I think it would "work" on Windows too, in the sense that if bracketed paste is suppoeted by the terminal then it's fine, and if not then the start/exit sequences would be ignored by the terminal, and the wrap sequences won't exist so there's nothing special to do about it anyway (same behavior as if bracketed paste was not enabled).

Or am I missing something? for instance in terminals which don't support it, can --enable-paste result in some wrong behavior? (other than behaving as if --enable-paste was not set)

avih commented 2 months ago

I tested outside of "less", by printing \033[?2004h and then pasting, and it did not include the bracketed paste wrapper. But I would not 100% conclude that it's unsupported, because there might be some other factors involved.

Not sure why the above didn't work as expected when bracketed paste is supported, but I think it's actually supported.

TL;DR: I think it works in Windows terminal (and its openconsole) but currently not in the native windows 10 conhost. I didn't try windows 11 conhost.

(the windows terminal openconsole is the future conhost of windows, so windows conhost is expected to follow the new openconsole and support it at some stage).

At least when using (windows) ssh into a linux system, where the shell is bash, and bracketed-paste is enabled via ~/.inputrc (the readline config file), then bracketed paste does work when using windows terminal 1.20 (and later I guess, and possibly earlier too), or OpenConsole.exe which ships with windows terminal. "working" means that when pasting multiple lines then bash displays it, but doesn't execute the lines, so that only an additional enter runs these lines.

However, It doesn't work when using the native windows 10 (19045) conhost (same ssh and bash on linux, ).

When disabling bracketed-paste in bash via ~/.inputrc, then it's indeed without bracketed paste, and all the terminals (windows terminal, openconsole, conhost) behave the same and pasting multiple lines results in each line being executed right away in bash.

I did try to check if --enable-paste works in the post659 branch, but the BP initialization is disabled on windows as far as I can tell, and I didn't try to enable it, so it doesn't work currently, but I think it should be possible to make it work if the init/deinit sequences would be sent.

avih commented 2 months ago

TL;DR: I think it works in Windows terminal (and its openconsole) but currently not in the native windows 10 conhost

Confirmed on windows terminal and openconsole with a minimal test program which sets input and output vt mode, then printing \033[?2004h, then using ReadConsoleInputW (or fgetc for buffered input), and pasted input indeed includes the bracketed paste wrapper.

Setting vt mode for the input console handle (ENABLE_VIRTUAL_TERMINAL_INPUT) seems to be required for this to work, but currently "less" explicitly unsets this bit, so the init sequence would need to change for this to work.

As expected from the ssh experiment, on native windows 10 conhost bracketed paste doesn't currently work, but will likely work in the future (unless win10 EOL arrives first). WIndows 11 probably does or will support it.

adoxa commented 2 months ago

Windows 11 (10.0.22621.3085) ConHost also does not.

vinc17fr commented 2 months ago

I've tested the post659 branch, and the only issue I've found is after typing :. For instance, if I paste qqq after that, this quits less, and I get qq~ at the shell prompt.

gwsw commented 2 months ago

What would you expect here? Do you expect less to drain all input before exiting?

avih commented 2 months ago

I'd think that if bracketed paste start is detected, to drain the whole bracketed paste content and its termination.

This is one thing which bracketed paste can help with, because we know it was pasted as a single unit, rather than (without BP) processing the input as it comes, quitting to the shell, and letting the shell (highly likely incorrectly) keep processing the paste.

Additionally, if "less" quits before it digests the BP end sequence, then the next application (probably shell) will get a BP end sequence which it never saw starting (and even if the terminal wanted to cancel the BP content after "less" disabled BP on quit, it might not be able to, because it might already be buffered by libc/OS and outside of the terminal control).

Unrelated, I don't know if there are conventions about the application aborting the wait for the BP termination, but I'd think there should be a timeout, probably of less than 1s (in "less" case). For instance, if the BP end sequence never arrives, does "less" hang forever? Or if the terminal doesn't support it, but --enable-paste is used, and then the user pastes the BP start sequence, but without a following end sequence? etc.

gwsw commented 2 months ago

Yeah, I thought of that argument. It seems reasonable to try to read the whole bracketed sequence, but we can't get stuck unable to exit if an unmatched open bracket is received. So there would have to be a timeout to guard against that. I don't think less currently has a way to drain tty input without reading it, except in the Windows build.

That relates to your last question -- currently after an open bracket is received, less ignores input for a maximum of 5 seconds or until the close bracket is received (defined by MAX_PASTE_IGNORE_SEC).

avih commented 2 months ago

I'd think that if bracketed paste start is detected, to drain the whole bracketed paste content and its termination.

I don't think less currently has a way to drain tty input without reading it

Maybe I used the wrong terminology, but I meant reading up to and including the BP end sequence (while ignoring whatever "less" wants to ignore, like everything after a quit command, or everything starting at the 1st newline, whichever comes first), but not beyond the BP end sequence.

Is there a semantic or functional difference in this case between "read" and "drain"?

If the content includes a quit command, and the implementation doesn't buffer the effective BP content (up to ignored content) before applying it, then indeed it might require adding some implementation details of reading the rest of the BP content if we identify a quit command before it ends.

A different implementation approach could be to always process the whole BP content up to and including the end sequence, while buffering up to the first newline (excluding), and only once the BP is complete, start processing it. This approach won't require special handling of quit inside BP, because less already drained the whole BP anyway by the time quit is applied.

But also, I think the fact that the rest of the paste goes to the shell is not the real bug, but rather only a side effect of the bug.

The real bug, I think, is that the BP content is interpreted as a command in the first place, because both the initial request and the docs suggest otherwise:

request:

pasted text should not be interpreted as commands (generally a mistake).

docs:

any text pasted into less is ignored, except that one line of text may be pasted into the command line at the bottom of the screen (search strings, file names, etc).

So if less quits after pasting qqq in a place where only search string or file name etc can be entered, then that's the real bug because the paste was interpreted as a command.

Or maybe "less" crashed (hard to tell from the issue report, and I didn't try it).

Either way, a paste which results in exit to the shell is the bug which needs fixing, and then there shouldn't be an issue of draining the BP before quitting.

gwsw commented 2 months ago

What I was getting at by calling it "drain" is that in this scenario less will be trying to read input which may not exist. If less just naively reads data until it receives the end bracket, but no input appears (because the start bracket was entered manually or whatever), less will just hang on the read. We would need something like poll() or select() to implement a timeout on the read. Less currently uses poll for another feature so maybe some of that code can be repurposed.

But I think you're correct that the pasted 'q' should not be handled as a command. If you just paste in a 'q' it will be correctly ignored, but if you type a ':' and then paste the 'q', the 'q' is treated as part of a "command line" and so executed, since ":q" is an alias for "q". This is a side effect of how multicharacter commands are all processed similarly. I think only a few multicharacter commands should actually allow paste; probably just examine, search, shell, and toggle-option.

avih commented 2 months ago

Yeah. So I think that while the request and the docs looked to me suggesting the same thing (paste should not be applied as command), they're actually different, and the docs don't actually say that it's never applied as command, because the implementation doesn't actually enforce that.

I think a better doc (and matching implementation) would be something like:

With --enable-paste, "less" ignores pasted content, unless it's pasted when less expects the user to type a file name, or search expression etc. In such case, the paste is inserted into the text field. If the paste includes one or more newlines, then the first newline and everything which follows is ignored. Specifically, "less" would not execute any part of pasted content as a command.

I didn't check the code, but there might also be a need to still filter the paste even if it's never executes as a command, for instance if it includes control chars like tab - which might trigger completion, or ^C char, or editing key sequences, or escape sequences in general, etc.

So maybe the solution is to ignore the paste starting at the first control character, if there is one.

vinc17fr commented 2 months ago

I think that if less is waiting for a command, it should entirely ignore pasted contents, but if it is waiting for a string, it should honor pasted contents, being careful with non-printable characters (for instance: ignore the first non-printable characters and all the following characters from the pasted contents).

So, just after :, pasted contents should be ignored. But after :e, since less is waiting for a filename (thus a string), pasted contents should be honored.

gwsw commented 2 months ago

Yes, I think that is the desired behavior. Unfortunately it is turning out to be trickier than one might expect, because the a multicharacter command like ":q" and the bracket sequences are parsed by the same code. So essentially in this case you have one command (bracket start) embedded inside another one (:q), and the command parser cannot handle one command embedded inside another. I just discovered that the same issue exists for mouse sequences: if you type : and then roll the mouse wheel, you get unexpected behavior because the mouse wheel sequence is interpreted as part of the command that starts with :.

avih commented 2 months ago

At least for mouse input (wheel and clicks?), while it would be nice if it worked while in command mode or search (after : or /), I think it's acceptable if the mouse is disabled in command mode (and anywhere else where its input would be hard to process), and resumed once this mode is done.

I have no insight about the bracketed paste implementation details.

However, eventhough I do like bracketed paste and use it elsewhere and is probably useful in "less", I would think that if this ends up requiring significant effort or refactoring then maybe it's not worth it. But at this time I can't judge the specifics.

gwsw commented 2 months ago

Disabling the mouse during entry of a multicharacter command might work. Unfortunately we can't use a similar technique of disabling bracketed paste during command entry, since doing so would just allow the pasted content to come through. I will consider other options. The easiest option is to just leave it as it is currently implemented, which does allow pasted commands to be entered in some cases. It's better than the released version which doesn't disallow pasted commands at all, and it seems to me that it takes a fairly deliberate attempt to do something weird to cause a problem (why would someone type : and then paste content?) The fact that the colon-followed-by-mouse-wheel bug has apparently stood unnoticed for close to 6 years indicates that such user behavior can't be common.

vinc17fr commented 2 months ago

Bracketed paste should be ignored both before : and after :. So I don't understand the issue.

gwsw commented 2 months ago

Yes, I agree that it SHOULD be ignored in both cases. The issue is that it is tricky to do that given the current implementation of command parsing.

There is a list of commands with an action associated with each one. Part of that list looks something like

"q" -> QUIT
":q" -> QUIT
":e" -> EXAMINE
"\e[200~" -> START_PASTE
"\e[201~" -> END_PASTE

When command characters are received, they are looked up in this list. A single-char command like "q" is easy; it matches the first entry in the list. For a two char command like ":q", when the : is received, we look it up and see that it doesn't match any command, but matches the beginning of a command (actually two of them), so we remember it and read another char. If the next char is "q" then we have a match for ":q". If it's "e" then we have a match for ":e". But if it's \e (the first char of a start bracket), the sequence we've received (":\e") doesn't match anything in the table. To handle this, we would somehow need to remember that we have a partial command (:) and now we have a second partial command (\e) embedded in it. Or perhaps, since embedded commands don't really make sense for anything except paste brackets and mouse reporting, the handling of paste brackets (and mouse reporting?) could be moved out of the command processor and handled in a completely different way. Either way it seems like a pretty non-trivial amount of work.

vinc17fr commented 2 months ago

I see. The code is really ugly, and even broken (without even considering bracketed paste support). For instance, type :, then the [Home] key. The escape sequence for [Home] is \eOH in xterm. It seems that what the code does is to ignore \e and O, then H is interpreted as the command H, which displays a summary of the commands. Instead, the escape sequence \eOH should have been regarded as a single key (this is how curses applications behave).

So, what you first need would be to recognize multicharacter keys in all contexts. Then it would be much easier to recognize the start of a paste.

gwsw commented 2 months ago

Yeah, you're right. The code is pretty poorly designed. It's one of the oldest parts of the code, written some 40 years ago when I had been programming professionally for less than 5 years. It could stand to be redesigned and rewritten.

vinc17fr commented 2 months ago

OK. The good thing is that if the user starts a multicharacter command, he will probably enter the second character of the command immediately after the first one, so that it is a bit unlikely that he will paste anything after :.

That said, I wonder whether paste-after-: could easily be ignored by adding

        ':',ESC,'[','2','0','0','~',0,  A_START_PASTE,

in cmdtable[] from decode.c. I've tried, and this appears to work: it is just interpreted as a multicharacter command that does nothing, just like a paste without : first.

avih commented 2 months ago

I wonder whether paste-after-: could easily be ignored by adding...

Cute.

However, I'm guessing it also leaves command mode?

If yes, I guess it should be acceptable, but maybe there could be some trickery to use a different command, like A_START_PASTE_IN_COMMAND, which is identical to A_START_PASTE, but after the paste end bracket, generates a keypress of : so that effectively the paste is ignored and it stays waiting for a command in the : mode? :)

vinc17fr commented 2 months ago

However, I'm guessing it also leaves command mode?

I'm not sure what you mean by that. This cancels the :, returning to the default prompt. I think that this is the expected behavior, because pasting after : does not make any sense. Actually, going to

                default:
                        bell();
                        break;

would be better, IMHO, in order to signal that something wrong was done, just as if one had typed a key that does not correspond to any command. So...

If yes, I guess it should be acceptable, but maybe there could be some trickery to use a different command, like A_START_PASTE_IN_COMMAND, which is identical to A_START_PASTE, but after the paste end bracket, generates a keypress of : so that effectively the paste is ignored and it stays waiting for a command in the : mode? :)

or put case A_START_PASTE_IN_COMMAND: just before default: (same as not having any case A_START_PASTE_IN_COMMAND:, but it may be better to be explicit on how it is handled) to do what I've suggested just above.

BTW, I would also add

        ESC,ESC,'[','2','0','0','~',0,  A_START_PASTE,

or something similar, because ESC is also a command prefix, and ditto for CONTROL('X').

On this subject, I can see that zsh doesn't handle prefix+paste correctly.

gwsw commented 2 months ago

It's a clever approach and is obviously much easier than rewriting the command processor. However it doesn't solve the similar cases mentioned above of mouse movement after : and other keys like HOME, END, arrows, etc. after :. These could be handled by adding yet more special case entries to the table but it seems like this could get unwieldy. If we want to be complete, there are 11 special key entries in the table plus 2 for mouse movement and the new paste entry, each of which would have to be duplicated to come after a :, ESC, ^X, ^O and Z. That's at least 14*5 = 70 new entries. And there's another similar class of issues if the user manually types the prefix for a special key, like \eO (which makes less think a HOME key sequence is in progress), and then pastes or mouse-scrolls or presses a real special key. If we handled those in a similar way it would add hundreds of more entries. And I suppose a similar thing would need to be done for the edit table.

gwsw commented 2 months ago

On this subject, I can see that zsh doesn't handle prefix+paste correctly.

Apparently neither does bash. Do

$ set -o vi
$ set enable-bracketed-paste

then type ESC followed by a paste and you get garbage.

vinc17fr commented 2 months ago

These could be handled by adding yet more special case entries to the table but it seems like this could get unwieldy.

Yes. However, they could probably be added automatically, either dynamically at the start of the program or statically (generation of C code) with a script.

gwsw commented 2 months ago

It probably would need to be done dynamically, to handle new cases created by commands defined in the user's lesskey file.

gwsw commented 2 months ago

The problem of misparsing a paste bracket, HOME, etc. when entered after a colon or other partial command should be fixed in d8aa003e20e1a8a2d2252ee01a91c7bd7e8ebe20.