microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!
MIT License
95.11k stars 8.25k forks source link

terminal/parser: determine what to do when split escape sequence writes show up #4037

Open fabian-z opened 4 years ago

fabian-z commented 4 years ago

Environment

Windows build number: Version 10.0.18363.535
(reproducible with OpenConsole from master)

Windows Terminal version (if applicable): n/a

Any other software?
- Simple ConPTY test code

Steps to reproduce

Example for arrow-up key (as seen during development with serial ports): Write 1: 0x1b Write 2: 0x5b 0x41

Expected behavior

Example for arrow-up key:

$ showkey -a

Press any keys - Ctrl-D will terminate this program

^[[A     27 0033 0x1b
     91 0133 0x5b
     65 0101 0x41

Actual behavior

Due to FlushAtEndOfString being true, the escape sequence is split up and rendered wrong. E.g. arrow-up becomes:

^[       27 0033 0x1b
[A       91 0133 0x5b
         65 0101 0x41

which only renders "[A" to the PTY without the desired effect.

This is 100% reproducible, i.e. the arrow keys never work because they are always received as two reads from the (virtual) serial port.

Note: A custom build of conhost.exe (OpenConsole.exe) with FlushAtEndOfString changed to return false does correctly process split escape sequence writes.

Comments

The comment in the caller in stateMachine.cpp says the following:

        // <kbd>alt+[</kbd>, <kbd>A</kbd> would be processed like `\x1b[A`,
        // which is _wrong_).
        //
        // Fortunately, for VT input, each keystroke comes in as an individual
        // write operation.

I think both statements here are wrong, or at least not universally true. Typing Alt+[ and A on a VTE based Linux terminal emulator (e.g. gnome-terminal or xfce4-terminal) will result in the same reaction as pressing the arrow-up key.

Is there any standard that guarantees VT input sequences to happen in single writes? I tested Linux support of split escape sequence writes - it worked fine on Linux 5.4 framebuffer getty TTY and on all tested PTY terminal emulators (xterm, VTE, etc.).

Related: https://github.com/microsoft/terminal/pull/2823

egmontkob commented 4 years ago

Is there any standard that guarantees VT input sequences to happen in single writes?

I don't think there's ever such a guarantee.

Traffic might go over channels which don't even have this concept (e.g. the serial line). I don't know if the standard OS methods for reading from a serial line wait for more data before returning if data is currently flowing in, or return the already available chunk immediately. This might depend on some setting (e.g. O_NONBLOCK), and obviously on the OS.

Traffic might go over layers and channels (e.g. TCP, ssh, tmux...) that are allowed to split or join fragments.

That being said, if you press a key which generates a 3 byte sequence, and your terminal emulator writes that in a single step, you can be reasonably certain that it will arrive at the reading app in a single step.

I recommend the terminal to write them in a single step. Some applications rely on heuristics (i.e. timing) to figure out what happened, e.g. to distingusing between Alt+[ followed by A, vs. the Up arrow. The most notable example is probably vim's handling of the Esc keypress. It's good to help them.

// <kbd>alt+[</kbd>, <kbd>A</kbd> would be processed like `\x1b[A`,
// which is _wrong_).

I can't see how this is relevant there, though. Doesn't this comment mix up the two directions? It's in src/terminal/parser/stateMachine.cpp, StateMachine::ProcessString(), suggesting that it's about the terminal parsing its incoming data. When you press a key, the terminal generates the outgoing sequence. The terminal doesn't need to parse keypresses. (Or does WT need to, due to its extremely complex architecture with ConPTY and whatnot?)

fabian-z commented 4 years ago

Good point on the possibility of using timing heuristics.

The terminal doesn't need to parse keypresses.

Actually, the new ConHost does need to parse keypresses from VT sequences for backwards compatibility with the old Console API. See the architecture chart from the introducing blog post.

j4james commented 4 years ago

If, as I understand it, this is about forwarding keypresses over the conpty connection, then I don't think it's worth trying to fix, because it's never going to satisfy everyone, and there are many other ways in which it can fail.

In the long term I think the solution is to use a more robust protocol for keypresses that doesn't have these problems (see #879 for discussions along these lines). My personal preference would be the DECPCTERM mode.

nicm commented 4 years ago

That being said, if you press a key which generates a 3 byte sequence, and your terminal emulator writes that in a single step, you can be reasonably certain that it will arrive at the reading app in a single step.

On localhost and fast networks you mostly get away with it but I think that is partly because the consequences of it going wrong are pretty mild - the user just presses the key again.

There is no atomicity guarantee here, applications need to handle partial reads to be correct.

fabian-z commented 4 years ago

If, as I understand it, this is about forwarding keypresses over the conpty connection, then I don't think it's worth trying to fix

I have used arrow keys over VT terminals (mostly with the mentioned terminal emulators) for years with all sorts of hardware and software, including embedded devices, without a problem. It is the de facto standard and only partially supporting it would force many applications to find workarounds.

the consequences of it going wrong are pretty mild - the user just presses the key again.

For clarification: In my case, the issue described above is 100% reproducible, i.e. the arrow keys never work because they are always received as two reads from the (virtual) serial port. I added this to my issue description

j4james commented 4 years ago

I have used arrow keys over VT terminals (mostly with the mentioned terminal emulators) for years with all sorts of hardware and software, including embedded devices, without a problem. It is the de facto standard and only partially supporting it would force many applications to find workarounds.

The backend in this case is not a VT application though. It's potentially a legacy Windows app that knows nothing about VT escape sequences, and might expect to be able to differentiate between simple keystrokes like Ctrl+2 and Ctrl+Space, which VT terminals treat as identical escape sequences. And that's not even considering more complicated scenarios, like say a piano playing app that needs to know exactly when each key is pressed and released, and be able to handle multiple keys held down at the same time.

If Windows Terminal is going to be a viable replacement for the existing Windows console, it has to able to run existing Windows console apps, and that means the conpty layer needs something more powerful than basic VT sequences for forwarding keystrokes. Any actual VT applications on the server side wouldn't need to care, though. They'll still just receive a simple escape sequence if that's all they want.

fabian-z commented 4 years ago

Testing on Windows Server 2019 (Version 10.0.17763.737) SAC / EMS with the same terminal emulator and serial port driver, the console channel type is VT-UTF8 with working support of the arrow key VT sequences.

SAC transcript

@j4james Although we might disagree regarding realistic command line use cases, I think we agree that Windows Terminal could (given proper ConHost and API support) use another standard like PCTerm better suited to transmitting keypresses and modeling the legacy Console API interactions. This issue however, is about correct support for (split) VT sequences in ConPTY and does not involve Windows Terminal as an application.

j4james commented 4 years ago

This issue however, is about correct support for (split) VT sequences in ConPTY and does not involve Windows Terminal as an application.

I don't think you can really claim there is one "correct" way to handle split VT sequences though. Every choice involves some kind of compromise. Typically it would be up to receiving application to decide how to handle things based on their particular needs. And if conpty used a more robust protocol for passing on the keystrokes then it wouldn't need to worry about any of this - the control would be back in the hands of the receiving application.

fabian-z commented 4 years ago

@j4james Excuse me if my statement was not unambiguous - English is not my first language. I was using the following definitions of "correct":

Merriam-Webster, Oxford Dictionary

conforming to an approved or conventional standard [...] behavior

The "conventional standard" here being the behavior shown (at least) by Linux VTE & other terminals, most embedded devices with serial connections and the Windows Server SAC/EMS itself.

fabian-z commented 4 years ago

Searching for relevant standards covering this topic, I found an old manual stating that

The VT100 was Digital's first terminal based on the ANSI standard for control sequence encoding (ANSI X3.64).

Although there seem to be extensions and specification of edge cases making it desirable to implement DEC compatible terminal emulation (see here), the standard that introduced (among others) CSI escape sequences is:

ANSI X3.64-1979 (Additional Controls for Use with American National Standard Code for Information Interchange) - Google Books

I think the (informational purpose) Appendix section supports the viewpoint that the structure of the input stream should not affect the operation and that subsequent characters after control sequence introductions should be processed as parameters as long as they don't result in an invalid sequence.

Excerpt from Appendix B (in our case the "device" would be ConPTY):

Appendix B - Device Concepts

[...]

This Appendix is written from the point of view of the device. Therefore the term received data stream refers to the data stream received by the device and the term transmitted data stream refers to the data stream transmitted from the device.

B2. The Received Data Stream

The received data stream is considered to be a continuous stream received by the device. The received data stream may be structured in messages, records and/or blocks, but this does not affect the operation of the device at the abstract level of description; the logical or physical units of data are regarded as being concatenated to form a continuous stream.

[...]

B5. Processing the Received Data Stream

B5.1 General

A device processes a received data stream one character after the other in the logical steps given in B5.2 through B5.4

B5.2 Control Characters

If the control character specifies a control function that can be executed by the device the corresponding operation is performed according to the definition.

[...]

When a control sequence is processed which requires one or more parameters to complete the specification of the control function, the characters representing the parameters are collected from the received data stream and processed as a part of the control operation. These characters are not separately processed by step 2 or 3 below.

The current standard should be ISO/IEC 6429:1992, but it is not available for free and I do not have a copy to check.

fabian-z commented 3 years ago

Any chance to move this issue from backlog or increase priority? It has been well over a year with no progress and ConPTY control sequence handling is still not standard conformant.

After further research, the current open standard seems to be ECMA-48:

The first edition of this Standard ECMA-48 was published in 1976. Further editions followed. The fourth edition, published in 1986 was adopted by ISO/IEC under the fast-track procedure as second edition of ISO 6429.

Relevant quote:

The data stream is considered to be a continuous stream. It may be structured in messages, records and/or blocks, but this does not affect the operation of the devices at the abstract level of description in this Standard; the logical or physical units of data are regarded as being concatenated to form one continuous data stream

Source: https://www.ecma-international.org/publications-and-standards/standards/ecma-48/ (Section 6.2 - The data stream)

ConPTY does treat input buffers as messages, breaking control sequences apart. This is invalid behaviour according to the standard.

DHowett commented 3 years ago

Thanks for following up. In the year since this issue was filed, it's gained very little support. I know that it's a standards compliance issue, and I value standards compliance, but I can't justify prioritizing it over some of the other stuff we've got going on. It's a complicated problem space where everyone has a different solution (a timeout if you receive ESC at the end of a packet, sending both ESC and the next character in case of a fault, etc.).

Unfortunately, while the letter of the standard specifies that the data stream is continuous it does not appear to specify the behavior of a conforming application in the face of ambiguous stream contents.

nicm commented 3 years ago

In practice the standard can specify all it likes, there is no way to ensure escape sequences are not broken up somewhere between terminal and application. Not to mention if the user for example presses Escape then q instead of M-q. Most applications that I am aware of solve this by using a timeout if the key is ambiguous.

It is a little unfortunate that you do this so aggressively because it will break naive applications like showkey and any application where the user disables the timeout, but it is something that applications have to handle.

fabian-z commented 3 years ago

@DHowett Thanks for your reply and clarification. I understand this is a complex problem space and priorities are difficult to manage. However, in my opinion this is not only a theoretical standards compliance issue but also a bug that actually prevents usage of ConPTY for applications that cannot guarantee that escape sequences are terminated within one input buffer, like network or serial port applications. If not changed, this would require reimplementation of another layer of escape sequence handling or bundling of a patched OpenConsole.

It also seems to be possible to work around this with solutions approved by Microsoft engineering, since there is a working implementation for Windows Server SAC, which I tested in the same environment (see https://github.com/microsoft/terminal/issues/4037#issuecomment-570251085).

The standard defines that action in error recovery is left to the implementation. However, if a control sequence is started in one input and not finished (but also not invalid), I think the correct course of action is to wait for the next input since otherwise ConPTY does not work on an input stream but on individual input buffers (violating the standard). A timeout could also be used as @nicm suggested (treat the input sequence as partial or error according to the spec after some amount of time without new input).

The reasoning for the current implementation seems flawed to me (see code comments or https://github.com/microsoft/terminal/pull/2823#issuecomment-534603594), since it says e.g.

alt+[, A would be processed like \x1b[A, which is wrong

But testing with different Linux terminal emulators, it is exactly the observed behaviour: echo 1b5b41 | xxd -r -p - and echo 1b | xxd -r -p - && echo 5b41 | xxd -r -p - both cause the same reaction without printed output.

Fortunately, for VT input, each keystroke comes in as an individual write operation.

This cannot be guaranteed for many applications and the assumption effectively violates the standard (see above).

determin1st commented 6 months ago

VTs are VLs (very lame). that's not only the problem of continuous transmission but the problem of keycode/modifier/state transfer.

the author of X term many times referred by microsoft in the docs, with his invisible island (which is clearly visible to me) based it on X system that passes keycodes no problemo, but he choose not pass them but do something based on the dead/zombie tech (though ESC sequences are cool)

the solution is the different protocol that will simply deliver keycodes and translated symbols like ReadConsoleInput of the thoroughly designed oldish conhost does

CSI > version u - please enter reliable key protocol of this version CSI < ! u - okay, im in ... CSI < keydown ; keycode ; modifier ; length u (utf8 character of length bytes follows) ... CSI > ! u - return to zombie tech (no response, did it) ... CSI > ? u - what's your version of key protocol? CSI < 123 u - i can do 123 (probably bitmask)

determin1st commented 6 months ago

8-bit response mode solves that problem, any 8-bit control is a native continuator, so the parser could delay the parsing, 7-bit control wont go anywhere 's futile, because 1B has a meaning by itself. also the protocol, the world needs a nice key protocol

zadjii-msft commented 6 months ago

@determin1st You're probably more interested in #4999, where we literally made a protocol for transmitting Win32 console input sequences (nearly 4 years ago now)

determin1st commented 6 months ago

@zadjii-msft oke, ill look through.. for now can say that INPUT_RECORD is probably overkill, KEY_RECORD yes, mouse tracking protocol exists already, would be more compatibility..

determin1st commented 6 months ago

ooke, as im progressing in construction of my own parser, i see that 8-bit responses dont fully solve the partial input problem (some edge cases). there must be a partial input terminator.

Putty supports ENQ request-response, but it is not wildly supported. i think another basic request - the DA1 is appropriate, because its result/answer is already known to the terminal user (any library does it upon initiation)

so after the parser returns the partial input state, stores the remaining bytes for the next attempt, it also should do the DA1 request to have a clarity whether no more bytes would follow the partial. this nicely closes all edge cases.

waiting for known input is better than waiting for unknown input.

also i was browsing through some of docs here and found this https://github.com/microsoft/terminal/blob/main/doc/specs/%23976%20-%20VT52%20escape%20sequences.md 's like with INPUT_RECORD miswording, ANSI/VT52 is a single mode it's not leaving VT52 to ANSI or entering ANSI leaving VT52 vice versa, its the same thing. those ancient docs didnt clearly say what mode activates, "previous" kind, probably DEC thing or ASCII thing. dont understand why any modern terminal user need that. should be always off