belisarius222 opened 1 year ago
Some thoughts/things to keep in mind:
(...) hitting the return key should fire off the request immediately and clear the timer. This exception is carved out because the return key completes a command, which should be run on the backend as soon as possible. Perhaps we'd want the tab key to do the same thing, since that triggers an autocomplete request. (...) When the user types a keystroke, its result should show up on the screen immediately, without waiting for the request to the server to be sent and acknowledged.
In the presence of non-default sessions (specifically, sessions not connecting directly to drum), these behaviors are not guaranteed. In fact, some of these aren't even guaranteed for drum in certain edge cases, like when typing a prompt input wider than the screen.
I would be wary of coupling the webterm tightly to drum's behavior. In some sense you could call that a regression. If we want to lean hard in that direction, we might consider a separate app specifically for connecting to agents that support the drum protocol (similar to what old webterm did).
Got curious about what mosh does for this (it makes a very similar "ssh but with optimistic rendering" play), and it gets a little fancy:
The other major benefit of working at the terminal-emulation layer is that the Mosh client is free to scribble on the local screen without lasting consequence. We use this to implement intelligent local echo. The client runs a predictive model in the background of the server's behavior, hypothesizing that each keystroke will be echoed at the cursor location and that the backspace and left- and right-arrow keys will have their traditional effect. But only when a prediction is confirmed by the server are these effects actually shown to the user. (In addition, by default predictions are only displayed on high-delay connections or during a network “glitch.”) Predictions are done in epochs: when the user does something that might alter the echo behavior — like hit ESC or carriage return or an up- or down-arrow — Mosh goes back into making background predictions until a prediction from the new batch can be confirmed as correct.
That sounds perhaps too fancy to be in scope, but it would be a better option than just assuming drum semantics everywhere.
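To make the epoch idea a bit more concrete, here is a very rough TypeScript sketch. It is not mosh's actual implementation (mosh predicts per screen cell and also uses timing heuristics), and every name in it is hypothetical. It assumes the server's echo for each printable keystroke can be matched, in order, against our guess. Keys that might change echo behavior (Enter, Escape, up/down arrows) open a new epoch, and guesses in an epoch are only displayed once one guess from that epoch has been confirmed.

```typescript
// Keys that may change how the application echoes input (assumed list,
// loosely following the mosh description quoted above).
const EPOCH_BREAKERS: ReadonlySet<string> = new Set(["Enter", "Escape", "ArrowUp", "ArrowDown"]);

interface Prediction {
  epoch: number;
  expectedEcho: string; // the character we guess the server will echo
}

class EpochPredictor {
  private epoch = 0;
  private confirmedEpoch = -1; // newest epoch with at least one verified guess
  private outstanding: Prediction[] = [];

  // Whether guesses in the current epoch may be drawn on screen.
  get displayPredictions(): boolean {
    return this.confirmedEpoch === this.epoch;
  }

  // Record a keystroke; returns whether its local effect should be shown now.
  keystroke(key: string): boolean {
    if (EPOCH_BREAKERS.has(key)) {
      // Echo behavior may change after this key: start a new, unverified epoch.
      this.epoch += 1;
      return false;
    }
    if (key.length === 1) {
      // Printable character: guess it will be echoed at the cursor.
      this.outstanding.push({ epoch: this.epoch, expectedEcho: key });
    }
    // Backspace and left/right arrows are assumed to keep their usual effect.
    return this.displayPredictions;
  }

  // The server echoed `echoed` for the oldest outstanding guess
  // (assumes echoes arrive in keystroke order, one per printable key).
  serverEcho(echoed: string): void {
    const guess = this.outstanding.shift();
    if (guess === undefined) return;
    if (guess.expectedEcho === echoed) {
      this.confirmedEpoch = Math.max(this.confirmedEpoch, guess.epoch);
    } else {
      // Wrong guess: drop pending guesses and fall back to background mode.
      this.outstanding = [];
      this.epoch += 1;
    }
  }
}
```

The point of the epoch counter is that a single confirmed echo is enough to trust the rest of the batch, while anything that could switch the application into a different input mode puts us back into silent prediction.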
If we absolutely must implement the above suggestions in the short term, I'd recommend doing them only for the default session, since we currently know for sure that drum is running there. We could also add a scry endpoint to herm/dill so that we can check which agent is handling a given session.
The webterm frontend should do some tricks to improve the perceived responsiveness of the system.
When a user hits a key for the first time in a while, the frontend should immediately send that to the backend as it does now, but it should also set a debounce timer for maybe 300ms. All keystrokes during that time should be enqueued on the frontend and only sent to the backend once the timer elapses, with one exception: hitting the return key should fire off the request immediately and clear the timer. This exception is carved out because the return key completes a command, which should be run on the backend as soon as possible. Perhaps we'd want the tab key to do the same thing, since that triggers an autocomplete request.
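A minimal sketch of that batching rule, assuming keystrokes arrive as strings like "a", "Enter", or "Tab" and that some hypothetical `send` function posts a list of keys to the backend. Strictly speaking this is a leading-edge throttle rather than a debounce: the first key goes out at once, later keys in the window are held, and the window is not extended by further typing. `KeystrokeBatcher` and `TimerApi` are made-up names, not existing webterm code; the timer is injected so the logic can be tested without real time.

```typescript
// Timer abstraction so the batching logic can be driven by a fake clock in tests.
interface TimerApi {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: TimerApi = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class KeystrokeBatcher {
  private queue: string[] = [];
  private timer: unknown = null; // non-null while a batching window is open
  private readonly send: (keys: string[]) => void;
  private readonly windowMs: number;
  private readonly immediateKeys: ReadonlySet<string>;
  private readonly timers: TimerApi;

  constructor(
    send: (keys: string[]) => void,
    windowMs = 300,
    immediateKeys: ReadonlySet<string> = new Set(["Enter", "Tab"]),
    timers: TimerApi = realTimers,
  ) {
    this.send = send;
    this.windowMs = windowMs;
    this.immediateKeys = immediateKeys;
    this.timers = timers;
  }

  press(key: string): void {
    if (this.immediateKeys.has(key)) {
      // Return (and optionally Tab) completes a command: send the queue plus this key now.
      this.queue.push(key);
      this.closeWindow();
      this.flush();
      return;
    }
    if (this.timer === null) {
      // First keystroke after an idle period: send it at once and open a window.
      this.send([key]);
      this.timer = this.timers.set(() => {
        this.timer = null;
        this.flush();
      }, this.windowMs);
      return;
    }
    // Inside an open window: hold the key until the timer fires.
    this.queue.push(key);
  }

  private flush(): void {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }

  private closeWindow(): void {
    if (this.timer !== null) {
      this.timers.clear(this.timer);
      this.timer = null;
    }
  }
}
```

One design choice worth flagging: when the window elapses, the queue is flushed and the batcher goes idle, so the next keystroke is again sent immediately. Whether Tab belongs in `immediateKeys` is left as a constructor argument, matching the open question above.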
Optimistic display should also be added, if it doesn't already work that way. When the user types a keystroke, its result should show up on the screen immediately, without waiting for the request to the server to be sent and acknowledged.
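One way to sketch this is client-side prediction with server reconciliation: keep the last line the server confirmed, keep the keys it has not processed yet, and render the confirmed line with those keys replayed locally. This assumes drum-like line editing (printable keys insert at the cursor, Backspace deletes, left/right move), which is exactly the assumption questioned earlier in this thread. The acknowledgment shape (a line plus a count of processed keys) is hypothetical and does not model the real herm/dill output protocol.

```typescript
// Hypothetical model of the prompt line the user is editing.
interface LineState {
  text: string;
  cursor: number;
}

// Guess the effect of one key under drum-like editing semantics.
function applyLocally(state: LineState, key: string): LineState {
  const { text, cursor } = state;
  if (key === "Backspace") {
    if (cursor === 0) return state;
    return { text: text.slice(0, cursor - 1) + text.slice(cursor), cursor: cursor - 1 };
  }
  if (key === "ArrowLeft") return { text, cursor: Math.max(0, cursor - 1) };
  if (key === "ArrowRight") return { text, cursor: Math.min(text.length, cursor + 1) };
  if (key.length === 1) {
    return { text: text.slice(0, cursor) + key + text.slice(cursor), cursor: cursor + 1 };
  }
  return state; // Unknown keys: don't guess, wait for the server.
}

class OptimisticLine {
  private confirmed: LineState = { text: "", cursor: 0 };
  private pending: string[] = [];

  // Called on every keystroke; the key becomes visible in view() right away.
  type(key: string): void {
    this.pending.push(key);
  }

  // Called when the server reports its line and how many of our keys it has handled.
  acknowledge(serverLine: LineState, keysProcessed: number): void {
    this.confirmed = serverLine;
    this.pending = this.pending.slice(keysProcessed);
  }

  // What to draw: the server's line with unacknowledged keys replayed on top.
  view(): LineState {
    return this.pending.reduce(applyLocally, this.confirmed);
  }
}
```

If the server's answer disagrees with a guess, the next `view()` simply redraws from the server's line, so a wrong prediction is corrected on the following acknowledgment rather than persisting.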
The combination of these two features should keep the perceived latency of the system very low, even if communication with the backend is laggy.