metawrap-dev commented 4 years ago

Description of the new feature/enhancement

Continuing from https://github.com/microsoft/terminal/issues/5746#issuecomment-671601536

It would be fantastic if I could just pipe out HTML5 from my console app and have it display on the terminal.

Proposed technical implementation details (optional)

I'm imagining some kind of escape sequence that enables/disables HTML chunk output or a file/pipe that could be written to/read from. The escape sequence would determine the dimensions/container types. This sequence could be output by any app, even one running remotely, but I am mainly considering locally running command-line apps. An app could interrogate the environment variables to determine whether the HTML rendering option was available.
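
As a rough sketch of the environment-variable check, assuming a made-up variable name (`TERM_HTML_RENDERING` is hypothetical; no terminal defines it today), an app could fall back to plain text when the capability is not advertised:

```python
import html
import os

# Hypothetical variable name, invented for this sketch.
HTML_CAPABILITY_VAR = "TERM_HTML_RENDERING"


def html_rendering_available(environ=None):
    """Return True if the hypothetical capability variable is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get(HTML_CAPABILITY_VAR) == "1"


def render_report(rows, environ=None):
    """Return an HTML table if the terminal advertises support, else plain text."""
    if html_rendering_available(environ):
        cells = "".join(
            f"<tr><td>{html.escape(str(name))}</td><td>{html.escape(str(value))}</td></tr>"
            for name, value in rows
        )
        return f"<table>{cells}</table>"
    return "\n".join(f"{name}: {value}" for name, value in rows)
```

The same program then behaves sensibly in terminals that know nothing about HTML output.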

HTML output could be rendered within the following container types.

An inline WebView that is scrolled along and put to sleep off-screen? Gets turned into an image when the parent app closes? I could imagine starting an app that outputs some HTML/JS that opens a WebSocket back to the parent app, with all sorts of possibilities. Live graphs, buttons, etc.
A popup window?
An underlay that takes up the entire terminal window, such that content is rendered under the terminal text?
An overlay that takes up the entire terminal window, such that content is rendered over the terminal text?

These last two could be used for backgrounds, persistent graphs, VStatus widgets, controls, etc.

It would change the terminal into a graphical environment based on HTML/CSS/SVG/JS. I suspect it would have rapid and enthusiastic adoption by developers. Could be awful. Could be amazing.

zadjii-msft commented 4 years ago

A lot of this is going to sound like rambling - I'm brainstorming to see how this would work in practice, so this is all very train-of-thought.

Let's not mention the fact that WebViews aren't supported in XAML islands applications currently, because that really makes the whole thought experiment impossible. Let's presume that the year is 2030 and they are supported and we could actually do that.

Alright so let's imagine for a second that this is possible. We've invented a hypothetical escape sequence that says "make this RxC (in characters) region of the buffer a Webview with this HTML content".
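
To make the thought experiment concrete, here is one way such a sequence could be framed. Everything about it is an assumption for illustration: the OSC number 7246 is invented, and base64-encoding the HTML is just one option for keeping control characters out of the payload.

```python
import base64

ESC = "\x1b"
ST = ESC + "\\"          # String Terminator, as used to end OSC sequences
HYPOTHETICAL_OSC = 7246  # invented number; not a registered sequence


def encode_webview(rows, cols, html_text):
    """Build 'ESC ] 7246 ; rows ; cols ; base64(html) ST' (hypothetical format)."""
    payload = base64.b64encode(html_text.encode("utf-8")).decode("ascii")
    return f"{ESC}]{HYPOTHETICAL_OSC};{rows};{cols};{payload}{ST}"


def decode_webview(sequence):
    """Terminal-side parse: return (rows, cols, html) or raise ValueError."""
    prefix = f"{ESC}]{HYPOTHETICAL_OSC};"
    if not (sequence.startswith(prefix) and sequence.endswith(ST)):
        raise ValueError("not a webview sequence")
    rows, cols, payload = sequence[len(prefix):-len(ST)].split(";", 2)
    return int(rows), int(cols), base64.b64decode(payload).decode("utf-8")
```

Because the payload is plain ASCII text, it would travel over ssh like any other output.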

Would this work over a remote connection to another terminal emulator running on another system (i.e. ssh)? Probably, yea. HTML is just text after all. It would be the terminal's responsibility to load any images, or other web content, but presumably, the webview embedded by the terminal emulator would handle that for them.

How would this work for something like tmux or screen, where their text buffer might be a subset of the full terminal buffer? I'm not sure it'd be possible at all. Sure they could translate the message from "make a webview at y,x,h,w=0,0,R,C" to some other coordinate space, but scrolling that webview in a tmux pane would certainly be challenging. How would they be able to scroll half the web view out of the buffer?

          +--------------------------------+
          |                                |
          | zadjii@ubuntu:~$               |
          |                                |
Pane 1 >  |       +--------------+         |
          |         (obscured              |
          |       |    web       |         |
          |         content?)              |
          | +-----+--------------+-------+ |
          |       | Web View     |         |
          |       | content      |         |
Pane 2 >  |       |              |         |
          |       +--------------+         |
          | zadjii@ubuntu:~$               |
          |                                |
          |                                |
          +--------------------------------+
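
The coordinate translation itself is the easy part. A minimal sketch, assuming a hypothetical `Rect` type measured in character cells: the multiplexer offsets the webview by the pane's origin and then intersects it with the pane's bounds. What this does not solve is telling the webview which part of its content is hidden.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """A rectangle in character cells (hypothetical helper type)."""
    top: int
    left: int
    rows: int
    cols: int


def to_screen(view: Rect, pane: Rect) -> Rect:
    """Translate a pane-relative rectangle into full-terminal coordinates."""
    return Rect(view.top + pane.top, view.left + pane.left, view.rows, view.cols)


def clip(view: Rect, pane: Rect) -> Optional[Rect]:
    """Intersect a screen-space rectangle with the pane; None if nothing is visible."""
    top = max(view.top, pane.top)
    left = max(view.left, pane.left)
    bottom = min(view.top + view.rows, pane.top + pane.rows)
    right = min(view.left + view.cols, pane.left + pane.cols)
    if bottom <= top or right <= left:
        return None
    return Rect(top, left, bottom - top, right - left)
```

For a webview that has scrolled halfway out of the top of Pane 2, `clip` returns only the lower half, which is the region the multiplexer would have to ask the terminal to draw.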

Or what about when the top of the WebView scrolls off the top of the buffer, but the bottom hasn't yet? (Not the viewport, but the entire buffer itself. There's only 9000-some lines in the scrollback total, so it's trivially possible for the web content to get scrolled up and out of the buffer.) Does the webview shrink in height until it has a height of 0 rows? Or do we continue to crop the visible region, leaving some of the content un-interactable?
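
The two options can be written down side by side. This is only a sketch with made-up names: `policy` is either "shrink" (the webview is resized to the rows that remain, and its content starts from the top again) or "crop" (the webview keeps its content in place, and the evicted rows are simply hidden).

```python
def after_eviction(top, rows, evicted, policy):
    """Return (new_top, visible_rows, content_offset) after `evicted` lines
    leave the top of the buffer, or None once the webview is entirely gone.

    `top` is the webview's first row in buffer coordinates. `content_offset`
    is how many rows of web content are hidden above the visible region.
    """
    new_top = top - evicted
    hidden = max(0, -new_top)       # rows pushed out of the buffer
    remaining = rows - hidden
    if remaining <= 0:
        return None
    offset = hidden if policy == "crop" else 0
    return (max(0, new_top), remaining, offset)
```

Both policies keep the same number of rows; they differ only in whether the top of the content becomes unreachable.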

What happens if a commandline app prints a webview, then prints some text over that same region where the webview is? Is the web view permanently on top of any text in that region? Or would we have to draw the terminal contents on top of the webview? I know that D2D can't anti-alias text that's on a transparent background, but something like acrylic wouldn't work for text that's drawn on top of a webview. It would be tricky for us to be able to handle that case - we'd have to know "this cell wants to have a transparent BG, but it's on top of a webview, so actually draw its BG as opaque", and the renderer would need a lot of work to support that.

There's no way for the terminal to communicate events or info back to the commandline client. Or ew, we could develop a whole set of input sequences to allow the webview to "send input" back to the client app.

Maybe we'd need something like the IDs that OSC 8 uses, so that the client app can uniquely identify each webview it makes, and refer to them later. Maybe to support tmux, a client app would need to say "move webview[1] to x,y" or "resize webview[1] to r,c", or "clip webview[1] (in such a way that makes sense)". These in combo might be able to fix the tmux issues.
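
A sketch of the terminal-side bookkeeping this implies, with every name here hypothetical (none of these commands exist): webviews are stored under a client-chosen ID, and later move/resize/clip requests refer to that ID.

```python
class WebviewRegistry:
    """Tracks webviews by a client-chosen ID (all command names are hypothetical)."""

    def __init__(self):
        self._views = {}

    def create(self, view_id, x, y, rows, cols):
        if view_id in self._views:
            raise ValueError(f"webview {view_id!r} already exists")
        self._views[view_id] = {"x": x, "y": y, "rows": rows, "cols": cols, "clip": None}

    def move(self, view_id, x, y):
        self._views[view_id].update(x=x, y=y)

    def resize(self, view_id, rows, cols):
        self._views[view_id].update(rows=rows, cols=cols)

    def clip(self, view_id, first_row, visible_rows):
        """Show only `visible_rows` rows of content, starting at `first_row`."""
        self._views[view_id]["clip"] = (first_row, visible_rows)

    def get(self, view_id):
        return dict(self._views[view_id])
```

A multiplexer could then intercept a client's create request and re-issue it with translated coordinates, followed by clip requests as the pane scrolls.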

All that seems less horrifying than my real concern with this request. If we allow the client to emit HTML with javascript in it, then we'd be allowing commandline clients to run arbitrary code in the context of the Terminal. Just on paper, that sounds like a horrible idea. What if we follow #5000 to conclusion, and we have the terminal "content" running in another process from the terminal "window"? Now this becomes nearly impossible again. We'd need some way for the content process to say "oh, by the way, there's a webview at x,y,w,h, with (some HTML) please", and the window process would need to be able to draw that webview. Combine that with Mixed Elevation, where the content process is running at medium-IL and the window is running elevated. Presto, you've got yourself a trivial escalation of privilege.

Is there a way on Windows for the client app to create the "web view" and own it, and then hand a HANDLE to the webview to the terminal to say "pretty please, draw this webview for me", leaving the webview at the client's privilege level? Maybe - but that'll certainly break in any remote scenarios, so that's out the door.

Would this work at all on Linux? I haven't the faintest clue. I don't know if there's a "webview" that VTE could re-use easily. Linux GUIs are obviously not my area of expertise, but I'd refuse to sign off on something that wasn't possible on other platforms as well.

I'm thinking that overall, this is something that might be possible, but would require an enormous, cross-platform engineering effort. I'm going to stick this in the freezer for now - it's an interesting thought experiment, and if someone figures all this out, then sure let's do it. But I'm thinking there are too many open issues to make this feasible on this side of 2025.

metawrap-dev commented 3 years ago

Just saw this, and thanks for the fantastically detailed reply.

For the security implications, my assumption would be that the browser should be given no more privilege than a Unicode display character. Bear with me on this strange phrase :) What I mean is, someone clever doing a breakout using some kind of browser content RCE would get the same privileges they would have gotten with rendering a malformed Unicode character if that was an open vector. Also worth mentioning that if you have a rogue program running in your terminal outputting belligerent HTML/JS, you might have other problems, but I can imagine a scenario with someone cat-ing a log and accidentally opening a million vicious browser instances. :) And, if my example above is possible then the internet has an even bigger problem. All that said, I understand that a browser is more than this and it would expand the attack surface of WT. The benefit would be WT usable as a kind of interactive technical the-other-Pythonesque "workbook", which I suspect could be a very good thing.

For the overwriting/viewport, I would assume the same rules apply as terminal characters overwriting themselves, but if you click in the web view and wake it up, it would render over the top. If your output is messy, the output is messy.

The HANDLE aspect might be impossible when remote, but the biggest win here would be just HTML/JavaScript. Even that would enable inline interactive reports/graphs. I've often wondered if it could be possible in the terminal to detect the context of 'stdout' vs 'stderr' from the app (then maybe display normal vs highlighted in the terminal) then we could add things like 'stddraw' as a channel?

Thank you for indulging my overactive imagination. :)

microsoft / terminal

Allow display of HTML output by a console application to enable some nice graphical/interactive features. #7246
