stateful / runme

DevOps Notebooks Built with Markdown
https://runme.dev
Apache License 2.0

vscode for web and serverless RunMe #616

Open jlewi opened 5 months ago

jlewi commented 5 months ago

Feature Request:

Make RunMe vscode extension compatible with vscode for web.

This is different from the existing RunMe on Web experience because the current experience relies on code-server, which means

Running vscode on the web means the RunMe vscode extension would run in the browser, read notebooks from the machine where the browser is running, and talk to a remote RunMe GoLang server. See the diagram below.

This has come up in a couple of Discord threads:
https://discord.com/channels/1102639988832735374/1102639989700968583/1250577804618371132
https://discord.com/channels/1102639988832735374/1102639989700968583/1232903457912913991

Motivation

Here's a diagram illustrating how I would like to deploy RunMe.

[Diagram: proposed RunMe deployment (image "shapes at 24-05-23 07 59 47")]

Notably here

This solves a couple key pain points related to runbooks and infrastructure

VPC perimeters and bastion nodes

Enterprise infrastructure usually lives within a secure VPC. This means to run the steps in a playbook (e.g. kubectl, gcloud, awscli commands etc...), you usually have to tunnel into a machine within the perimeter.

With the proposed architecture users could open up RunMe notebooks in their browser and then execute those commands inside a machine inside the VPC via RunMe's GoLang server. The notebooks would be stored locally.

Reproducible / Containerized Environments

A major headache today with operations is that each developer has to install and configure all the tools used in playbooks. The above architecture means this can be replaced with containerized environments provisioned with role accounts. All that needs to be installed on the client is a browser and network access.

Remote Debugging

When troubleshooting problems with VMs or containers, it's often necessary to execute commands within those containers or environments. The above architecture would mean we could start the RunMe server inside the target (e.g. as a K8s ephemeral container) and then execute parts of our playbooks inside that machine.

Importance of storing notebooks locally

A critical difference with today's RunMe on web is that in the proposed architecture notebooks are stored on the machine where the browser runs. This simplifies deployment of the server because the RunMe GoLang server can effectively be treated as stateless, whereas a code-server instance is stateful. In particular, since the RunMe notebooks are stored on the code-server instance, that instance can't be recycled until the notebooks have been persisted to some durable storage (e.g. git).

Known Blockers

There are two main blockers I'm aware of to making RunMe compatible with vscode for web

  1. gRPC
  2. Moving serialization into the browser

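RunMe's vscode extension uses gRPC to communicate with the GoLang server. Native gRPC can't run inside the web browser, because browser APIs don't give a page the control over HTTP/2 framing and trailers that gRPC relies on. There are three possible options: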

  1. Use buf's connect protocol
  2. Use grpc-web
  3. Use grpc-gateway

I think the connect protocol is the most promising. The other two require running a proxy. Using the connect protocol also means you can continue to use buf's generated clients.
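To illustrate why the connect protocol is browser-friendly, here is a minimal sketch of a unary Connect call expressed as a plain HTTP POST with a JSON body, which a browser can send directly as long as the server speaks Connect. The service name `example.v1.ParserService`, the method `Deserialize`, the host, and the helper names are all hypothetical and not RunMe's actual API. In practice buf's generated clients would build these requests for you.

```typescript
// Shape of one unary Connect request. Everything here is plain HTTP.
interface ConnectUnaryRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Minimal structural type for a fetch-like function, so the sketch does not
// depend on DOM typings. The browser's own fetch satisfies this shape.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the request for a unary RPC. `service` is the fully qualified
// protobuf service name (hypothetical in the example below).
function buildConnectUnaryRequest(
  baseUrl: string,
  service: string,
  rpc: string,
  message: unknown,
): ConnectUnaryRequest {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${trimmed}/${service}/${rpc}`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Connect-Protocol-Version": "1",
    },
    body: JSON.stringify(message),
  };
}

// Sends the request and decodes the JSON response. Error handling is reduced
// to a status check for brevity.
async function callUnary<T>(fetchImpl: FetchLike, req: ConnectUnaryRequest): Promise<T> {
  const res = await fetchImpl(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error(`Connect call failed with HTTP ${res.status}`);
  }
  return (await res.json()) as T;
}
```

In a browser, `callUnary(fetch, buildConnectUnaryRequest("https://runme.example.internal", "example.v1.ParserService", "Deserialize", { source: "# hi" }))` would issue the call. Injecting the fetch function keeps the sketch testable outside a browser.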

Currently, RunMe serializes/deserializes notebooks inside the GoLang server. I believe there was an experiment to run serialization in the browser using WASM. However, it looks like the WASM code path is being removed (stateful/vscode-runme#349).

For the above architecture to work well, you want serialization to run in the browser client side so opening/saving notebooks isn't blocked on provisioning a RunMe server.
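As a rough illustration of what client-side serialization involves, the sketch below splits Markdown into markup and code cells and writes them back, entirely in TypeScript with no server round trip. It is deliberately simplified and is not RunMe's serializer: the `Cell` type, `deserialize`, and `serialize` are hypothetical names, and it ignores front matter, cell metadata, nested fences, and everything else the real implementation has to handle.

```typescript
// Three backticks, built at runtime so the sketch contains no literal fence.
const FENCE = "`".repeat(3);

// Hypothetical, minimal cell model: prose or a fenced code block.
type Cell =
  | { kind: "markup"; value: string }
  | { kind: "code"; languageId: string; value: string };

// Splits Markdown into cells by scanning for fence lines.
function deserialize(markdown: string): Cell[] {
  const cells: Cell[] = [];
  let buffer: string[] = [];
  let fenceLang: string | null = null; // non-null while inside a code fence

  for (const line of markdown.split("\n")) {
    const isFence = line.trim().startsWith(FENCE);
    if (isFence && fenceLang === null) {
      // Opening fence: emit any prose collected so far as a markup cell.
      const prose = buffer.join("\n").trim();
      if (prose !== "") cells.push({ kind: "markup", value: prose });
      buffer = [];
      fenceLang = line.trim().slice(FENCE.length).trim();
    } else if (isFence && fenceLang !== null) {
      // Closing fence: emit the collected lines as a code cell.
      cells.push({ kind: "code", languageId: fenceLang, value: buffer.join("\n") });
      buffer = [];
      fenceLang = null;
    } else {
      buffer.push(line);
    }
  }
  // Trailing text becomes a markup cell. In this sketch the body of an
  // unterminated fence also ends up here.
  const rest = buffer.join("\n").trim();
  if (rest !== "") cells.push({ kind: "markup", value: rest });
  return cells;
}

// Writes cells back to Markdown, separating them with blank lines.
function serialize(cells: Cell[]): string {
  return (
    cells
      .map((c) =>
        c.kind === "markup" ? c.value : `${FENCE}${c.languageId}\n${c.value}\n${FENCE}`,
      )
      .join("\n\n") + "\n"
  );
}
```

With something like this running in the extension itself, opening and saving a notebook would only touch the local file, and the remote server would be needed only when a cell is executed.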

sourishkrout commented 5 months ago

Currently, RunMe serializes/deserializes notebooks inside the GoLang server. I believe there was an experiment to run serialization in the browser using WASM. However, it looks like the WASM code path is being removed (stateful/vscode-runme#349).

WASM pre-dated the gRPC (de-)serializer. While it worked great in the VS Code desktop app, we ran into runaway memory issues in remote execution environments. This was likely a side effect of cross-compiling the Runme CLI to WASM instead of breaking out the serializer into a discrete library. In any case, WASM was abandoned because the Runme kernel's role expanded and needed to live much closer to the system (io, tty, etc).

Known Blockers

There are two main blockers I'm aware of to making RunMe compatible with vscode for web

  1. gRPC
  2. Moving serialization into the browser

While not impossible from a purely technological point of view, the VS Code platform has some firm boundaries here. Many of the APIs (e.g. terminal, code lenses, file system operations, etc) as well as the "works out of the box" product experience are provided by VS Code's Extension Host APIs. These are vastly limited for what VS Code calls "Web Extensions", which basically limits them to either read-only rendering or something purely WASM-based: https://code.visualstudio.com/api/extension-guides/web-extensions & https://github.com/microsoft/vscode-extension-samples (search for directories containing wasm).

While one might think VS Code is "just a webapp", it's really an IDE micro-services architecture, and operating outside of its architectural boundaries comes with VS Code platform incompatibility as well as delivery/packaging tradeoffs.

sourishkrout commented 5 months ago

Here's a diagram illustrating how I would like to deploy RunMe.

One light-lift option for a serverless deployment could be VS Code's tunneling capabilities. And, code-server (by Coder) or the Theia project might have some more flexible answers which I haven't fully explored.

E.g. the GHA we've built that lets you drop into a tunneled VS Code like a "breakpoint" in a workflow: https://github.com/stateful/vscode-server-action/blob/main/src/main.ts#L45-L51

I believe MSFT's licensing restricts the tunnels to non-commercial use but otherwise should allow for open/public delivery.

Not saying we don't want "serverless" ourselves, however, just trying to offer alternatives that are available "now".

sourishkrout commented 5 months ago

VPC perimeters and bastion nodes

Enterprise infrastructure usually lives within a secure VPC. This means to run the steps in a playbook (e.g. kubectl, gcloud, awscli commands etc...), you usually have to tunnel into a machine within the perimeter.

With the proposed architecture users could open up RunMe notebooks in their browser and then execute those commands inside a machine inside the VPC via RunMe's GoLang server. The notebooks would be stored locally.

Reproducible / Containerized Environments

A major headache today with operations is that each developer has to install and configure all the tools used in playbooks. The above architecture means this can be replaced with containerized environments provisioned with role accounts. All that needs to be installed on the client is a browser and network access.

Remote Debugging

When troubleshooting problems with VMs or containers, it's often necessary to execute commands within those containers or environments. The above architecture would mean we could start the RunMe server inside the target (e.g. as a K8s ephemeral container) and then execute parts of our playbooks inside that machine.

Runme, by extension of VS Code's Remote Development capabilities, supports these scenarios in multiple ways. They are well maintained by MSFT and Docker and, whether we "believe" it or not, come with the credibility of Microsoft's & Docker's brands.

  1. Attach VS Code to Bastion via SSH: https://docs.runme.dev/how-runme-works/runme-via-ssh#how-to-set-up-ssh-connection-in-vs-code Nothing changes in the Runme notebook UX, except that now the host system is your jumphost.

  2. First-class Devcontainer Support: https://docs.runme.dev/guide/devcontainer I call this "opscontainer" but the idea is the same as SSH. Instead of the bastion host, you run against a locally hosted container.

While we are open to improving the engineer's experience, we're "trying" really hard to build on existing open standards and leverage all the benefits that come with them.

Btw, here's an example repo from a recent talk I gave at Rejekts: https://github.com/stateful/rejekts-eu-2024

jlewi commented 5 months ago

Thanks, that's useful context.

So I think the partial workaround today would be to use the vscode option in RunMe to specify the gRPC address of the RunMe server and use a remote server. I would still use vscode locally. I plan on experimenting with this soon.

This is different from VSCode over ssh, where the filesystem holding the notebooks and the server running the commands are colocated.

sourishkrout commented 5 months ago

So I think the partial workaround today would be to use the vscode option in RunMe to specify the gRPC address of the RunMe server and use a remote server. I would still use vscode locally. I plan on experimenting with this soon.

This is different from VSCode over ssh, where the filesystem holding the notebooks and the server running the commands are colocated.

Please let me know how that goes. Another experiment you could try is running Runme's kernel server through an SSH tunnel, which is likely not so different from using a remote socket.

jlewi commented 2 months ago

@sourishkrout I've been thinking more about this. In particular, I've been wondering how much work it would be to create a minimal version to begin testing demand and utility.

Is it possible to start listing the parts of RunMe that would need to be refactored in order to make RunMe work in vscode for web?

Are there other pieces of RunMe functionality that won't work in vscode for web?

What are VSCode Terminals & xterm For?

Is this to support running cells interactively? It looks like both IRunnerProgramSession and GrpcRunnerProgramSession implement the Pseudoterminal interface.

Would disabling the ability to run interactively in web be an easy way to deal with that?

That said, it seems like this should work in vscode for web. It looks like GrpcRunnerProgramSession maps the terminal interface onto gRPC requests, and the actual execution of the commands happens inside the RunMe gRPC server, which isn't constrained by the browser. So if we switch the transport to a protocol that works in the browser, then it should work?
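The idea in this paragraph can be sketched as a terminal session that only depends on a narrow transport interface, so that a gRPC-backed transport could in principle be swapped for a browser-compatible one. This is a hypothetical illustration, not the actual code in vscode-runme: `ExecTransport`, `RemoteTerminalSession`, and `EchoTransport` are made-up names, and the session is only loosely shaped like VS Code's pseudoterminal (input in, writes out) without importing the `vscode` module.

```typescript
// Hypothetical transport boundary: anything that can carry terminal input to
// a remote runner and deliver its output back.
interface ExecTransport {
  send(input: string): void;
  onOutput(listener: (chunk: string) => void): void;
}

// Terminal-facing side. It never executes anything itself; it only relays
// between the UI and the transport.
class RemoteTerminalSession {
  private readonly transport: ExecTransport;
  private writeListeners: Array<(data: string) => void> = [];

  constructor(transport: ExecTransport) {
    this.transport = transport;
    // Forward everything the remote runner emits to the terminal renderers.
    transport.onOutput((chunk) => this.writeListeners.forEach((l) => l(chunk)));
  }

  // Registers a renderer for output coming from the remote side.
  onDidWrite(listener: (data: string) => void): void {
    this.writeListeners.push(listener);
  }

  // Called with keystrokes typed into the terminal.
  handleInput(data: string): void {
    this.transport.send(data);
  }
}

// In-memory stand-in for a remote runner: echoes input back in upper case.
// A real transport would wrap gRPC on Node or a browser-compatible protocol.
class EchoTransport implements ExecTransport {
  private listeners: Array<(chunk: string) => void> = [];

  send(input: string): void {
    this.listeners.forEach((l) => l(input.toUpperCase()));
  }

  onOutput(listener: (chunk: string) => void): void {
    this.listeners.push(listener);
  }
}
```

Whether the existing abstractions are narrow enough for such a swap is exactly the open question in this thread. The sketch only shows the shape the boundary would need.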

Per this issue it seems like pseudoterminal is supported in web. https://github.com/microsoft/vscode/issues/116022

Conditional Compilation For Web vs. Node

Is there a good pattern for excluding code that shouldn't be included in one of the versions?

sourishkrout commented 2 months ago

The general reference architecture is documented here: https://code.visualstudio.com/api/extension-guides/web-extensions.

It's tough to gauge and quantify how much entanglement there is between web/node APIs. Making code work, behave, and cleanly bundle for both web/node (runme's code plus dep tree) is likely what's under the tip of the iceberg. JavaScript is not JavaScript. I don't just worry about the work required to detangle, but also about not destabilizing the rock-solid parts in Runme.

If I were to tackle this, I'd likely attempt it in stages:

  1. Make a render-only Notebook UX available on the Web like Jupyter.
  2. Consider what it'd take to bring running capabilities to the web.

To be clear, though, 1. ranks low on the roadmap and 2. even lower. The reason is that the Runme users I talk to expect a complete environment and don't see/use the Notebook as a standalone user experience. In my mind, going against the grain of the "ideal user profile" with this approach is Pandora's box. It really just makes it more difficult to quickly get up and running due to a config-intense out-of-the-box experience.

I do, however, think using Runme server-less has legs. However, requiring compute/mem is inevitable, whether it's as part of an IDE-based delivery model or a frontend for a remote host. The former won't need a single line of code changed when delivered server-lessly. I believe we can gauge interest and prove it out with a packaged container image (with code-server) that users can run locally or as part of a managed "server-less" dispatcher.

I reviewed the requirements outlined initially, and the "locality" of storing notebooks locally seems important. No? I wonder if the solution here is using more of the VS Code extension API. Virtual document/file system/workspaces come to mind. Perhaps it's "easy" to build a pass-through with the browser APIs?

https://github.com/microsoft/vscode-extension-samples/tree/main/fsprovider-sample
https://github.com/microsoft/vscode-extension-samples/tree/main/fsconsumer-sample

Sorry about the extended response, but, as you can tell, I have a "frontend-backend" knee-jerk from numerous past conversations (with various devs) where I feel an entrenched understanding of architecture is driving a design/approach, not the requirements and/or the user persona. :-)

Comments on your arch/impl notes:

VS Code Integrated Terminals

It is not an issue since https://vscode.dev already proves that they run "as a web app". The challenge is to find a web extension that allows running a shell (vscode.dev is entirely host-less). I believe there's an experimental WASM-compiled Python interpreter that runs fully self-contained. Another way to prove the concept is to create a tunnel and just use vscode.dev as IDE frontend.

Notebook Terminals

It isn't an issue since it's running a web component per cell, and the PTY/TTY is more or less a character device abstraction that is agnostic of a "host system". The gRPC client abstractions are somewhat well-defined but likely not narrow enough to replace them with a Connect alternative. However, the risk here is lower because they are already loosely coupled. Again, I'm struggling to see the merit in porting the transport before having a solid understanding of how to deal with the host of issues around the out-of-the-box experience.

jlewi commented 2 months ago

Thank you for the detailed response.

I'd like to separate the questions of

  1. How much work would it be?
  2. Is it worth it?

As for how much work it would be, it sounds like there are a lot of unknowns. In particular,

  1. It's unclear which portions of the code/deps would need to be refactored to work on web
  2. It's unclear what effort would be required to maintain separate solutions for web and node

Do you have suggestions for what a time bounded way of getting more clarity on 1 would be?

If I were to tackle this, I'd likely attempt it in stages:

  1. Make a render-only Notebook UX available on the Web like Jupyter.
  2. Consider what it'd take to bring running capabilities to the web.

I think this is a great suggestion.

VS Code Integrated Terminals

The challenge is to find a web extension that allows running a shell (vscode.dev is entirely host-less).

Why would you need to run a shell in the browser? My assumption is the browser is still communicating with the fully functional RunMe server which is running outside the browser.

Re: Is it Worth It

Re: Ideal User Profile

The reason is that the Runme users I talk to expect a complete environment and don't see/use the Notebook as a standalone user experience.

That's interesting. What about developers that aren't using VSCode?

Do the users you talk to already have access to a cloud development environment, e.g. code-server, GitHub Codespaces, etc.?

Reference Architecture For Platform Teams

So a question I keep coming back to is, if I was a platform team how would I create a paved path for reading/writing/executing runbooks?

I can think of three options

  1. Every user installs VSCode/RunMe on their local machine
  2. Set up cloud desktop environments (CDE) - e.g. use coder to give everyone a code-server
  3. vscode for web

I see drawbacks to each of them.

Every user installs vscode/runme

Setup CDE

vscode for web

So where are we headed?

So given the above, none of the options are great, IMO.

I really like your suggestion about creating a render-only Notebook UX available on the Web, because if we have some minimal web-based experience then it's possible to incrementally improve it.

hotpocket commented 5 days ago

I hope to submit a PR in a few weeks. I am working on a serializer that uses the connect protocol, which would unblock running this in VS Code for web.