Isolate (sandbox) language servers

jansol commented 1 month ago

Check for existing issues

[X] Completed

Describe the feature

Related to #12354. Language servers downloaded by Zed have full access to everything on the machine. This is problematic from a security perspective (e.g. as seen recently the github release could be tampered with, causing people to download and execute malicious code). To mitigate the possible impact from this, it would be useful to sandbox language servers somehow, e.g. restrict them to only access files that belong to the work tree (and the language's respective system/standard library modules).

If applicable, add mockups / screenshots to help present your vision of the feature

No response

maxdeviant commented 1 month ago

Is there any prior art in this space?

jansol commented 1 month ago

The extreme solution would basically be containers (docker, lxc, etc). I don't know how they work on other platforms but on Linux it AFAIK boils down to some cgroup2 trickery and chroot'ing into a virtual directory hierarchy set up with bind-mounts that looks like a linux system with only the specified things present.

maxdeviant commented 1 month ago

I was thinking more of other editors that have sandboxed execution of language servers.

bjorn3 commented 1 month ago

Implementing good sandboxes from scratch can be surprisingly tricky. For example if you forget to create a new network namespace (for example because the language server needs internet access to download dependencies from the internet and you don't want to use a bridge between the host and container network namespace like netavark), you just got yourself a way to escape the sandbox by sending arbitrary keypresses to the X server over a unix socket in the "abstract namespace" which is attached to the network namespace rather than the mount namespace like regular unix sockets.

Also the exact extent to which you can sandbox a language server depends on the language server in question. For example rust-analyzer needs to be able to run builds with cargo, which in turn requires read-only access to the system toolchain and all source code as well as full write access to the target dir, Cargo.lock and the caches in ~/.cargo/{git,registry}. While other language servers may be perfectly fine with not being able to access any files at all other than through the editor presenting it the source code for open files through the lsp protocol.

skewballfox commented 1 month ago

For example if you forget to create a new network namespace (for example because the language server needs internet access to download dependencies from the internet and you don't want to use a bridge between the host and container network namespace like netavark)

I'd argue that that isn't a feature that should be built directly into a language server, and instead handled by the whatever is (planned for) managing the extension.

For example rust-analyzer needs to be able to run builds with cargo, which in turn requires read-only access to the system toolchain and all source code as well as full write access to the target dir, Cargo.lock and the caches in ~/.cargo/{git,registry}.

some language servers also check ~/.config for global configurations, along with project scoped configurations in the workspace directory.

I checked the docs for extensions but didn't see anything relevant. If extension sandboxing is something zed is interest in supporting, one way to support this would be to have some kind of allowlist in the extension toml:

[permission.files]
workspace=["pyproject.toml", "mypy.init"]

SyntheticBird45 commented 1 month ago

Regarding linux:

Namespaces are extreme and might cause issues on hardened kernel/distributions as user namespaces are disabled. Moreover, it is tricky to provide good security with them by respecting the least privilege principle, as it'll be easy for things to break. For reference, VSCode (being based on electron) make use of a SUID helper binary to setup a sandbox when user namespaces are unavailable.
Stricly speaking of FS restrictions, the Landlock API can be useful
SELinux and Apparmor are out of scopes as you can't dynamically manage them/load your own without risky permissions and/or performance cost.
Flatpak make use of Bubblewrap software to sandbox its applications. Bubblewrap is available as a user namespaces and SUID variants.

From my POV, Landlock and Bubblewrap seems to be the best available options at sandboxing launched extensions processes

bjorn3 commented 1 month ago

Stricly speaking of FS restrictions, the Landlock API can be useful

Landlock currently doesn't support restricting unix sockets in the abstract namespace, so without separately creating a new network namespace using eg Bubblewrap or restricting all network access sycalls using seccomp, it isn't really useful for sandboxing on desktop systems.

jansol commented 1 month ago

Looks like there's a "rust-native" implementation based on seccomp+namespaces+cgroup resource limits with a fairly nice API: https://crates.io/crates/hakoniwa

Bubblewrap would be the more "mainstream" option. AFAIK it is also used by steam, among others, but I did not find existing rust bindings for it with a quick search.

SyntheticBird45 commented 1 month ago

Landlock currently doesn't support restricting unix sockets in the abstract namespace, so without separately creating a new network namespace using eg Bubblewrap or restricting all network access sycalls using seccomp, it isn't really useful for sandboxing on desktop systems.

Looks like there's a "rust-native" implementation based on seccomp+namespaces+cgroup resource limits with a fairly nice API: https://crates.io/crates/hakoniwa

As long as the final solution do not make Zed rely on user namespaces, it's fine.

jansol commented 1 month ago

is this what you mean by user namespaces? Looks like they are optional.

bjorn3 commented 1 month ago

Without user namespaces or a setuid helper it is not possible to create new mount or network namespaces without root permission. This means that the only sandboxing you can do is with seccomp. This is rather fragile for two reasons however: You can't selectively allow specific paths to be read or written. Only allow or deny all filesystem access. This is fine when the program to be sandboxed applies the seccomp rules after dynamic linking and other kinds of initialization that need filesystem access and the program genuinely doesn't need any filesystem access at runtime or all files it needs access to are opened by the program before applying the seccomp rules. This doesn't work when you want to sandbox an untrusted application rather than a trusted program that may happen to have a bug reachable only after the initialization phase. The second reason is that seccomp works at the level of individual syscalls. You either have to allow all syscalls except for specific ones (which will allow a sandbox escape when updating the kernel) or deny all syscalls except for specific ones. When denying all except specified ones any change to the to be sandboxed program may cause the sandboxed process to crash. Any libc update can start using new syscalls and the same applies for other libraries and the program itself.

SyntheticBird45 commented 1 month ago

Since VSCode is based on electron it can take advantage of the SUID helper chrome-sandbox. I hardly see Zed needing to ship a root/4755 binary. Unless the sandboxing is a user namespace only feature, there is no practical way for Zed to sandbox language servers on Linux

bjorn3 commented 1 month ago

Relying on bwrap would still be an option. It can take advantage of user namespaces, but if those are not available it can run as SUID helper. A lot of people have it installed already as it is required by flatpak. And for the remaining users you can ship a copy and try to use user namespaces. For the small subset of users who have neither bwrap not user namespaces opting for no sandbox would be better than opting for no sandbox for anyone.

zed-industries / zed