ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Run build.zig logic in a WebAssembly sandbox #14286

Open andrewrk opened 1 year ago

andrewrk commented 1 year ago

Extracted from #14265.

build.zig is handy for adding build logic, but it's not handy to trust a lot of different people's code running natively on the host. A deep dependency tree has the problem that many different build.zig scripts are all running, and the chance of insufficient auditing grows quickly the more dependencies there are.

This issue is to make the zig build system compile build.zig to wasm32-wasi instead of natively, and merely output its build graph, based on the user-provided build options, to stdout. At this point a separate build_runner can execute the requested build steps (or not, depending on the permissions granted).
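A minimal sketch of that split, assuming a hypothetical graph format: the sandboxed guest only describes steps, and the host-side runner decides which of them may execute. `StepKind`, `StepDecl`, `Permissions`, and `mayExecute` are illustrative names, not part of the Zig build system.

```zig
const std = @import("std");

/// Hypothetical kinds of work a build graph node may request.
pub const StepKind = enum { compile, install_file, run_host_command };

/// Hypothetical description of one node, as the wasm guest might emit it.
pub const StepDecl = struct {
    name: []const u8,
    kind: StepKind,
};

/// Hypothetical permissions the user grants to the host-side runner.
pub const Permissions = struct {
    allow_host_commands: bool = false,
};

/// The host-side runner decides whether a declared step may execute.
/// Compiling and installing are always allowed in this sketch;
/// running a host command requires an explicit grant.
pub fn mayExecute(step: StepDecl, perms: Permissions) bool {
    return switch (step.kind) {
        .compile, .install_file => true,
        .run_host_command => perms.allow_host_commands,
    };
}
```

With default permissions, a step of kind `.run_host_command` is declared in the graph but not executed until the user opts in.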

Perhaps a middle ground, here, would be to run the make() steps directly in the WASI code. But eventually the idea would be that make() steps happen by a separate build runner, not by the wasm guest.

I can foresee a potential escape hatch for opting into running build.zig directly on the host. As an example, the Android SDK package wants to access the Windows registry in some cases. However, I would like to avoid even having this escape hatch if possible.

This issue will require Zig to gain a WASI interpreter, written in Zig. Wherever will we find one?

ikskuh commented 1 year ago

I really like the idea of running the build in a sandboxed environment, but right now I have some use cases that would not be possible.

The first one is that I have a custom build step that compiles resources into a desired target format:

pub const CompileDrawingStep = struct {
     const Self = @This(); 

     sdk: *const Sdk,
     step: Step,
     input_file: FileSource,
     output_file: GeneratedFile,

     pub fn getOutputFile(self: *Self) FileSource {
         return FileSource{ .generated = &self.output_file };
     }

     fn make(step: *Step) anyerror!void {
         const self = @fieldParentPtr(Self, "step", step);

         const src_path = self.input_file.getPath(self.sdk.builder);

         // Just run the input file through the TVG parser once
         {
             const file = std.fs.cwd().openFile(src_path, .{}) catch |err| {
                 std.debug.panic("Failed to open {s}: {s}", .{
                     src_path,
                     @errorName(err),
                 });
             };
             defer file.close();

             validate(self.sdk.builder.allocator, file.reader()) catch |err| {
                 std.debug.panic("{s} is not a valid TinyVG file: {s}", .{
                     src_path,
                     @errorName(err),
                 });
             };
         }

         // No need to cache it, we just verbatim pass the file through 
         self.output_file.path = src_path;
     }

     fn validate(allocator: std.mem.Allocator, reader: std.fs.File.Reader) !void {
         var parser = try tvg.parse(allocator, reader);
         defer parser.deinit();

         while (try parser.next()) |cmd| {
             _ = cmd;
         }
     }
}; 

I have several similar build steps (one invokes a format compiler, one converts images via zigimg, ...) that will finally be bundled into a Zig package:

https://github.com/Dunstwolke/core/blob/047d36ce01be3fd7ac160c2360f0139b9c0a5856/Sdk.zig#L393-L601

This stuff will be way harder with a purely declarative/sandboxed build runner.

Another example is the Zig Text Template: https://github.com/MasterQ32/ZTT/blob/master/src/TemplateStep.zig#L43-L278

Imho the ability to do such things without having to put all of that logic into separate executables (in the case of the Dunstwolke compiler that isn't really possible anyway) is quite a significant feature of the Zig build system.

If we can get such things working (preopening the project directory with read/write file system access might already be enough), it would be awesome to have these things sandboxed.

kuon commented 1 year ago

I do not know how difficult a Zig WASM runner would be, but I can share my experience writing a sandboxed application.

I'll try to make the "story" short, while preserving context.

When the covid vaccine arrived, authorities had to triage patients to vaccinate them in the "best" order possible. Criteria were: age, gender, medical history, location and so on.

They built a system with a "naive" approach and it quickly started to break. The problem was simple: rules were changing daily (especially people's ages, as they could have a birthday any day) and the database grew to millions of entries. Long story short, I was hired to find a solution.

What I came up with was a visual editor, a bit like the Excel formula editor, to create rules; those rules were then compiled to a single Elixir function.

And here the sandboxing problem arose. What I did was use the editor to generate an Elixir AST, then traverse this AST and whitelist functions and arguments. Once done, this AST is compiled to regular Elixir bytecode and executed like normal code. This yielded very good performance.

We could do something similar for build.zig and build.zon/package.zig...

  1. The normal Zig parser transforms the source file into an AST.
  2. The AST is passed through a whitelisting function. This function traverses the AST and checks every statement against a whitelist of functions and arguments (for example, we can easily emulate a chroot with file functions).
  3. The code is compiled and run normally like it is now.
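Step 2 could be sketched as follows, assuming the AST walk has already produced a flat list of fully qualified call names (that walk is not shown and is hypothetical). The allowlist entries and the names `isAllowedCall` and `checkCalls` are illustrative only.

```zig
const std = @import("std");

/// Hypothetical allowlist of fully qualified function names.
const allowed_calls = [_][]const u8{
    "std.fs.Dir.openFile",
    "std.mem.eql",
};

/// Returns true if `name` is on the allowlist.
pub fn isAllowedCall(name: []const u8) bool {
    for (allowed_calls) |allowed| {
        if (std.mem.eql(u8, name, allowed)) return true;
    }
    return false;
}

/// Checks every call name extracted by the (hypothetical) AST walk.
/// A single name that is not on the allowlist rejects the whole file.
pub fn checkCalls(names: []const []const u8) bool {
    for (names) |name| {
        if (!isAllowedCall(name)) return false;
    }
    return true;
}
```

This only works if every call can be resolved to a name before compilation, which is the assumption questioned later in the thread.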

I've been maintaining our Elixir engine and it works very well. Of course, if you allow flow control statements you are not protected against things like while(true) {} but those "attacks" should be very easy to detect and stop.

deflock commented 1 year ago

Some projects have a concept of Trust, e.g. Visual Studio Code, direnv: on first use of something you are prompted if you trust it to run. Does it make sense to implement something like this first if I don't need sandboxing or will it annoy users?

kuon commented 1 year ago

> Some projects have a concept of Trust, e.g. Visual Studio Code, direnv: on first use of something you are prompted if you trust it to run. Does it make sense to implement something like this first if I don't need sandboxing or will it annoy users?

My opinion is that installing a Zig package should be safe in all conditions. I think we can make a list of acceptable operations for a package and create a whitelist. A whitelist also adds the ability to ask the user. For example, http.get in a build.zig would prompt the user with something like package xxx is trying to download https://... allow? (yes/No), or instead we can deny it by default and require the package user to add a URL whitelist to the build.zon.

I feel pretty strongly about package installation being safe. With all the CI we have and the "power" our dev machines often have, many developers in small companies can shut down their company's IT with the correct SSH command. I think we have the opportunity to do it right and we should.

ikskuh commented 1 year ago

@kuon most of that is actually solved by using WASI, as our runner can have a pretty tight integration into what APIs are allowed and on what directories/namespaces/... they are okay.

As Zig already ships a really basic WASI runner, I guess it shouldn't be that hard to actually make a wasm interpreter that ships with Zig that can run our build scripts.

Considering the progress of the selfhosted wasm backend, that would also give us basically instant-response build time.

mattnite commented 1 year ago

As I see it, the purpose of this ticket is to protect the dev machine or CI server from a compromised package. The result of the build could be infected, but preventing that is not what we're doing here, we're protecting the build procedure itself.

@MasterQ32 has a point in that sandboxing build.zig and outputting a dependency tree eliminates a number of useful and elegant solutions that build.zig affords us. He also makes the comment that these build steps would have to be done in their own executable in that case.

If a dependency's build.zig can spawn a process or cause another program to spawn a process, and it runs on the host (not sandboxed), then an attacker need only infect the code for that executable to sidestep sandboxing.

We could sandbox make() like you mentioned @andrewrk and I think there are some interesting lines of thought down this path:

Policy

Creating policy for capability based systems like this is hard. We have a pretty low-level API for the boundary of the sandbox, and this is the layer at which policy is enforced. So how do we help protect developers while letting them get on with their day?

One thing to keep in mind is that many developers will have pure Zig dependencies. Many C projects will be able to set up their artifacts without reaching into the system. In these cases we wouldn't bump into the sandbox boundary.

By default we could allow a build.zig to access its project directory and network access would be disallowed. This might also cover a large number of projects meaning that many developers will be far safer with zero impact on UX.

From there a developer could modify the root project's allowlist if they did legitimately need a network request. We could even limit where that build script could make connections to. We could fail a build if a connect attempt is made, or provide prompts as @kuon outlined.

Capabilities could be limited to a single dependency so that different build.zig have different "permissions".

The policy could also be part of an application, so the act of managing dependency capabilities could be confined to when someone updates packages. A project would declare which dependency can do what, and someone cloning a fresh repo would be able to build the project without having to understand the security requirements of its dependencies. This goes for the CI server scenario as well.
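A minimal sketch of such a per-dependency policy, assuming paths are already normalized and absolute, and that `project_dir` ends with a path separator. The `Policy` type and its fields are hypothetical, not an existing Zig or build.zon feature.

```zig
const std = @import("std");

/// Hypothetical per-dependency policy entry.
pub const Policy = struct {
    /// Directory the dependency's build.zig may read and write.
    /// Assumed to be normalized and to end with '/'.
    project_dir: []const u8,
    /// Hosts it may connect to; empty means no network access (the default).
    allowed_hosts: []const []const u8 = &[_][]const u8{},

    /// File access is allowed only inside the project directory.
    /// A plain prefix check; it assumes `path` contains no ".." segments.
    pub fn mayAccessPath(self: Policy, path: []const u8) bool {
        return std.mem.startsWith(u8, path, self.project_dir);
    }

    /// Network access is allowed only to explicitly listed hosts.
    pub fn mayConnect(self: Policy, host: []const u8) bool {
        for (self.allowed_hosts) |allowed| {
            if (std.mem.eql(u8, host, allowed)) return true;
        }
        return false;
    }
};
```

The root project would hold one such entry per dependency, so different build.zig scripts get different "permissions" and a denied request can either fail the build or trigger a prompt.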

Profiling

Instead of a long line of prompts, another solution could be to run a build in "permissive" mode, which lets it do whatever it wants, and then the compiler takes note of all the activity generated by the child processes, and sets the capabilities automatically. This is a form of trust-on-first-use.

kuon commented 1 year ago

@MasterQ32 Yes, using WASI is functionally equivalent to traversing the AST and whitelisting statements. I do not know where we are with a WASM runner, and I looked at wasmer, saw how huge it was, and thought it could be simpler to just "monkey patch" the AST than to have a full WASM runner with a WASI implementation.

ikskuh commented 1 year ago

I guess we can start working on base of https://github.com/ziglang/zig/tree/master/stage1

But also check out https://github.com/wasm3/wasm3 for a pure C implementation that is tailored for perf. I guess we could implement a small one in Zig that is tailored for maintainability rather than max perf. Build scripts aren't super complex programs anyway, and also have very short runtimes.

kuon commented 1 year ago

@MasterQ32 that is still major work. In the end it would be nice to have, but is it a priority for the package manager we want to ship? Having a single "AST editing" function would be much easier and faster to write. I think it really depends on project priorities. Technically yes, using WASM and WASI is the way forward and having a pure Zig WASM runner would be awesome, but it could delay this feature for months.

ikskuh commented 1 year ago

What AST editing do you want to do? It's not like Zig has a true set of functions you are allowed to use/not use. On Linux, it's enough to allow an "open" syscall to fuck up basically the whole system, but we also need the open syscall to actually read files like build config files or similar. The WASI approach has a smaller attack surface, as the interface is well defined and the same for each OS. Having an AST whitelist sounds like it wouldn't really scale. You have to allow importing std, and after that you either have to perform Sema or solve the halting problem in order to make sure only whitelisted functions are used:

const magic = try @field(@field(@import("std"), "net"), "tcp" ++ "Connect"++"To"++"Address")(…);

As we're trying to defend ourselves against malicious code here, I wouldn't trust an AST editing scheme, but I'd love to see how the above case would be resolved with that.
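The same trick can be shown in a harmless, self-contained form. The sketch below reaches `std.mem.startsWith` through names assembled at compile time, so a scan of identifier tokens in the call path would not find it. `hidden` and `callsHidden` are made-up names, and a string helper stands in for the networking call above.

```zig
const std = @import("std");

/// Resolved during compilation to std.mem.startsWith, although neither the
/// namespace name nor the function name is written as one identifier here.
const hidden = @field(@field(std, "m" ++ "em"), "starts" ++ "With");

/// Calls the function obtained through the comptime-built names.
pub fn callsHidden() bool {
    return hidden(u8, "build.zig", "build");
}
```

Only semantic analysis, not the AST alone, knows which declaration `hidden` refers to.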

ikskuh commented 1 year ago

Alternative proposal to using WASM: Use a riscv32 emulator and use that to run the code. RV32-IMF is small enough to be interpreted easily, and you can make your custom "OS"/"syscall" layer by having a custom instruction for that stuff that can be emitted via inline assembly.

kuon commented 1 year ago

@MasterQ32 Well, to be honest I do not know enough of Zig's internals to find the proper solution. In my project it was easy as it was an Elixir AST, which doesn't have "escape hatches" that are not direct function calls. I mentioned it as someone intimate with the Zig AST might be inspired, but it was just a suggestion. For the above example I would simply change @import("std") to @import("std-safe") and also forbid any direct syscall and asm call in the AST unit (the build.zig file or any imported file).

mattnite commented 1 year ago

@kuon It is a major work to have a WASM runner in the project, but I believe the value for the users is worth it. We also have the talent in the community. We were able to whip up a minimal WASM interpreter in C for the compiler without much hassle.

I personally don't want a second standard library whose limitations I have to know. The other issue with the AST route is that we stuff an immense amount of complexity into statically analyzing code. It's likely more work to go this route, and it's a shotgun approach, meaning we'll never stop patching it, and a smart attacker will find the flaws.

@MasterQ32 The issue with the riscv32 route is that we'd be building our own sandboxing features on top of the instruction set; this is not going to be less work than just writing a WASM interpreter.

kuon commented 1 year ago

@mattnite Yeah I get your point. As I said, I mentioned the AST "monkey patching" because I did it once, but it was with a completely different language in a different scenario. I think we are all convinced that the WASM/WASI road is the best way to go technically. I am scared by the amount of work it represents, but as you said, there is talent in the zig community.

Thinking about it, a WASM runner as a Zig library would be a very valuable asset. Imagine a game engine where mods can be written in any language and are compiled to WASM; the game engine then uses the WASM runner to execute them in a sandbox. And Zig being Zig, it could be easily embedded in many projects, a bit like how Lua is used. It opens a lot of possibilities.

mattnite commented 1 year ago

I've put this repo together if anyone wants to fork and prototype some ideas: https://github.com/mattnite/wasm-sandbox

I vendored wasm3 and hooked it up to the build.zig, so it should be pretty hackable.

rdunnington commented 1 year ago

I'm wondering if the Zig project is open to taking a dependency on a third party library or if this proposal includes the idea of selfhosting one. Obviously there are the non-Zig libs like wasmtime, wasm3, etc. I also know of two WASM runtimes for Zig right now, zware and bytebox (my personal project). I'm in the process of adding WASI support to bytebox: wasi-testsuite is 100% passing for Windows, and I just need to flesh out support for the other 2 officially supported platforms, macOS and Linux. I'd definitely be interested in talking more about integrating bytebox with the Zig project if that was the direction the core team wanted to go.

silversquirl commented 1 year ago

There are some issues I see with depending on third party libs, though the issue varies depending on the language it's written in:

The only language that seems reasonable is C, which pretty much limits the choices to just wasm3.

silversquirl commented 1 year ago

(I'd also like to note that, as Andrew jokingly hints towards in the original post, he has already written a wasm interpreter with WASI support, in Zig, as part of the investigation into using wasm for self-hosting :)

rdunnington commented 1 year ago

I see, I didn't realize there was already a Zig implementation with partial WASI support that was part of the Zig project. Your point about the Zig compiler depending on 3rd party Zig projects makes sense. Save that, I was thinking Zig could either absorb or reuse the zware or bytebox codebases, but it seems the idea is to provide a more targeted implementation that only addresses the needs of the compiler and build system. Doing that will probably be better in the long run anyway since there will be less attack surface area to harden. Thanks for responding. :)

eLeCtrOssSnake commented 1 year ago

I pretty much agree with @MasterQ32. Executing build scripts natively on the system sounds spooky, and sandboxing does sound nice but... If we limit the ability of build.zig then it will become another CMake/Meson, no? Yes, sure, building with a makefile or cmake is safe, it just invokes the compiler. But now that I think about it: if the build.zig script is malicious, doesn't it mean that the whole package is malicious? Because in the end we will run the software we built (or somebody will!), and if the build.zig was malicious, probably so was the whole package? Now to me it seems that sandboxing the build process isn't going to do anything in the case of a malicious/compromised package.

So is it really worth sandboxing build.zig? If something got compromised, it's already done for.

silversquirl commented 1 year ago

I think the idea is more to prevent accidental damage than malicious code. The build system can always do whatever it wants by using a RunStep.

ikskuh commented 1 year ago

> There are some issues I see with depending on third party libs, though the issue varies depending on the language it's written in:

There's also the fourth option: Upstream the wasm runtime to zig, and integrate it into std.

thezealousfool commented 1 year ago

I am not an expert on WASM and WASI but if I understand correctly building inside a WebAssembly sandbox will limit us to programs that were built with the WASM+WASI target. So if I want to run a build step that calls another program to pre-process some input - say convert markdown files to html - then the program being called (markdown converter) needs to be compiled to wasm+wasi. This might be limiting and should, I believe, come with an escape route for when we are not able to get a wasm+wasi binary for everything.

Containerized builds might be worth considering on Linux as it will not require special binary format and achieve all the security requirements but might be tricky to outright impossible on other systems that are not too friendly to containers.

ScottFreeCode commented 1 year ago

A simple alternative to a Web Assembly host for sandboxing a script: Pledge – I know, it would probably never work on Windows… or would it? Maybe it could call the WSL? (And I have no idea about MacOS support.) But it's a cool approach: just tell the OS, "I'm gonna run this script but don't let it actually access anything but this package's files." Compare also Landlocked Make

daurnimator commented 1 year ago

An alternative I proposed a while ago was that we could require that parts of build.zig are executed at comptime: this is already effectively a sandbox, where arbitrary syscalls are not possible, yet you're still able to do e.g. @embedFile to read in files the user's config might ask for, or even @cImport to read a system header to check what version of an include header you may have.
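A minimal sketch of the comptime idea, assuming a made-up option format: the logic runs inside the compiler, where no syscalls are available, and only its result is visible to the rest of the program. `countEnabled` and the flag string are hypothetical, not part of any build API.

```zig
const std = @import("std");

/// Counts how many feature flags are switched on in a string such as "1011".
/// The parameter is marked `comptime`, so every call is evaluated by the
/// compiler rather than at runtime.
fn countEnabled(comptime flags: []const u8) usize {
    var n: usize = 0;
    for (flags) |c| {
        if (c == '1') n += 1;
    }
    return n;
}

/// Container-level initializers are evaluated at compile time; the
/// finished value is baked into the program.
pub const enabled_features: usize = countEnabled("1011");
```

In a real build.zig the input could come from `@embedFile`, as mentioned above, instead of a literal.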

ScottFreeCode commented 1 year ago

Comptime as much as possible + pledge the rest on OSes that have it, would be fantastically safe and lightweight!

presentfactory commented 1 year ago

True isolation is never going to be possible if you want Zig to be usable. Build scripts need to be able to invoke processes on the host machine, read/write from files etc. This is pretty much essential for say a game's asset pipeline which is useful to invoke via the build script (unless you're going to force people to compile a program and then just run that to do the same thing which is a totally useless addition of complexity of things).

Additionally isolation of the build script does little to mitigate actual malicious code in the library/executable itself. If one can just add code to a library and have it injected into what you are building then isolating the build script doesn't really accomplish that much.

The reality is that code (be it build script or the source for the binary) on a system is going to be potentially dangerous, the solution isn't to lock it down but instead to have people actually audit the code they are using in their projects. What I'd do is simply remove the package manager because package managers allow people to build/compile code on their system without auditing it which has led to many security issues time and time again. Languages like JS and Python have had this issue multiple times thanks to npm and pip, languages like C rarely do because C does not have a package manager and forces people to audit stuff more. The package manager is the problem here, do not make the language suffer because of its issues and instead find a better way to handle that.

In lieu of being able to audit things people should be using actual dedicated things for isolating build processes and the results of execution from the system with VM containers as such things will also protect from bugs in Zig's compiler being exploited to run code and etc. It's really not feasible for Zig to re-implement docker essentially so I do not think it's worth investing so much into that line of thought.

BratishkaErik commented 1 year ago

> In lieu of being able to audit things people should be using actual dedicated things for isolating build processes and the results of execution from the system with VM containers as such things will also protect from bugs in Zig's compiler being exploited to run code and etc. It's really not feasible for Zig to re-implement docker essentially so I do not think it's worth investing so much into that line of thought.

There are already programs like Debian's fakeroot and/or Gentoo's sandbox, but the Zig project's core team wants to rely on as few external dependencies as possible (recent examples are resinator and arocc), so...

ScottFreeCode commented 1 year ago

The other thing about saying "but you have to audit the code anyway so this isn't any more or less secure" is that a build system vulnerability is a vulnerability in a different place from an end code vulnerability.

Running untrusted code, say, in a CI pipeline as part of a pull process potentially before the code is reviewed, is different from running the built code on a customer's machine hopefully after the code has been reviewed. It's different not only because you might in fact have an automated system running the code before the audit, but even just because different things are vulnerable if untrusted code is running on a CI server (and whose, a cloud provider or your own) versus on a developer machine (when initially working on the code, even if only trying it out) versus on the end user's machine. No, you don't want any of them to be vulnerable and yes, you need to audit the code. But as part of defense in depth you'd want to mitigate audit oversights, and the mitigation is not equivalent in different places because "they could still be vulnerable over there" doesn't mean the same thing as "we'll be vulnerable over here."

(All of this is less obvious in some environments like Node/NPM where code may never actually be built and the equivalent could be install-time commands, which happen wherever the code is pulled down and run. Because in that case we're talking about something that potentially does run in the same place that the code itself does. In other words, that's different from a build system.)

ScottFreeCode commented 1 year ago

I'll also plug Pledge and Landlocked Make again (see my other comments upthread) as, apropos of the build vulnerability vs. run vulnerability issue, Pledge* can be used both at build time and at runtime, and can act like a sandbox without the overhead of a VM or container. If you audit the pledges you can reduce the need to audit the code itself, or if the pledges are controlled by someone trusted then they can sandbox untrusted code, etc.

I would love for languages and build systems and distros all to work towards a future where everything running on Linux should be able to provide a manifest of what it needs access to, the kernel will enforce that access, and the rest of us just need to pay attention to those manifests. We have the technology. We just have to integrate it.

*Along with unveil: the same developer ported both to Linux.

KilianHanich commented 8 months ago

Since build.zig creates a dependency tree and doesn't do the build itself, I am not sure this actually applies here, but I'm going to leave this here:

A lot of people don't just use the buildsystem to "build their project" in the traditional sense (as in "compile the code"), but also in the extended sense. Meaning everything which has to get from source to working product.

That can also include custom steps to e.g. flash a microcontroller with the executable (obviously done by calling external tools, but it can also be done by a tool which was itself compiled by the project).

silversquirl commented 8 months ago

@KilianHanich That usecase is totally fine and won't be affected by this proposal. You build the flashing tool, then run it with a run step as normal.