rust-lang / rust


Suggestion: treat WASI preopen dirs as path prefixes #82339

Open RReverser opened 3 years ago

RReverser commented 3 years ago

Rust stdlib already has a concept of path prefixes (std::path::Prefix, std::path::PrefixComponent etc.), although they are applied only to Windows paths.

WASI paths are currently treated as starting with a root / (e.g. path.components() returns each level starting from Component::RootDir aka /), even though WASI recognises only paths starting with a specific prefix that matches one of the preopen directories specified at startup. The levels above those prefixes don't "exist", and most APIs under WASI don't permit crossing boundaries and resolving relative paths or symlinks from a path rooted under one preopen directory to another.
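As a small illustration of the current behaviour, the sketch below lists the components std produces for a Unix-style path such as /sandbox/temp. The helper name describe_components and the example path are made up for illustration; only Path::components and the Component variants come from std.

```rust
use std::path::{Component, Path};

/// Render each component of `path` as a short label, to show how std
/// splits a Unix-style (and therefore WASI-style) path today.
fn describe_components(path: &Path) -> Vec<String> {
    path.components()
        .map(|c| match c {
            // Only produced when parsing Windows paths.
            Component::Prefix(p) => format!("Prefix({:?})", p.as_os_str()),
            Component::RootDir => "RootDir".to_string(),
            Component::CurDir => "CurDir".to_string(),
            Component::ParentDir => "ParentDir".to_string(),
            Component::Normal(s) => format!("Normal({:?})", s),
        })
        .collect()
}
```

For /sandbox/temp this yields RootDir followed by two Normal components, so nothing in the result marks /sandbox as a preopen boundary.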

The question of exposing an API corresponding to the (currently private) open_parent has come up several times in the past, as it's important to split the preopen dir from the relative path whenever dealing with lower-level WASI syscalls.

Given the above, I feel like there is a lot of conceptual similarity between WASI preopen dirs and the std::path::Prefix, and the latter would be a natural API to use for exposing the former. We'd only need to add another variant to std::path::Prefix as well as check against the preopened dirs when splitting Path into components.

@alexcrichton @sunfishcode would love your thoughts on this idea.

sunfishcode commented 3 years ago

My initial thought is, the preopen path mechanism is a libc abstraction, so instead of exposing more details about how it works, we should ideally either extend the abstraction so that applications don't need to know about it, or give applications better ways to skip the abstraction and work in terms of WASI concepts directly.

If code is using preopens and finding things that don't work, can we identify features we could add to support such code? You mention crossing boundaries, resolving relative paths, and symlinks; could you say more about what things you're looking to do that don't work?

For application code that wants to take WASI into consideration, one possible approach is to pursue the WASI port (currently under development) of cap-std. That way, instead of thinking about paths and prefixes, which are still just libc abstractions, code can use Dirs, which reflect how the underlying wasi-filesystem APIs work.

Do either of those approaches sound feasible here?

RReverser commented 3 years ago

could you say more about what things you're looking to do that don't work?

The latest example of such a conversation was when I was adding std::os::wasi::fs::symlink_path. That particular case is addressed now, but @alexcrichton said:

My personal thoughts on this issue haven't really changed since last I wrote which is that I think the best option for improving the state of things is to expose a function which, given a path, returns a file descriptor and a relative path.

So I thought we might want to still expose some API that allows such split-up.
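To make the shape of such a split concrete, here is a minimal sketch. It is not the private open_parent from std: the Preopen struct, its fields, and the split_preopen function are hypothetical names, and the matching is purely lexical (longest preopen path that prefixes the input wins).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a preopened directory: the guest-visible
/// path plus the file descriptor handed over at startup.
struct Preopen {
    fd: u32,
    path: PathBuf,
}

/// Find the preopen whose path is the longest prefix of `path` and return
/// its descriptor together with the remainder relative to that directory.
/// Matching is lexical only: `..` and symlinks are not resolved here.
fn split_preopen<'a>(preopens: &[Preopen], path: &'a Path) -> Option<(u32, &'a Path)> {
    preopens
        .iter()
        // Keep only preopens that prefix `path`, paired with the remainder.
        .filter_map(|p| path.strip_prefix(&p.path).ok().map(|rel| (p, rel)))
        // Prefer the most specific (deepest) preopen.
        .max_by_key(|(p, _)| p.path.components().count())
        .map(|(p, rel)| (p.fd, rel))
}
```

With preopens at /sandbox and /sandbox/data, a lookup of /sandbox/data/a.txt would return the descriptor of the deeper preopen and the relative path a.txt, while a path outside every preopen returns None.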

More generally, I'm also worried about Path / PathBuf consumers which perform arbitrary manipulations via .pop() / .push() / .join() and might accidentally cross boundaries between preopened dirs. I guess the question is, do we want to allow that?

All the libc APIs in wasi-libc currently don't allow such crossings and you can't perform e.g. fopen("../otherdir/somefile.txt", "r") while the current working dir is in /somedir. I'm assuming this is intentional and part of the WASI security model, and, if it is, I think it's worthwhile to prohibit this in the userland path manipulation as well, at least to prevent accidental boundary crossings.

Treating preopen dir as a PrefixComponent rather than a RootDir + a series of nested Normal components would do just that.
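The accidental crossing I have in mind can be sketched with plain PathBuf operations. The function name hop_to_sibling is made up, and the scenario assumes /somedir and /sandbox are two separate preopens with nothing preopened at /.

```rust
use std::path::PathBuf;

/// Move from one directory to a sibling using only lexical `PathBuf`
/// operations. Nothing here consults the preopen table, so the
/// intermediate value is the parent of `start`.
fn hop_to_sibling(start: &str, sibling: &str) -> (PathBuf, PathBuf) {
    let mut path = PathBuf::from(start);
    path.pop(); // e.g. "/somedir" becomes "/"
    let intermediate = path.clone();
    path.push(sibling); // e.g. "/" becomes "/sandbox"
    (intermediate, path)
}
```

Starting from /somedir and hopping to sandbox passes through /, which under the assumption above is not covered by any preopen, and ends up in a different preopen without any error along the way.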

sunfishcode commented 3 years ago

I actually think the security model is ok with allowing applications to .pop() and .push(), or to use .., from one preopen to another. I expect we could implement it all in libc, and as such WASI itself won't be affected.

My main question is how this impacts the ABI, and to what extent it creates conventions that applications may come to depend on.

One of the things I'm interested in for WASI is blinding for host paths. That is, if I pass --dir=/home/sunfish/data to expose a directory to an application, in many cases the application shouldn't ever care about the path itself, except for the ability to resolve the strings that I pass it that start with /home/sunfish/data. In such cases, a wasm engine could theoretically present the preopen to the application as /blind/a74ac718, and rewrite command-line arguments and environment variables starting with /home/sunfish/data to start with /blind/a74ac718 instead, and in many cases everything would still work. This way I don't need to worry about whether the application is peeking at my username or the organization of my local filesystem!
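A rough sketch of the rewriting step described above, purely hypothetical since no engine is claimed to do this today. The function name blind_arg and its parameters are invented for illustration; the matching is component-wise via Path::strip_prefix, so a sibling such as /home/sunfish/database is left untouched.

```rust
use std::path::Path;

/// Rewrite `arg` if it names the exposed host directory or something
/// beneath it, replacing the host prefix with the opaque guest-visible one.
/// Arguments that do not start with `host_dir` are returned unchanged.
fn blind_arg(arg: &str, host_dir: &str, blind_dir: &str) -> String {
    match Path::new(arg).strip_prefix(host_dir) {
        // The argument is exactly the exposed directory.
        Ok(rest) if rest.as_os_str().is_empty() => blind_dir.to_string(),
        // The argument lies beneath the exposed directory.
        Ok(rest) => Path::new(blind_dir)
            .join(rest)
            .to_string_lossy()
            .into_owned(),
        // Unrelated argument: pass it through.
        Err(_) => arg.to_string(),
    }
}
```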

If applications come to depend on having preopen paths living at known locations relative to other preopen paths in the hierarchical namespace, it would complicate this kind of blinding, limiting our options to evolve in this direction in the future.

That said, host path blinding is a theoretical feature, and not necessarily a requirement. If pop() and .. out of preopen turns out to be important, maybe it's better to just support them. But are they important? What would be a situation where an application would want to peer upwards in the hierarchical namespace without knowing in advance what it might find there?

RReverser commented 3 years ago

Isn't the host blinding you describe essentially the same as --mapdir, which is already supported? (except for the magic of rewriting command-line args)

If pop() and .. out of preopen turns out to be important, maybe it's better to just support them.

I guess if it's not part of the security model, that would work too. I just want userland and system APIs to be consistent in path handling.

But I thought I saw your comments on symlinks (or was it just hardlinks?) where you said it wasn't desirable to allow crossings from one preopen to another. Is that no longer the case?

What would be a situation where an application would want to peer upwards in the hierarchical namespace without knowing in advance what it might find there?

Just to provide one example (since that's the one I'm most familiar with) - cases of "shells" like https://wasi.rreverser.com/.

By default, there's a mounted dir /sandbox. If the user also mounts a dir, say, at /somedir, then, counter-intuitively, things like this don't work in the same way:

/somedir$ ls /sandbox
temp
/somedir$ ls ../sandbox
ls: error: '../sandbox': No such file or directory
Exit code: 1

In principle, it's possible to solve this in the userland code of a particular app, but I'd prefer to be consistent with the rest of WASI: if WASI doesn't allow relative paths from one mount to another, then the app shouldn't either, and vice versa.

This applies not only to shells, but also to e.g. any manipulations that produce intermediate relative paths someRelativePath = relative(from, to), and then try to join them back together into an absolute one to2 = join(from, someRelativePath), and the result is no longer equivalent to the original and might be not accessible. This can result in confusion and obscure bugs.
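The round trip can be sketched as follows. std has no relative function, so the lexical relative helper here is a hypothetical implementation that assumes both inputs are absolute and free of . and .. components; round_trip is likewise an invented name.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically compute a relative path leading from directory `from` to `to`.
/// Both inputs are assumed absolute and free of `.` and `..`.
fn relative(from: &Path, to: &Path) -> PathBuf {
    let from: Vec<Component> = from.components().collect();
    let to: Vec<Component> = to.components().collect();
    // Length of the shared leading run of components.
    let common = from.iter().zip(&to).take_while(|(a, b)| a == b).count();
    let mut rel = PathBuf::new();
    // Climb out of whatever is left of `from`...
    for _ in common..from.len() {
        rel.push("..");
    }
    // ...then descend into the rest of `to`.
    for c in &to[common..] {
        rel.push(c.as_os_str());
    }
    rel
}

/// Join the relative path back onto `from`, as in `to2 = join(from, rel)`.
fn round_trip(from: &Path, to: &Path) -> PathBuf {
    from.join(relative(from, to))
}
```

Going from /somedir to /sandbox/temp gives ../sandbox/temp, and joining that back produces /somedir/../sandbox/temp. Path::join does not collapse the .. component, so the result still starts with /somedir and a prefix-based preopen lookup would resolve it against the wrong directory.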

sunfishcode commented 3 years ago

Yeah, --mapdir is a step toward blinding. It's not yet automatic though. And it's not yet clear if the ecosystem will evolve in a way that's compatible with automatic blinding.

But I thought I saw your comments on symlinks (or was it just hardlinks?) where you said it wasn't desirable to allow crossings from one preopen to another. Is that no longer the case?

That's still mostly the case. Symlinks are resolved by the "OS", rather than libc, so they don't go through the preopen mechanism. If we really wanted to support cross-handle symlinks, there are a few ways we could make it work. I think we're still in the phase of looking for the real-world use cases that would help shape this.

Just to provide one example (since that's the one I'm most familiar with) - cases of "shells" like https://wasi.rreverser.com/.

Shells are special :-). An assumption of applying a capability model to filesystems is that most applications and libraries will only need limited access to the hierarchical namespace, but shells tend to want to be able to roam freely throughout the namespace. Also, shell users may type .. not because of some fundamental relative positioning of two resources in the namespace, but often just because they know where their data is and that's the shortest thing to type to get to it. I expect that once interface types gives us better ways to manage handles, we'll be able to give shells better support. Beyond that, I imagine that shells aren't representative of what regular applications and libraries need.

This applies not only to shells, but also to e.g. any manipulations that produce intermediate relative paths someRelativePath = relative(from, to), and then try to join them back together into an absolute one to2 = join(from, someRelativePath), and the result is no longer equivalent to the original and might be not accessible. This can result in confusion and obscure bugs.

Some of this kind of thing works today, if we ignore .. paths and symlinks. If programs are manipulating pure paths, they're not tied to a particular preopen, so they work just like normal paths, and then the preopen decision happens when the path is actually used.

RReverser commented 3 years ago

if we ignore ..

Right, but we can't really ignore those :) A function that computes the relative path between from and to is almost always bound to produce some .. components.

The rest of what you're describing makes sense, but I hope that these kinds of use cases are not ignored completely just because they're special :)

The general question about the direction of path handling still stands, I think: do we want the Rust path manipulation API to match what libc does today, and treat a preopen dir as a "root" that can't be crossed via normal Path APIs, or do we want to change libc to support resolving paths across those preopen dirs?

Regardless of the specific usecase, the current mismatch in behaviour seems tricky to navigate.

RReverser commented 3 years ago

The general question about the direction of path handling still stands I think

Another reason I'm asking is that I want to implement Path::canonicalize on WASI because it frequently comes up during cross-compilation, and then fails at runtime - turns out, a lot of crates use it to translate relative paths to absolute ones.

I have a partial implementation (modulo symlinks) in a branch. For now I've restricted it to return an error on any attempt to cross boundaries, which should still be good enough for most use cases. If we do want to allow such "crossings", then the implementation can be quite a bit simpler.
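To illustrate what "conservative about crossings" could look like, here is a rough sketch. It is not the code from the branch: the name canonicalize_within, the choice of error kinds, and the decision to take a path already relative to its preopen are all assumptions, and symlinks are not consulted at all.

```rust
use std::ffi::OsStr;
use std::io;
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `.` and `..` in `rel`, interpreted relative to the
/// preopened directory `preopen`. A `..` that would climb above `preopen`
/// is reported as an error instead of being followed.
fn canonicalize_within(preopen: &Path, rel: &Path) -> io::Result<PathBuf> {
    let mut stack: Vec<&OsStr> = Vec::new();
    for component in rel.components() {
        match component {
            Component::CurDir => {}
            Component::Normal(name) => stack.push(name),
            Component::ParentDir => {
                // Popping an empty stack means leaving the preopen.
                if stack.pop().is_none() {
                    return Err(io::Error::new(
                        io::ErrorKind::PermissionDenied,
                        "path escapes its preopened directory",
                    ));
                }
            }
            Component::RootDir | Component::Prefix(_) => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidInput,
                    "expected a path relative to the preopened directory",
                ));
            }
        }
    }
    let mut out = preopen.to_path_buf();
    out.extend(stack);
    Ok(out)
}
```

Under this sketch, a/./b/../c inside /sandbox resolves to /sandbox/a/c, while ../sandbox inside /somedir is rejected rather than resolved into the neighbouring preopen.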

sunfishcode commented 3 years ago

These are good questions, and I myself don't have a complete vision for how this should all work :-}. Big-picture WASI wants to use handles instead of paths, so path support is all about finding the right balance between compatibility for porting code and preserving desirable properties of the capability system where possible.

A canonicalize implementation that is good enough for most use cases but conservative about "crossings" sounds like a good place to start. If it doesn't turn out to be enough, then hopefully we'll learn about the cases that need more.

I've been thinking more about Path prefixes. It's an interesting idea. I assume the determination of what's a "prefix" depends on the preopen set, so if for example we had a way to add new preopens dynamically, that would mean that Path parsing would depend on runtime changes, which feels like a departure from how Path works on other platforms.

RReverser commented 3 years ago

I assume the determination of what's a "prefix" depends on the preopen set, so if for example we had a way to add new preopens dynamically, that would mean that Path parsing would depend on runtime changes, which feels like a departure from how Path works on other platforms.

Yeah, totally - for now my suggestion above was based on the assumption that preopens will remain static, but if we add the ability to add them dynamically, such behaviour might indeed become more confusing.

RReverser commented 3 years ago

A canonicalize implementation that is good enough for most use cases but conservative about "crossings" sounds like a good place to start. If it doesn't turn out to be enough, then hopefully we'll learn about the cases that need more.

Sounds reasonable to me. Worst case we'll hear from people for whom it's not enough. Thanks.