wasmerio / wasmer

🚀 The leading Wasm Runtime supporting WASIX and WASI
https://wasmer.io
MIT License
18.99k stars 813 forks source link

Listing a directory mounted in memfs shows the original path #3685

Open Michael-F-Bryan opened 1 year ago

Michael-F-Bryan commented 1 year ago

Describe the bug

As part of #3650, I noticed that when you mount a folder from one filesystem onto a folder in the wasmer_vfs::mem_fs::FileSystem, running read_dir() will give you paths from the original file system.

Steps to reproduce

This test should pass:

#[test]
fn incorrect_file_names() {
    let temp = TempDir::new().unwrap();
    std::fs::write(temp.path().join("file.txt"), b"Hello, World!").unwrap();
    let host: Arc<dyn FileSystem + Send + Sync> = Arc::new(host_fs::FileSystem);

    let fs = wasmer_vfs::mem_fs::FileSystem::default();
    fs.create_dir("/nested".as_ref()).unwrap();
    fs.mount("/nested/folder/".into(), &host, temp.path().to_path_buf())
        .unwrap();

    let contents: Vec<_> = fs
        .read_dir("/nested/folder/".as_ref())
        .unwrap()
        .filter_map(|result| result.ok())
        .map(|entry| entry.path)
        .collect();
    assert_eq!(contents, vec![PathBuf::from("/nested/folder/file.txt")]);
}

Actual behavior

It fails with the following:

thread 'runners::wasi::tests::incorrect_file_names' panicked at 'assertion failed: `(left == right)`
  left: `["/tmp/.tmpzrN5Ld/file.txt"]`,
 right: `["/nested/folder/file.txt"]`', lib/wasi/src/runners/wasi.rs:335:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Michael-F-Bryan commented 1 year ago

This bug gets especially fun because it lets you produce a filesystem that can't be traversed.

fn walk(fs: &dyn FileSystem, path: &Path, depth: usize) {
    for result in fs.read_dir(path).unwrap() {
        let entry = result.unwrap();

        let file_type = entry.file_type().unwrap();
        println!("{}{}", "  ".repeat(depth), entry.path.display());

        if file_type.is_dir() {
            walk(fs, &entry.path, depth + 1);
        }
    }
}

#[test]
fn unwalkable_filesystem() {
    let host: Arc<dyn FileSystem + Send + Sync> = Arc::new(host_fs::FileSystem);
    let fs = crate::mem_fs::FileSystem::default();
    fs.create_dir("/nested".as_ref()).unwrap();
    fs.mount("/nested/host/".into(), &host, PathBuf::from("."))
        .unwrap();

    walk(&fs, Path::new("/"), 0);
}

This code will panic with the following:

---- tests::unwalkable_filesystem stdout ----
/nested
 /nested/host
  ./Cargo.toml
  ./src
thread 'tests::unwalkable_filesystem' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidInput', lib/vfs/src/lib.rs:664:41
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
syrusakbary commented 1 year ago

I think this issues are arising because we are using mount inside of a Filesystem, instead of moving mounting outside.

I think with UnionFS and so on we move the logic outside so it's easier to separate the local paths mounted, from the real ones underneath

syrusakbary commented 1 year ago

How I would solve it?

Let's say you have a UnionFS({ "/a": HostFS("/the/real/path") }). When you read over /a, you will be reading over "/" inside of the Host fs, which maps to /the/real/path (same if you do /a/b -> It will be /b which maps to /the/real/path/b).

I feel that will solve most of the issues?

Basically, we shouldn't have:

    let fs = wasmer_vfs::mem_fs::FileSystem::default();
    fs.create_dir("/nested".as_ref()).unwrap();
    fs.mount("/nested/folder/".into(), &host, temp.path().to_path_buf())
        .unwrap();

Basically, the building blocks should be different:

host = HostFs("/the/local/path")
host.read_dir_all("/") // This read the dir contents of "/the/local/path", (the DirEntry::path(), should have removed the trailing "/the/local/path")
host.read_dir_all("/x/y") // This read the dir contents of "/the/local/path/x/y" but with paths such as `/x/y/...` (without the /the/local/path/)
union = UnionFs::create("/nested/folder/" => host)
union.read_dir_all("/nested/folder") // This is equivalent to host.read_dir_all("/"), but DirEntry::path() will be modified by the union, prepending "/nested/folder") to it
union.read_dir_all("/nested/folder/x/y") // This is equivalent to host.read_dir_all("/x/y"), but DirEntry::path() will be modified by the union, prepending "/nested/folder") to it

Note:

union = UnionFs::create("/nested/folder/" => host) // should be equivalent to (doing the following under the hood):
union = UnionFs::create("/nested" => UnionFs::create("/folder", host)) // This make things waaaay simpler since it prevents collisions automatically

And then, all composed:

OverlayFs(
    MemFs::new(),
    union
)

That way, it becomes very clear what holds what. And filesystems can keep being composed, but they are not aware of where they are mounted in the parent... and that makes things way cleaner and easier to reason about

syrusakbary commented 6 months ago

This was partially/fully resolved by the #4298 .