smol-rs / async-fs

Async filesystem primitives
Apache License 2.0
131 stars 8 forks source link

File::from() with pipes not working? #4

Closed nazar-pc closed 4 years ago

nazar-pc commented 4 years ago

I have faced an issue when doing async_fs::File::from(File::from_raw_fd(fd)), where fd is a pipe.

The issue is that in non-blocking mode I'm getting errors about broken pipe in child worker process that is trying to write to the pipe like uv_write(): broken pipe: https://users.rust-lang.org/t/piping-file-descriptors-beyond-std-io-out-err-to-child-process/48089

There doesn't seem to be such issue when using blocking mode with child threads.

Is this related to async-io's note that is?:

NOTE: Do not use this type with File, Stdin, Stdout, or Stderr because all of the supported operating systems have issues with them when put in non-blocking mode.

I've prepared reduced example that can be run with:

git clone https://github.com/nazar-pc/mediasoup.git
cd mediasoup
git checkout rust-reduced
npm i
cd rust
RUST_LOG=mediasoup=debug cargo run

The issue happens when I'm doing reads on async_fs::File. Same reads with blocking code works fine, but requires a separate thread.

ghost commented 4 years ago

Thanks for the bug report! I was able to reproduce, identify the issue, and fix it.

async-fs does not use async-io. Instead, it uses the blocking crate to simply offload blocking I/O operation onto a threadpool and hide that fact behind an async API. This is because operating systems have poor support for truly async file I/O. IOCP and AIO are kind of broken (they sometimes block), and io_uring is still very much new.

The understand the problem, note that blocking reads the file on the threadpool in advance, as much as it can, storing that data into an internal buffer. That means if you do a read operation followed by a seek operation, the seek might be off because the blocking crate decided to read more data from the fd than was actually consumed from the perspective of async_fs::File. This is why async_fs::File keeps track of how much data it has read, so if a read operation is followed by a write or a seek operation, it will first correct the actual file cursor.

The catch is in that not all files are seekable. Here, you're using pipes that are not seekable so whenever async_fs::File wanted to "remember" the file cursor position before reading data from it, it would blow up with an error that says "Illegal seek". Then, that would case the thread to exit with that same error, thus dropping the async_fs::File, which is why "broken pipe" ends up being reported.

The solution is to ignore seek errors and accept the fact that some files simply aren't seekable, so in that case we should not try doing anything to correct the file cursor: #5. I have published a new version so if you do cargo update, everything should work fine.

Finally, one more thing. :) The async_io::Async type actually does work with some kinds of Unix "files" (because everything is a file on Unix systems). It doesn't work with actual files on the filesystem, but it does work with pipes because pipes are fully supported by epoll. In fact, I tried replacing async_fs::File in mediasoup with async_io::Async<StdFile> and it worked fine! Maybe I should clarify that in the docs...