smol-rs / async-fs

Async filesystem primitives
Apache License 2.0
130 stars 8 forks source link

Document that dropping a async_fs::File with buffered writes may not immediately close file on drop #14

Open erickt opened 2 years ago

erickt commented 2 years ago

I just found an unexpected use case that would be worthwhile documenting here, and in blocking. As best as I can tell, if a user does:

let mut file = async_fs::File::open("some-file").await?;
file.write_all(b"some bytes").await?;
drop(file)

The underlying std::fs::File is not necessarily dropped when we drop the async_fs::File. This can happen because of how blocking implements AsyncWrite. Whenever it starts out in the Idle state, it creates an async pipe, and submits the (read_pipe, real_file) to blocking's background thread pool. Any subsequent writes are written into the write_pipe that's eventually consumed by the background task. If there's a large number of outstanding writes, then it might take a decent amount of time before the underlying std::fs::File is actually dropped.

The fix for this is to explicitly a function that ultimately calls Unblock::poll_stop, like async_fs::File::close, or async_fs::File::poll_flush.

We ended up running into this in Fuchsia when downloading a large number of files. While we thought we had sufficient controls around the number of concurrently written files by controlling when we dropped async_fs::File, but some users were hitting the "too many opened files" error because by default OSX's ulimit -n is only 256.

Would it be possible to update the docs for async_fs::File to mention that users should run flush/sync_data/sync_all/close/etc if they write data, otherwise the underlying file might not get closed until some indeterminate point in the future?

taiki-e commented 2 years ago

It makes sense to mention that flush is needed. BufWriter also has similar docs.

erickt commented 2 years ago

Yeah, I think if async_fs::File had language similar to that BufWriter that's a little more explicit about how async_fs::File is implemented. tokio::fs::File's docs also directly speak to this issue.

I suppose another way to approach this is we could try to get rid of the decoupling between file.write(buf).await completion and when the underlying write syscall is called. We could associate a futures::channel::oneshot with each write, which async_fs::File::write could await on, but that'd probably have some impact on our throughput.

Or maybe something clever could be done using the recently stabilized std::thread::scope to get rid of the pipe? That'd probably be a pretty big redesign though.

taiki-e commented 2 years ago

I have not investigated the details regarding alternatives, but at this point I think it is reasonable to improve the docs.

As for std scope threads, it is usually incompatible with asynchronous code because it blocks the current thread until all spawned threads have been completed.