jgoerzen opened 2 years ago
If you need durability - assuming we're talking about regular files here - then relying on close is the wrong thing, because data may only make it to the page cache and later encounter an error during writeback. You should rely on fsync instead:
let file = File::from(owned_fd);
file.sync_all()?; // fsync: forces writeback and reports any I/O error
Of course that's expensive since it forces the OS to do immediate writeback, so it should only be done when durability is critical.
I think it makes sense to have a close method for OwnedFd, OwnedHandle, etc., if only for diagnostic purposes. There's not a whole bunch you can do with the error if you get into that situation, other than logging or maybe restarting the whole operation over again. But maybe it can help diagnose a problem.
The API proposal is essentially drop, except that it doesn't swallow the error, right?
impl OwnedFd {
    fn close(self) -> io::Result<()>;
}
@the8472 close(2) on Linux even states:
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel uses the buffer cache to defer writes. Typically, filesystems do not flush buffers when a file is closed. If you need to be sure that the data is physically stored on the underlying disk, use fsync(2). (It will depend on the disk hardware at this point.)
But it also includes the comment I quoted above about the importance of still checking the return value of close().
@ChrisDenton A scenario in which an error here could be relevant is this:
Let's say you are processing items from a queue of some sort. If processing is successful, you remove the item from the queue. If not, it remains on the queue. An error from close would be a reason to leave the item on the queue, or to signal failure to the network client, etc.
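(To make that concrete, a self-contained sketch, with the queue as a plain Vec and sync_all() standing in for the error check, since the proposed close() doesn't exist yet; all names and paths here are illustrative:)

use std::fs::File;
use std::io::{self, Write};

// Illustrative: "processing" an item means writing its payload out.
fn process_item(payload: &[u8], path: &str) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(payload)?;
    file.sync_all() // a hypothetical checked close() would slot in here
}

fn main() {
    let mut queue: Vec<Vec<u8>> = vec![b"job-1".to_vec(), b"job-2".to_vec()];
    // Items are removed only after processing fully succeeded.
    queue.retain(|payload| match process_item(payload, "/tmp/out.dat") {
        Ok(()) => false,                                     // done: dequeue
        Err(e) => { eprintln!("keeping item on queue: {e}"); true }
    });
}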
I don't know the particulars of the NFS issue, but I can imagine them. I also imagine various FUSE filesystems (e.g., s3fs) do the bulk of their work only when close is called.
Another scenario I could imagine would involve pipes. If a prior write() put data into the pipe buffer, but the process on the other end of the pipe crashed before reading it, the only way to detect this would be via SIGPIPE (which is blocked by default in Rust; see #97889) or, I would imagine, by the return value of close(). (I have not personally verified this)
@ChrisDenton Yes, you are correct about my proposal. It should consume and cause the dropping of the OwnedFd because, as the close(2) manpage notes, you really shouldn't attempt to close it again, even on error. (With a side asterisk: on certain Unices, such as HP/UX, you SHOULD close it again after EINTR, but this is ambiguous in POSIX and is not the case on Linux at least; I'd argue that if somebody ports Rust to one of those OSs, EINTR handling should be internal to fn close there.)
I would also like to see it bubbled up to the higher-level interfaces (File, sockets, etc.). This shouldn't be difficult to use in cross-platform code, I'd hope. I don't know if Windows has the same close() semantics, but even if on Windows it is identical to drop, having an interface available would be helpful.
The one trick I'd see would be preventing a double-close (by Drop) when there has been an explicit call to fn close.
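(For illustration: one way such a close could avoid the double-close is to defuse Drop by taking ownership of the raw fd first. A minimal sketch, assuming the libc crate; close_checked is a hypothetical free function, not a std API:)

use std::io;
use std::os::fd::{IntoRawFd, OwnedFd};

// Hypothetical sketch: consume the OwnedFd so its Drop never runs,
// then close exactly once and report the OS error.
fn close_checked(fd: OwnedFd) -> io::Result<()> {
    let raw = fd.into_raw_fd(); // takes ownership; defuses Drop
    // SAFETY: we own `raw` and close it exactly once.
    if unsafe { libc::close(raw) } == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}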
But it also includes the comment I quoted above about the importance of still checking the return value of close().
My understanding is that fsync subsumes all that as long as the system defines _POSIX_SYNCHRONIZED_IO, which is the case on all major Unix systems. E.g. postgres heavily relies on fsync for ACID guarantees, not close.
Looking at the return value can only ever be a diagnostic thing. Unless you're opening a file with O_SYNC or O_DIRECT, close doesn't guarantee anything due to writeback.
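(A minimal sketch of the O_SYNC case, assuming the libc crate for the flag constant; the path is illustrative:)

use std::fs::OpenOptions;
use std::io;
use std::os::unix::fs::OpenOptionsExt;

fn open_sync() -> io::Result<std::fs::File> {
    // With O_SYNC, each write(2) returns only once the data has been
    // pushed to stable storage, so write errors surface immediately.
    OpenOptions::new()
        .write(true)
        .create(true)
        .custom_flags(libc::O_SYNC)
        .open("/tmp/journal.dat")
}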
I don't know the particulars of the NFS issue, but I can imagine them. I also imagine various FUSE filesystems (e.g., s3fs) do the bulk of their work only when close is called.
NFS supports fsync. Anything that ignores fsync is not a durable filesystem and you shouldn't entrust critical data to it.
Another scenario I could imagine would involve pipes. If a prior write() put data into the pipe buffer, but the process on the other end of the pipe crashed before reading it, the only way to detect this would be via SIGPIPE (which is blocked by default in Rust; see https://github.com/rust-lang/rust/issues/97889) or, I would imagine, by the return value of close(). (I have not personally verified this)
Close has no special treatment for pipes. Any data still stuck in a pipe is lost.
I also imagine various FUSE filesystems (e.g., s3fs) do the bulk of their work only when close is called.
FUSE for its part documents that filesystems should avoid doing this.
@the8472 and @sunfishcode I would like to understand -
1) Are you folks opposed to having an explicit close() that returns an error?
2) Or are you fine with that, but dislike my justification?
If 1, how do you reconcile that with the strong admonishment in close(2) that "failing to check the return value when closing a file may lead to silent loss of data"?
If 2, I'd like to hear more.
Over at https://stackoverflow.com/questions/24477740/what-are-the-reasons-to-check-for-error-on-close there is a conversation about this which I found interesting. Among the points are:
"Consider the inverse of your question: "Under what situations can we guarantee that close will succeed?" The answer is: when you call it correctly, and when you know that the file system the file is on does not return errors from close in this OS and Kernel version"
It is this second condition that has me particularly troubled. On my Debian Linux box, there are 58 in-kernel filesystem modules and dozens more implemented via FUSE. Some of them (e.g., ext4) are specifically designed for POSIX semantics. Some of them (e.g., cifs, exfat) were not. I use sshfs and NTFS via FUSE on a regular basis, and who knows about them; I am certain they are not fully POSIX compliant. Even NFS has never been fully compliant with POSIX semantics, only getting somewhat better in recent years.
I'm not arguing for everything to be perfect here; in a strict sense, if one wants perfect ACID, one would not just fsync, but also check the return values of fsync and close, and fsync the parent directory after a rename as well. Except on macOS, where fsync doesn't actually force data to durable storage, unlike Linux, where fsync explicitly does.
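(For concreteness, a minimal sketch of that full sequence on Linux; atomic_write and the paths are illustrative, and the macOS caveat (fcntl with F_FULLFSYNC) is not handled:)

use std::fs::{self, File};
use std::io::{self, Write};

// Write-to-temp, fsync, rename, fsync-the-directory: the sequence
// described above for a durable atomic replace on POSIX systems.
fn atomic_write(dir: &str, name: &str, data: &[u8]) -> io::Result<()> {
    let tmp = format!("{dir}/.{name}.tmp");
    let dst = format!("{dir}/{name}");

    let mut f = File::create(&tmp)?;
    f.write_all(data)?;
    f.sync_all()?;                // durability of the file contents
    drop(f);

    fs::rename(&tmp, &dst)?;      // atomic replace

    File::open(dir)?.sync_all()?; // make the rename itself durable
    Ok(())
}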
I just want the tools to be a careful programmer wherever possible, that's all.
This is probably a separate issue, but an incidental remark: I don't see any way to fsync a directory in std, either. File::open() won't open a directory (perhaps via O_DIRECTORY in custom_flags, but there is nothing in std that exposes O_DIRECTORY), and read_dir doesn't expose the fd for syncing either.
@the8472 and @sunfishcode I would like to understand -
1. Are you folks opposed to having an explicit close() that returns an error?
I myself am.
If 1, how do you reconcile that with the strong admonishment in close(2) that "failing to check the return value when closing a file may lead to silent loss of data"?
Ignoring errors from close is so pervasive that filesystem developers that care about robustness are practically expected to make sure close never fails.
Rust has ignored errors from close since 1.0, which is a precedent across the ecosystem. If we change that, it will create a new expectation for almost all Rust libraries and utilities that write files. It's tempting to say "you don't need to check if your use case doesn't need it", but libraries and utilities usually don't pick the filesystems they run on, or how data is used outside their scope.
If we approach this issue as "Do we care about this kind of data loss?", it's difficult to justify saying no. But we can instead ask "Who is responsible for this kind of data loss?", and there are multiple options:
- Declare that Rust libraries and utilities are now expected to check errors from close.
- Declare that end users using NFS are responsible for actively monitoring how much free space they have, and that FUSE filesystem authors are responsible for following the FUSE documentation.
Sometimes fixing existing code to correctly handle errors from close is easy, but sometimes it's costly. Either way, the new code paths will be rarely exercised and difficult to test. There is an ecosystem-wide cost for adding this expectation.
To my knowledge, the main filesystems where close does fail in practice are networked filesystems which postpone reporting errors in order to go faster (NFS calls this "async"), and FUSE filesystems which ignore the documentation. End users using NFS and caring about the data being written to it should probably already be actively monitoring for free space and other conditions, because of the amount of existing code out there that doesn't check for these errors. FUSE authors that care about protecting their users' data should already be following the documentation for flush.
If NFS were as common as it once was, things might look different, but in practice, POSIX filesystem APIs, with all the guarantees and common assumptions, aren't a great fit for networks. Instead of exposing the async nature of networks to applications via mechanisms that could map to Rust's async, NFS' "async" mode tries to hide everything behind the illusion of a synchronous API. Filesystem APIs can be convenient, but they have significant costs.
Neither option is without downsides. I don't have data, but I assume the costs of passing on the responsibility for errors from close to individual developers throughout the Rust ecosystem would be greater than the cost of stating that this is not the ecosystem's responsibility, and passing the responsibility onto end users that choose to use NFS, and FUSE filesystem authors, who arguably already have this responsibility.
As a separate consideration, stdout can be redirected to a file, and all the same data-loss scenarios are possible when writing to a file via stdout. If we add an expectation to Rust that code check for errors from close after writing to files, we should add a way to close stdout and check for errors from that too. But that's tricky, because Stdout::lock returns a StdoutLock<'static>, which effectively guarantees that stdout is always open.
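(Today, about the best a program can do is flush stdout explicitly and check that result; a minimal sketch:)

use std::io::{self, Write};

fn main() -> io::Result<()> {
    let mut out = io::stdout().lock();
    writeln!(out, "hello")?;
    // flush() surfaces buffered-write errors, but nothing in std can
    // surface an error from the eventual close() of fd 1.
    out.flush()
}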
I just want the tools to be a careful programmer wherever possible, that's all.
This is why it's useful to make these decisions at a broad level, like Rust (or ideally higher, but one step at a time), so that we can collectively agree on what should be expected of individual careful programmers.
I also aspire to be a careful programmer, and I once added logic to LLVM to close stdout and check for errors, because LLVM is often used to write to stdout with redirection to files. I too read the close man page and believed it was my responsibility. That code lived in the LLVM codebase for years, but only with me defending it against a long series of people hitting problems with it and wanting to remove it. I eventually stopped defending it, and it has since been removed. It was just too much trouble to be worth the cost.
Are you folks opposed to having an explicit close() that returns an error? Or are you fine with that, but dislike my justification?
I'm ambivalent about adding the API. Generally it does not make sense to check for errors on close. But I can see some niche uses: when hunting for errors, or when you have a policy of failing as loudly and as explosively as possible, it might make sense to print an error and then abort when a close fails (what else are you going to do? there is likely no sane path to recovering from the error).
But I do dislike the justification. Checking the result of close is not what you do when you care about your data making it to storage intact. As I said earlier, fsync is the go-to mechanism here. To me it would be purely a diagnostic thing that something has gone horribly wrong.
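(A minimal sketch of that fail-loudly policy, assuming the proposed, hypothetical close() on OwnedFd existed:)

// Hypothetical: `fd.close()` is the proposed API, not current std.
if let Err(e) = fd.close() {
    eprintln!("close failed: {e}");
    std::process::abort(); // fail as loudly as possible
}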
File::open() won't open a directory
But it does? (playground)
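(Presumably something along these lines, which does compile and run on Linux:)

use std::fs::File;

fn main() -> std::io::Result<()> {
    let dir = File::open("/tmp")?; // opening a directory works on Linux
    dir.sync_all()?;               // and it can be fsync'd
    Ok(())
}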
Some of them (e.g., ext4) are specifically designed for POSIX semantics. Some of them (e.g., cifs, exfat) were not. I use sshfs and NTFS via FUSE on a regular basis, and who knows about them; I am certain they are not fully POSIX compliant. Even NFS has never been fully compliant with POSIX semantics, only getting somewhat better in recent years.
Sure but would you run a production database on a dodgy USB stick formatted with exfat?
Thank you for these thoughtful comments.
I definitely do not want to create a precedent to require libraries to check close(). You are right that this would be unworkable. Already I wouldn't necessarily trust libraries with writing (are they doing the POSIX-safe way to do atomic writes, with fsyncs and renames? Almost uniformly not, but then you may not always want that).
I want the option to be able to check it myself in a safe way, that's all.
There are a lot of machines out there that are running on filesystems that I wouldn't count on to apply these guarantees, or that have other errors that may be signaled at close. FreeBSD can return ECONNRESET, for instance. An entire class of devices (Raspberry Pis) typically runs on dodgy micro SDs. A particular bane of my existence for years has been VMs (and even hardware) that fail to propagate fsync. I /frequently/ work with software that has to deal with USB sticks formatted with exfat or NTFS (or, heck, vfat). While NFS no doubt isn't as prolific as it once was, it is still with us, and now we have a bunch of others in that range: afs, coda, glusterfs, ocfs2, ceph, cifs, davfs, sshfs, s3fs, rclone -- to name a few. We have a /proliferation/ of filesystems or filesystem-like things, actually. It is an explicit design goal of the program I was working on when I created this issue to be maximally compatible with filesystems that differ from POSIX semantics in significant ways; for instance, s3fs, in which a file created may not be able to be opened for reading right away due to eventual consistency.
I care about data integrity. I am realistic that checking the return value from close is not itself sufficient to guarantee it. I don't believe there exists a perfect guarantee, in fact, because hardware isn't perfect. But I want to take every step I reasonably can, and which documentation has shown I ought to, along that path -- in part because I realize I am dealing with filesystems with a lot more failure modes than ext4 on a local SSD.
I wouldn't try to run PostgreSQL atop S3. But for Filespooler, written in Rust, this is an explicit design goal and documented practice. Although written in Go, a similar tool, NNCP, is something I use for getting data to airgapped backup & archival systems.
There seem to be some separate scenarios here. For sockets: if you do a shutdown on your end and then wait for the other side to terminate the connection before you call close, then there shouldn't be anything left on the wire that could possibly cause close to fail.
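(A sketch of that shutdown-then-drain pattern for a TcpStream; graceful_close is an illustrative name:)

use std::io::Read;
use std::net::{Shutdown, TcpStream};

fn graceful_close(mut stream: TcpStream) -> std::io::Result<()> {
    stream.shutdown(Shutdown::Write)?; // send FIN; no more writes from us
    let mut sink = Vec::new();
    stream.read_to_end(&mut sink)?;    // block until the peer closes
    Ok(())                             // dropping the stream closes the fd
}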
I definitely do not want to create a precedent to require libraries to check close(). You are right that this would be unworkable.
Ok. Then imo it should be documented in a way that redirects users to sync_all if they want durability and want to detect I/O errors, and that basically discourages relying on close's behavior, since it is non-portable and FS-dependent.
Prior discussions, sorted from oldest to newest:
Rust has ignored errors from close since 1.0, which is a precedent across the ecosystem. If we change that, it will create a new expectation for almost all Rust libraries and utilities that write files. It's tempting to say "you don't need to check if your use case doesn't need it", but libraries and utilities usually don't pick the filesystems they run on, or how data is used outside their scope.
Apparently we actually were debug-asserting on this since 1.0 and the only reason it's not come up is due to the precompiled stdlib thing.
The new issue is about closing ReadDir, not File or OwnedFd; they're different impls.
But as noted there, we probably should assert on EBADF instead of ignoring errors, since it indicates a likely io-safety violation.
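(A minimal sketch of that policy, assuming the libc crate; close_ignoring_errors is an illustrative name:)

use std::io;
use std::os::fd::RawFd;

/// Ignore ordinary close errors, but debug-assert on EBADF, which
/// indicates an io-safety violation (something else closed this fd).
unsafe fn close_ignoring_errors(fd: RawFd) {
    if libc::close(fd) != 0 {
        let err = io::Error::last_os_error();
        debug_assert!(
            err.raw_os_error() != Some(libc::EBADF),
            "close() returned EBADF: likely a double-close elsewhere"
        );
    }
}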
While working on #98209, I went searching for where close() is called. I found it in std/src/os/fd/owned.rs:
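(The code in question is approximately this; the comment summarizing why the error is ignored paraphrases the one in the standard library source:)

impl Drop for OwnedFd {
    #[inline]
    fn drop(&mut self) {
        unsafe {
            // Errors are ignored when closing a file descriptor: if close
            // fails we don't know whether the fd was actually released, and
            // retrying (e.g. on EINTR) could close an unrelated fd that was
            // opened in the meantime.
            let _ = libc::close(self.fd);
        }
    }
}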
The Linux close(2) manpage states:
A careful programmer will check the return value of close(), since it is quite possible that errors on a previous write(2) operation are reported only on the final close() that releases the open file description. Failing to check the return value when closing a file may lead to silent loss of data. This can especially be observed with NFS and with disk quota.
Anyhow, checking the return value of close(2) is necessary in a number of cases. But here we have a conundrum, because what can we possibly do with a failure of close(2) inside a Drop impl? Ignore it, log it, panic? These all have their pros and cons. But aren't we looking for something more like this?
fn close(self) -> io::Result<()>
In fact, a Close trait could define this function, and it could be implemented on files, pipes, sockets, etc.