rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking Issue for io_error_more #86442

Open ijackson opened 3 years ago

ijackson commented 3 years ago

Feature gate: #![feature(io_error_more)]

This is a tracking issue for many io::ErrorKind variants being added in #79965.

The additions were motivated by a perusal of Unix standards, but corresponding mappings for Windows were included in that PR.

Public API

std::io::ErrorKind::Foo for many Foo. I will paste a list here when the MR is merged :-).

Steps / History

Unresolved Questions

davidv1992 commented 2 years ago

I would like to propose that we stabilize these. Based on the feedback above, these are very useful to have in stable, and there seem to be no real risks. Concretely, this would mean marking the following as stable:

The issues around io::ErrorKind::Other and the unexpected failures they caused have been resolved with the introduction of io::ErrorKind::Uncategorized. This also eliminates the potential breakage from exhaustive matching. As such, stabilization itself should not be a breaking change anymore.
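To make the matching point concrete: stable code has never been able to match `io::ErrorKind` exhaustively, so a wildcard arm is always required, and that arm (rather than `ErrorKind::Other`) is the place that absorbs OS errors std does not categorise. A minimal sketch; `describe_kind` is a hypothetical helper, not part of std:

```rust
use std::io;

/// Map an error kind to a short description (hypothetical helper).
/// The `_` arm is mandatory: ErrorKind is non-exhaustive, and the arm also
/// absorbs kinds that std adds or starts returning in later releases.
fn describe_kind(kind: io::ErrorKind) -> &'static str {
    match kind {
        io::ErrorKind::NotFound => "not found",
        io::ErrorKind::PermissionDenied => "permission denied",
        io::ErrorKind::AlreadyExists => "already exists",
        // Since the Uncategorized change, std no longer uses `Other` for OS
        // errors it cannot categorise, so this arm is not a catch-all.
        io::ErrorKind::Other => "other (user-constructed)",
        _ => "unrecognised or newer kind",
    }
}
```

Any kind the function does not list, including ones that only become nameable later, lands in the wildcard arm instead of breaking compilation.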

As to the concern raised by @m-ou-se, I have gone over all of these errors, and almost all of them seem to have compatible OS error conditions on both Windows and any POSIX-compliant system. As such, platform-specific issues should only occur on unusual platforms outside of either the current tier 1 or tier 2 lists. I think we should be OK treating any cases where these errors are not generated currently as bugs. A further search of the current open issue list suggests no such issues are currently known.

There are two exceptions: First, io::ErrorKind::FilesystemQuotaExceeded and io::ErrorKind::FileTooLarge do not seem to correspond to POSIX-standardised errors. I think for these in particular it is acceptable if they fall back to io::ErrorKind::StorageFull, which seems to be what the POSIX standard suggests would happen under those conditions. Second, io::ErrorKind::ExecutableFileBusy and io::ErrorKind::Deadlock seem to be best effort and have limited system support, but this seems to me well enough documented.

tsuyoshi2 commented 2 years ago

io::ErrorKind::FileTooLarge corresponds to EFBIG, which is in POSIX, see here. This error is distinct from io::ErrorKind::StorageFull or ENOSPC.
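One way to see that these conditions are distinct is to tell them apart by raw error code, which works on stable regardless of which `ErrorKind` variants are nameable. This sketch assumes Linux errno numbering (EFBIG = 27, ENOSPC = 28, EDQUOT = 122 on x86-64 and aarch64); `SpaceProblem` and `space_problem` are illustrative names only:

```rust
use std::io;

/// Which "out of space"-style condition an error represents (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum SpaceProblem {
    FileTooLarge,  // EFBIG: the file would exceed a size limit; the disk may still have room
    StorageFull,   // ENOSPC: no space left on the device
    QuotaExceeded, // EDQUOT: the user's quota is exhausted
}

// Assumed Linux (x86-64/aarch64) errno values; other systems may differ.
const EFBIG: i32 = 27;
const ENOSPC: i32 = 28;
const EDQUOT: i32 = 122;

/// Classify by raw OS code; returns None for non-OS errors or other codes.
fn space_problem(err: &io::Error) -> Option<SpaceProblem> {
    match err.raw_os_error()? {
        EFBIG => Some(SpaceProblem::FileTooLarge),
        ENOSPC => Some(SpaceProblem::StorageFull),
        EDQUOT => Some(SpaceProblem::QuotaExceeded),
        _ => None,
    }
}
```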

ClementNerma commented 2 years ago

As this problem is still around, I wonder if it would be a good idea to think about how to handle such changes.

Concretely, this is NOT a breaking change but a functional change, which is just as important but not covered in the same way by Rust's guarantees.

The problems I see here are that:

  1. It isn't really clear what guarantees Rust offers for breaking functional changes such as this one (it was documented in the 1.55 release notes), as they can still break stable code if you don't pay attention to the changelog. And we aren't talking about using an unsafe or nightly feature, but one that has been available for a long time.
  2. This also means that potentially any item of std and core could be affected by this problem. This makes writing reliable and future-proof code a lot harder and requires reading every single changelog when upgrading to a newer Rust compiler.
  3. Even after this breaking change, there doesn't seem to be a reliable and future-proof way to correctly handle these IO checks, which is worrisome, especially given that more IO variants could be released in the future.

I don't know what we could do about these, but I think at least clarifying Rust's guarantees for functional changes could help (maybe it's already specified somewhere but I didn't find it), as could officially committing to ALWAYS highlight such breaking changes in Rust's release notes (as was done in the 1.55 blog post).

rbtcollins commented 2 years ago

Sorry, I don't follow how this wasn't a breaking change. It was acknowledged as such much earlier on in the discussion. Code that did compile stopped compiling.

ClementNerma commented 2 years ago

Sorry, I don't follow how this wasn't a breaking change. It was acknowledged as such much earlier on in the discussion. Code that did compile stopped compiling.

Sorry it wasn't very clear in my message, I mean that it didn't change nor remove any entity (variable, function, trait, module) or their signature.

This is a functional change in the sense that the ErrorKind variants returned by the IO functions aren't the same as before, no variant was removed or renamed, it's just that these functions don't return the same values anymore.

joshtriplett commented 2 years ago

I think, at this point, we would provide the best experience by stabilizing these, so that we don't return errors people can't match.

@rfcbot merge

rfcbot commented 2 years ago

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

ijackson commented 2 years ago

I think, at this point, we would provide the best experience by stabilizing these

I very much agree that we should stabilise these.

Having said that, I can understand why it has taken a while.

The conversations in this area have been very long and detailed. The change from Other to Uncategorized, which has already been done and is not part of what we're proposing to stabilise, has (perhaps inevitably) become entangled in the conversations (especially since it was prompted by the proposal to add new ErrorKinds).

Also, this is an area that many people interact with on a daily basis but which also has quite surprising depths in a number of different directions, so people tend to come with questions. I think we have answered all the questions satisfactorily, but it is not always easy to find the answers in these long threads.

If necessary (e.g. if we don't have consensus for the FCP here), I could write up a stabilisation report or something.

joshtriplett commented 2 years ago

I'm expecting Uncategorized to remain perma-unstable, as a way of avoiding the problem that occurred with Other.

And yes, I think there have been a lot of questions, but I do think they've been answered. Anyone is free to call out questions that haven't been, though.

cuviper commented 2 years ago

Shouldn't this be a libs-api FCP?

ijackson commented 2 years ago

Shouldn't this be a libs-api FCP?

Probably, yes, as I understand it.

Anyone is free to call out questions that haven't been, though.

Absolutely.

ijackson commented 2 years ago

I'm expecting Uncategorized to remain perma-unstable, as a way of avoiding the problem that occurred with Other.

Yes.

yaahc commented 2 years ago

Going to cancel the FCP in progress because it was started under the wrong team. I haven't refreshed myself on the current state of the thread yet, so I don't want to start the FCP myself, since that would permanently check my box. @joshtriplett, please restart the FCP under libs-api whenever you have a chance.

@rfcbot cancel

rfcbot commented 2 years ago

@yaahc proposal cancelled.

joshtriplett commented 2 years ago

@rfcbot merge

rfcbot commented 2 years ago

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rbtcollins commented 2 years ago

Sorry, I don't follow how this wasn't a breaking change. It was acknowledged as such much earlier on in the discussion. Code that did compile stopped compiling.

Sorry it wasn't very clear in my message, I mean that it didn't change nor remove any entity (variable, function, trait, module) or their signature.

The breaking change was the change in possible match values of ErrorKind from exhaustive to non-exhaustive. That's part of the signature, is it not?

This is a functional change in the sense that the ErrorKind variants returned by the IO functions aren't the same as before, no variant was removed or renamed, it's just that these functions don't return the same values anymore.

It is true that Other wasn't removed, but even with no change to return values, this would be breaking, because non-exhaustive is an attribute of the enum type.

I'm pulling on this thread because I think it's important that this be classified as a breaking change, not as a 'functionally breaking change'.

A change in what variant is returned under some circumstances is a change that is only detected at runtime. So yes, that would be a functional breaking change.

This was detected at compile time. => a breaking change.

dtolnay commented 2 years ago

@rbtcollins it sounds like you've said a couple times you think std::io::ErrorKind got changed from being exhaustively matchable to not being exhaustively matchable. That is not the case. ErrorKind has never been exhaustively matchable in stable code in any existing release of Rust.

rbtcollins commented 2 years ago

@dtolnay ahha. I had misremembered how the change happened. Thank you for helping me get clear on it. Sorry @ClementNerma for the unneeded contention.

ijackson commented 2 years ago

I would like to clarify once again that the Other->Uncategorized change, which did in practice cause things to go wrong for a significant number of people (whatever kind of breakage we decide it was), has already happened and is not part of this FCP.

What we are proposing to stabilise now is just the ability to name the additional ErrorKinds (IsADirectory etc.), which are already being generated but are currently unmatchable on stable. So if this FCP passes, it will not make any breakage or trouble worse in any way. Indeed, the ability to match these variants is likely to be practically useful for (belatedly) properly fixing some of the downstream test suites that suffered regressions. I appreciate that the introduction of Uncategorized was confusing, controversial and awkward, but it has nothing to do with this FCP (nor indeed with this MR; it was done in a different MR).

I think the primary questions that ought to be asked apropos this FCP are:

I think the answers to these questions are "yes". We spent the bulk of the discussions in this MR about the details of individual errors, with a smattering of more general discussions about principles. The MR had a lot of attention from Unix and Windows experts and there has been a year for folks from other systems to chime in.

But, of course, we might have missed something, which is what the FCP is for.

ijackson commented 2 years ago

What we are proposing to stabilise now is precisely the ErrorKinds listed in this table.

I have included the Unix and Windows errno values that map to these kinds. We are stabilising these mappings, but if we have missed some errors that ought to be categorised into these kinds and are currently Uncategorized, I think we could add those mappings later and it would be at worst a minor functional breakage.

| ErrorKind | Unix | Windows |
|---|---|---|
| ArgumentListTooLong | E2BIG | |
| CrossesDevices | EXDEV | ERROR_NOT_SAME_DEVICE |
| Deadlock | EDEADLK | ERROR_POSSIBLE_DEADLOCK |
| DirectoryNotEmpty | ENOTEMPTY | ERROR_DIR_NOT_EMPTY |
| ExecutableFileBusy | ETXTBSY | |
| FileTooLarge | EFBIG | ERROR_FILE_TOO_LARGE |
| FilesystemLoop | ELOOP | |
| FilesystemQuotaExceeded | EDQUOT | ERROR_DISK_QUOTA_EXCEEDED |
| HostUnreachable | EHOSTUNREACH | ERROR_HOST_UNREACHABLE, WSAEHOSTUNREACH |
| InvalidFilename | ENAMETOOLONG | ERROR_FILENAME_EXCED_RANGE, ERROR_INVALID_NAME |
| IsADirectory | EISDIR | ERROR_DIRECTORY_NOT_SUPPORTED |
| NetworkDown | ENETDOWN | WSAENETDOWN |
| NetworkUnreachable | ENETUNREACH | ERROR_NETWORK_UNREACHABLE, WSAENETUNREACH |
| NotADirectory | ENOTDIR | ERROR_DIRECTORY |
| NotSeekable | ESPIPE | ERROR_SEEK_ON_DEVICE |
| ReadOnlyFilesystem | EROFS | ERROR_WRITE_PROTECT |
| ResourceBusy | EBUSY | ERROR_BUSY |
| StaleNetworkFileHandle | ESTALE | |
| StorageFull | ENOSPC | ERROR_DISK_FULL, ERROR_HANDLE_DISK_FULL |
| TooManyLinks | EMLINK | ERROR_TOO_MANY_LINKS |

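For toolchains where these variants are still behind the feature gate, the Unix column of the table can be approximated by hand. A sketch, assuming Linux errno values for x86-64/aarch64 (other Unixes number several of these differently); it returns the variant name as a string so it compiles on stable, and `kind_name_for_errno` is a hypothetical helper:

```rust
/// Name of the ErrorKind the table maps a Linux errno to, if any.
/// Numbers are assumed Linux x86-64/aarch64 values.
fn kind_name_for_errno(code: i32) -> Option<&'static str> {
    let name = match code {
        7 => "ArgumentListTooLong",       // E2BIG
        16 => "ResourceBusy",             // EBUSY
        18 => "CrossesDevices",           // EXDEV
        20 => "NotADirectory",            // ENOTDIR
        21 => "IsADirectory",             // EISDIR
        26 => "ExecutableFileBusy",       // ETXTBSY
        27 => "FileTooLarge",             // EFBIG
        28 => "StorageFull",              // ENOSPC
        29 => "NotSeekable",              // ESPIPE
        30 => "ReadOnlyFilesystem",       // EROFS
        31 => "TooManyLinks",             // EMLINK
        35 => "Deadlock",                 // EDEADLK
        36 => "InvalidFilename",          // ENAMETOOLONG
        39 => "DirectoryNotEmpty",        // ENOTEMPTY
        40 => "FilesystemLoop",           // ELOOP
        100 => "NetworkDown",             // ENETDOWN
        101 => "NetworkUnreachable",      // ENETUNREACH
        113 => "HostUnreachable",         // EHOSTUNREACH
        116 => "StaleNetworkFileHandle",  // ESTALE
        122 => "FilesystemQuotaExceeded", // EDQUOT
        _ => return None,
    };
    Some(name)
}
```

On Linux with a recent toolchain, `format!("{:?}", std::io::Error::from_raw_os_error(21).kind())` should also yield `IsADirectory`, since `Debug` prints the variant whether or not stable code can name it.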
rbtcollins commented 1 year ago

@ijackson for ELOOP, Windows appears to give winapi::shared::winerror::ERROR_CANT_RESOLVE_FILENAME in similar situations (e.g. symlink loops). Could we add that, or perhaps generalise FilesystemLoop to the slightly more general case of being unable to resolve, before stabilisation?

dtolnay commented 1 year ago

I'm marking @yaahc's box because she has stepped down from T-libs-api after the point that this feature got proposed for FCP.

rfcbot commented 1 year ago

:bell: This is now entering its final comment period, as per the review above. :bell:

ijackson commented 1 year ago

@rbtcollins What other things might ERROR_CANT_RESOLVE_FILENAME mean? If it might mean other kinds of things besides loops then I think Windows fails to reliably distinguish filesystem loops from other kinds of errors. Which might mean that portable Rust programs aren't allowed to expect the operating system to reliably distinguish loops from other problems :-/.

@rfcbot concern Windows ERROR_CANT_RESOLVE_FILENAME, FilesystemLoop

If we can't resolve this question immediately, I will remove FilesystemLoop from the to-be-stabilised list and we will have to deal with that one later. I don't want to block all the rest.

ijackson commented 1 year ago

(Maybe someone with some rfcbot permissions could note the concern. This MR shouldn't merge without this being resolved, if only by dropping the stabilisation of that one kind.)

ijackson commented 1 year ago

Oh wait this isn't a stabilisation MR - it's just an issue, so we can just make the MR not include FilesystemLoop. But it would be good to be clear whether we are doing that.

ChrisDenton commented 1 year ago

The win32 error ERROR_CANT_RESOLVE_FILENAME currently maps only from the kernel error STATUS_REPARSE_POINT_NOT_RESOLVED.

The kernel error has the English error message as "The symbolic link could not be resolved even though the initial file name is valid". I'm pretty sure this situation can only occur when a loop is hit. EDIT: Or a long sequence of links that's not a loop but its length is greater than whatever the maximum is.

However, the win32 error message is more generic "The name of the file cannot be resolved by the system" so I guess the argument can be made that it does allow for more errors to be mapped to it in the future, should the need arise. I'm not sure that that would ever happen but I'll admit it is at least a possibility.

fogti commented 1 year ago

Some system calls on Linux also use ELOOP to mean "A loop exists in symbolic links encountered during resolution of the path argument, or O_NOFOLLOW was specified and the path argument names a symbolic link." So I think interpreting it as "a symlink loop or similar symlink resolution error was encountered" might be an accurate description, although (bike-shedding!) I don't know whether FilesystemLoop is an accurate name then, rather than something like SymlinkResolutionFailed.
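The symlink-loop case is easy to reproduce on Linux. A sketch, assuming a Unix target (it uses `std::os::unix::fs::symlink`) and Linux's ELOOP value of 40 (macOS uses 62); `open_symlink_loop` is a hypothetical helper that creates `a -> b` and `b -> a` in a scratch directory and then tries to open `a`:

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Assumed Linux value of ELOOP.
const ELOOP: i32 = 40;

/// Create two symlinks pointing at each other inside `dir`, then try to open one.
fn open_symlink_loop(dir: &Path) -> io::Error {
    let a = dir.join("a");
    let b = dir.join("b");
    symlink(&b, &a).expect("create a -> b"); // link `a` points at `b`
    symlink(&a, &b).expect("create b -> a"); // link `b` points back at `a`
    // Path resolution keeps following links until the kernel gives up.
    File::open(&a).expect_err("opening a symlink loop must fail")
}
```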

kalcutter commented 1 year ago

I think NotSameDevice is a better name than CrossesDevices. For me, it expresses its meaning more clearly. FWIW, the Linux man page also uses the word "same" to describe EXDEV: oldpath and newpath are not on the same mounted filesystem.
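EXDEV is what `rename` reports when source and destination are on different mounted filesystems, and the usual response is to fall back to copy-then-delete. A minimal sketch, assuming a Unix target where EXDEV is 18 (the value on Linux, macOS and the BSDs); on Windows the thread's table lists ERROR_NOT_SAME_DEVICE instead. `move_file` is a hypothetical helper and only handles regular files:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Assumed Unix value of EXDEV ("cross-device link").
const EXDEV: i32 = 18;

/// Move a regular file, falling back to copy-then-delete when `rename`
/// fails because the two paths are on different filesystems.
fn move_file(from: &Path, to: &Path) -> io::Result<()> {
    match fs::rename(from, to) {
        Ok(()) => Ok(()),
        Err(err) if err.raw_os_error() == Some(EXDEV) => {
            fs::copy(from, to)?; // copy contents (and permissions) across devices
            fs::remove_file(from) // then drop the original
        }
        Err(err) => Err(err),
    }
}
```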

Also, I find NotSameDevice to be stylistically more consistent with the other existing and proposed error identifiers.

kalcutter commented 1 year ago

Can FilesystemQuotaExceeded be renamed QuotaExceeded? AFAIK Linux has no other quota error and EDQUOT is also used generically (not only for filesystem quotas). A few other uses are visible here: https://elixir.bootlin.com/linux/v4.5/ident/EDQUOT. The comment in errno.h simply describes it as "Quota exceeded".

rbtcollins commented 1 year ago

I think there's a good correspondence @ijackson - as @ChrisDenton says the current meaning of ERROR_CANT_RESOLVE_FILENAME maps well to the loop meaning of ELOOP, and as @zseri has said, ELOOP itself isn't returned solely when loops are detected. Add to that list mount(2) returning ELOOP for move operations where the target is a child of the source - something that has absolutely nothing to do with symlinks, and execve returning ELOOP for exceeding recursion limits during recursive script execution (since Linux 3.8).

I suggest renaming it to LoopError, but document that it means ELOOP on Linux and ERROR_CANT_RESOLVE_FILENAME on Windows, and either describe what we know right now, or provide breadcrumbs for readers to catch up.

rfcbot commented 1 year ago

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

cktkw commented 1 year ago

I might be a little too late, but I'm not sure combining ERROR_FILENAME_EXCED_RANGE and ERROR_INVALID_NAME into InvalidFilename is a good idea. What was the reason behind this decision?

For example, if one is writing an unarchiver to extract files and a filename is too long, one could simply truncate the filename and tell the user. But if the filename is invalid/reserved (e.g. COM), the name would have to be completely different.

An app could guess which of the two it is, but it would be more straightforward if the two are distinguished from the start.

ChrisDenton commented 1 year ago

It is a bit late but technically not too late; the stabilization PR has yet to be merged.

For example, if one is writing an unarchiver to extract files and a filename is too long, one could simply truncate the filename and tell the user. But if the filename is invalid/reserved (e.g. COM), the name would have to be completely different.

I could definitely see the two cases being handled differently in some situations.

An app could guess which of the two it is, but it would be more straightforward if the two are distinguished from the start.

Note that the original error is recoverable using raw_os_error(). Though that isn't particularly ergonomic and is of course platform specific.
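For the unarchiver case, the raw code is enough to choose between truncating and renaming. A sketch for Windows codes, assuming ERROR_INVALID_NAME = 123 and ERROR_FILENAME_EXCED_RANGE = 206 (the winerror.h values); `NameFix` and `name_fix_windows` are hypothetical. Because it only inspects `raw_os_error()`, it can be exercised on any platform with `io::Error::from_raw_os_error`:

```rust
use std::io;

/// What an extractor might do about a name the OS rejected (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum NameFix {
    Truncate, // the name is too long: shortening it may be enough
    Rename,   // the name is rejected for another reason (e.g. reserved)
    NotAName, // the error is not about the file name
}

// Assumed Windows error codes.
const ERROR_INVALID_NAME: i32 = 123;
const ERROR_FILENAME_EXCED_RANGE: i32 = 206;

/// Recover the distinction that InvalidFilename merges.
fn name_fix_windows(err: &io::Error) -> NameFix {
    match err.raw_os_error() {
        Some(ERROR_FILENAME_EXCED_RANGE) => NameFix::Truncate,
        Some(ERROR_INVALID_NAME) => NameFix::Rename,
        _ => NameFix::NotAName,
    }
}
```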

cktkw commented 1 year ago

I was speculating there might be an OS that returns "Invalid Filename" when the filename is too long. In which case, this decision will make sense. But otherwise, "Filename Too Long" is one of the few errors that seem to be consistent among platforms.

With the current PR, Unix doesn't have InvalidFilename mapped to anything else (at least from my quick skim through sys/unix/mod.rs). However, UNIX-like systems fail with varying errors when an invalid UTF-8 filename is passed to open() in C. Mac APFS fails with EILSEQ (Illegal byte sequence). Linux ext4 (formatted with the strict case-insensitive option -O casefold -E encoding_flags=strict, because otherwise any byte sequence is allowed) fails with EINVAL (Invalid argument).

In theory, I think these should be mapped to InvalidFilename in the context of File::open(), File::create(), etc. But I understand that that would need major rewrite. I don't know enough to say what would be the best way, but current use of InvalidFilename doesn't feel optimal.

PS: I've never contributed to Rust development, so I'm not sure whether this should get reviewed before stabilization or what the procedure is.

ChrisDenton commented 1 year ago

PS: I've never contributed to Rust development, so I'm not sure whether this should get reviewed before stabilization or what the procedure is.

I'll make a note on the stabilization PR that there's still some concern around the current mappings. Maybe the libs-api team will want to look at this again.

Currently io::Error stores OS errors internally as essentially OsError(i32) so there's no context available when getting the io::ErrorKind.
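A quick way to see that no context survives: an `io::Error` built from a raw code reports that code plus a kind derived from it alone, so two errors with the same code are indistinguishable no matter which call produced them. A sketch using code 2, which is ENOENT on Unix and ERROR_FILE_NOT_FOUND on Windows (both map to `NotFound`); `code_and_kind` is a hypothetical helper:

```rust
use std::io;

/// The raw OS code (if any) together with the kind std derives from it.
fn code_and_kind(err: &io::Error) -> (Option<i32>, io::ErrorKind) {
    (err.raw_os_error(), err.kind())
}
```

Errors created with `io::Error::new` carry no OS code, so `raw_os_error()` returns `None` for them and the kind is whatever the constructor was given.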

AlexTMjugador commented 1 year ago

I think that a new error variant for the EMFILE error code on Unix platforms could be useful for embarrassingly parallel applications. I would like to contribute a PR for such a new variant, would it be a welcome addition?

GoldsteinE commented 1 year ago

Is it blocked on t-libs-api? I feel like that’s an important feature since it’s really hard to avoid TOCTOU without it, e.g.

if let Err(err) = fs::remove_file(path) {
    if err.kind() == io::ErrorKind::IsADirectory {
        // `path` is a directory: handle it here, e.g. with fs::remove_dir(path)
    }
}

becomes something racy like

if !path.is_dir() {
    fs::remove_file(path)?;
}
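A race-free version of the first snippet that works on stable today can test the raw code instead of the kind. A sketch, assuming Linux, where unlink(2) reports EISDIR (21) for a directory; POSIX also allows EPERM, which e.g. macOS returns, so the fallback would not trigger there. `remove_path` is a hypothetical helper:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Assumed Linux value of EISDIR.
const EISDIR: i32 = 21;

/// Remove `path` whether it is a file or an empty directory, deciding from
/// the error of the first attempt instead of checking beforehand.
fn remove_path(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => Ok(()),
        // Where ErrorKind::IsADirectory is nameable, `err.kind() == io::ErrorKind::IsADirectory`
        // expresses the same check portably.
        Err(err) if err.raw_os_error() == Some(EISDIR) => fs::remove_dir(path),
        Err(err) => Err(err),
    }
}
```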

Maybe some less-controversial subset could be stabilized sooner? What’s blocking this issue from making progress?

lolbinarycat commented 5 months ago

One additional case that would be nice for this is "Too many open files" (EMFILE, 24 on Linux).

I ran into this when writing async code, and there doesn't seem to be any way to check for it besides checking the message.

GoldsteinE commented 5 months ago

@lolbinarycat On Unix-like systems you can use .raw_os_error() to get the error code, and nix::errno::Errno for a portable list of errno values.
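Following that suggestion, a check for "Too many open files" needs only std. A sketch, assuming EMFILE = 24 as stated above for Linux; the `cfg!(unix)` guard matters because raw code 24 means something unrelated on Windows. `is_too_many_open_files` is a hypothetical name:

```rust
use std::io;

/// Assumed value of EMFILE ("Too many open files") on Linux.
const EMFILE: i32 = 24;

/// True if the error means this process has run out of file descriptors.
fn is_too_many_open_files(err: &io::Error) -> bool {
    // Only interpret the code as an errno on Unix targets.
    cfg!(unix) && err.raw_os_error() == Some(EMFILE)
}
```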

dr-kernel commented 4 months ago

I would like to propose that we stabilize these. Based on the feedback above, these are very useful to have in stable, and there seem to be no real risks. Concretely, this would mean marking the following as stable:

* `io::ErrorKind::HostUnreachable`

* `io::ErrorKind::NetworkUnreachable`

* `io::ErrorKind::NetworkDown`

* `io::ErrorKind::NotADirectory`

* `io::ErrorKind::IsADirectory`

* `io::ErrorKind::DirectoryNotEmpty`

* `io::ErrorKind::ReadOnlyFilesystem`

* `io::ErrorKind::FilesystemLoop`

* `io::ErrorKind::StaleNetworkFileHandle`

* `io::ErrorKind::StorageFull`

* `io::ErrorKind::NotSeekable`

* `io::ErrorKind::FilesystemQuotaExceeded`

* `io::ErrorKind::FileTooLarge`

* `io::ErrorKind::ResourceBusy`

* `io::ErrorKind::ExecutableFileBusy`

* `io::ErrorKind::Deadlock`

* `io::ErrorKind::CrossesDevices`

* `io::ErrorKind::TooManyLinks`

* `io::ErrorKind::InvalidFilename`

* `io::ErrorKind::ArgumentListTooLong`

The issues around io::ErrorKind::Other and the unexpected failures they caused have been resolved with the introduction of io::ErrorKind::Uncategorized. This also eliminates the potential breakage from exhaustive matching. As such, stabilization itself should not be a breaking change anymore.

As to the concern raised by @m-ou-se, I have gone over all of these errors, and almost all of them seem to have compatible OS error conditions on both Windows and any POSIX-compliant system. As such, platform-specific issues should only occur on unusual platforms outside of either the current tier 1 or tier 2 lists. I think we should be OK treating any cases where these errors are not generated currently as bugs. A further search of the current open issue list suggests no such issues are currently known.

There are two exceptions: First, io::ErrorKind::FilesystemQuotaExceeded and io::ErrorKind::FileTooLarge do not seem to correspond to POSIX-standardised errors. I think for these in particular it is acceptable if they fall back to io::ErrorKind::StorageFull, which seems to be what the POSIX standard suggests would happen under those conditions. Second, io::ErrorKind::ExecutableFileBusy and io::ErrorKind::Deadlock seem to be best effort and have limited system support, but this seems to me well enough documented.

Something I noticed: https://doc.rust-lang.org/stable/src/std/sys/pal/unix/mod.rs.html#261 uses HostUnreachable as the output of kind(), but unless you're on nightly you can't actually match on it, since that variant is feature-gated. This is unfortunate, since that's the kind TcpStream returns when a host is offline. So when I try to match this error kind to attempt a retry, I can't, unless I check the error code instead, which is not as convenient. This list needs to make its way to stable ASAP if it's already being used to report errors. What's worse is that this was done 3 years ago and no one has since complained that they can't match this kind without nightly?!
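Until the variant is matchable on the toolchain in use, a retry decision can combine long-stable kinds with raw errno values. A sketch, assuming Linux errno numbering (ENETDOWN = 100, ENETUNREACH = 101, EHOSTUNREACH = 113 on x86-64/aarch64); `should_retry_connect` is a hypothetical helper, and which errors deserve a retry is an application policy choice:

```rust
use std::io;

// Assumed Linux (x86-64/aarch64) errno values.
const ENETDOWN: i32 = 100;
const ENETUNREACH: i32 = 101;
const EHOSTUNREACH: i32 = 113;

/// Decide whether a failed TcpStream::connect is worth retrying later.
/// Stable kinds are matched directly; the network/host-unreachable cases
/// fall back to raw errno values on Linux.
fn should_retry_connect(err: &io::Error) -> bool {
    match err.kind() {
        io::ErrorKind::ConnectionRefused | io::ErrorKind::TimedOut => true,
        _ => {
            cfg!(target_os = "linux")
                && matches!(err.raw_os_error(), Some(ENETDOWN | ENETUNREACH | EHOSTUNREACH))
        }
    }
}
```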

GrigorenkoPV commented 2 months ago

Status update