rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0
5.89k stars 1.56k forks source link

Include the errored file path (when available) in std::io::Error instances #2885

Open mqudsi opened 4 years ago

mqudsi commented 4 years ago

Given that probably the most common model for handling errors in rust programs is just to bubble them up, it is far too common to see errors like this:

mqudsi@ZBOOK /m/c/U/m/g/mytube> env RUSTC_WRAPPER=sccache cargo build
error: failed to run `rustc` to learn about target-specific information

Caused by:
  process didn't exit successfully: `sccache rustc - --crate-name ___ --print=file-names -C target-cpu=native -C link-arg=-fuse-ld=lld --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit code: 2)
--- stderr
error: No such file or directory (os error 2)

Note the last line: sccache panicked because it couldn't find some file. Which file? Who knows. You'll need to debug or grep the code to find out.

This is actually because std::io::Error is guilty of the same thing sccache did: it just bubbled up the OS error without adding any context.

This particular sccache error is just the error I was dealing with last before filing this issue, but I run into this about every other day dealing with 3rd party crates in the ecosystem. In my own code, all of my crates have to wrap std::io::Error in a new error type that includes the file path and at the site of invocation where std::io::Result is first returned, every std::io::Error gets mapped to a custom error that includes the path in question, if any.

I would like to propose that - at least in debug mode, although preferably always - any IO operations that involve a named path should include that path in their error text, and perhaps even the error type should be extended to include an Option<PathBuf> field (or select APIs returning a different error type).

I know this comes with some caveats. It's hard to use a reference to the path to generate the error text since that would tie the lifetime of the error to (likely) the scope it was called from, making bubbling it up harder, so it would necessarily have to copy the value. This increases the size of std::io::Error for cases where errors are not being bubbled up, but I've been mulling this in the back of my head for some time now, and I think the ergonomics outweigh the cost... and I'm also hopeful someone can figure some clever trick around that.

burdges commented 4 years ago

io::File consists of a unix file descriptor, which lack any filename information. io::File cannot be expanded because doing so violates rust's zero cost abstraction goal.

anyhow::Error allows adding context if you like.

Also, io::{Read,Write,Cursor} need io::Error, but should really be made usable no_std and even without alloc, which limits our options for adding anyhow::Error functionality.

Mark-Simulacrum commented 4 years ago

We cannot include this information in a zero cost way, so it would be a price that everyone would need to pay (even if they don't want to). As such, it would not fit well into std; a library providing a wrapper around std which adds relevant context could be viable though. Thanks for the suggestion, though!

If you want to discuss this further, I recommend going to internals.rust-lang.org.

mqudsi commented 4 years ago

Would it be inappropriate even in debug mode only?

Mark-Simulacrum commented 4 years ago

We would need an RFC at minimum to design such a change and I personally am not sure that it's possible given that we only ship one std (which is always compiled in release mode).

(Regardless, this is a sufficiently large request to need an RFC if you want to pursue it further).

burdges commented 4 years ago

Yes, I'd wager most code that benefits from this in debug would benefit in release too, making the debug effort wasted. In C/C++ you must attach such contextual information manually. anyhow::Error improves manual contexts with clean error propagation.

In Linux, you could read file paths from open using /proc/self and fs::read_link https://stackoverflow.com/a/1189582/667457 It'll returns an incorrect name if they file gets moved, but roughly this should work

fn file_to_path(file: &File) -> Result<PathBuf> {
    use std::os::unix::io::AsRawFd;
    let fd: i32 = file.as_raw_fd().into();
    use fmt::Write;
    let mut path = String:: with_capacity(24);
    path.write_fmt(format_args!("/proc/self/fd/{}", fd)) ?;
    ::std::fs::read_link(path)
}

In principle, you could write an anyhow::io crate that duplicates the std::io traits with such functionality. It's plausible one could define some new architecture linux-with-filename-error-context in https://github.com/rust-lang/rust/tree/master/src/libstd/sys but probably easier doing this through new traits.

Kixunil commented 3 years ago

I object to the idea that adding this is not zero cost. io::Error already contains an enum with several variants that needs to be copied around. File::open and other functions using Path already have to allocate CString for conversions. Adding this allocated string to io::Error is literally two MOV instructions around syscall. Syscalls are much more expensive than two MOV instructions, so if anyone wants to optimize anything, they should optimize number of syscalls, not number of move instructions associated with each syscall.

The benefit of adding Path to io::Error is enormous and should not be dismissed just because it requires two more MOV instructions.

The only issue I see is that paths will not be present in operations after opening the file. However in my experience errors after opening the file are extremely rare and they are pretty much guaranteed to be problems with whole filesystem (full, device unplugged...) in which case the specific file doesn't matter that much anyway.

I'm willing to write the RFC if my arguments convinced you that it's worth pursuing.

Mark-Simulacrum commented 3 years ago

I think I've personally been slowly more convinced over time that this is probably the right thing to do if we can pull it off on benchmarks and such (seems not implausible, given existing allocations etc) -- I think the right next step is to kick off some discussion on internals.rust-lang.org and get a read on how libs team feels about this (#t-libs on Zulip is probably best for that). See also https://github.com/rust-lang/rfcs/pull/2979, which tries to codify some of these processes :)

paulzzy commented 2 years ago

Hello! I'm here from r/rust. Has there been more recent discussion/progress on this issue? I'd love to have/help with this feature!

Kixunil commented 2 years ago

@paulzzy not that I'm aware of. I had it in the back of the mind but didn't find the time to write the RFC. If it starts moving, I will try to schedule some time to help. For now, here are my thoughts:

My desired error messages:

eddyb commented 2 years ago

There's a relevant PR from @m-ou-se (opened just over a year ago):

thomcc commented 2 years ago

At some point in the past I did https://github.com/thomcc/rust/tree/io-error-path which was the start of an updated version #86826. It was mostly to convince myself this was still possible with the new error repr (it is, but it does require an allocation now), but I ended up coming to the conclusion that it wasn't really worth doing, mostly because of what @m-ou-se said in that issue:

This only shows the path in the Debug implementation of io::Error. It's not clear what else we should do with it. Putting it in Display will make many programs show the path twice, and have less control over the output format. Adding a .path() accessor to io::Error is also not great, because it'll have to be a Option, and won't be accessible through the Error trait.

One thought is that the path could be provided even via the error trait via the provider API (CC @yaahc). Beyond that, I do think the fact that it will end up making many programs show the path twice is unfortunate. I'm not sure that this outweighs the utility of having the path available (certainly that would be nice, and it's tedious to have to truck path around just to be able to display it with the error), but yeah. That and the fact that I had convinced myself it was still possible is why I stopped working on that branch.

FWIW, it is also clear that it's not zero-cost anymore, at least not on 64 bit systems, as it requires an additional allocation (considerably more expensive than the two mov instructions cited above), but it is probably true that this is outweighted by the cost of a syscall. (It actually may require two allocations now, as since writing the code in that branch, we don't always allocate paths that are passed into the methods)

Kixunil commented 2 years ago

@thomcc interesting, I don't have enough time to look into it deeply but in my experience path being displayed twice is less bad than path not being displayed at all. There could be fn no_display_context(&mut self) that'd disable displaying, so if people want to handle it themselves they can. (Maybe there's even a better way.)

mqudsi commented 2 years ago

Displaying the path for {:?} and not {} is probably better than never displaying it, but it might not be enough. (Edit: especially with std::error::Error being so prevalent and implementing Display)

I agree that duplicating the path is problematic. Since first realizing that rust io errors are this useless on their own, I now always map IO errors at the site of creation to a custom error (or just a String in quick apps, i.e. not libraries) with format!("{}: {}", path.display(), e), and I know that I'm not the only one that does that.

Still, it's not the end of the world: if paths were included from day 0, no one would have formatted IO errors that way. If paths get included in Display today, there will be an ugly transition period but people will learn not to include them (easier if this behavior is gated on "edition 2023" or something, which should be legal).

However, there's also the matter of whether or not it's out of the question to use some internal magic to detect whether we are in a top-level fmt() call or in a recursive fmt() call. It should be possible to distinguish (probably in a zero-cost fashion) whether eprintln!("{}", e) (where e is an std::io::Error) was invoked or eprintln!("{}", e) (where e is a different std::error::Error wrapping the std::io::Error) that ends up internally formatting the underlying std::io::Error as part of its fmt() call.

It would be harder (and probably not zero-cost) to determine whether eprintln!("{}: {}", path.display(), e) (with e being std::io::Error) was called, but it's also certainly doable if there's a consensus that this is something that needs to be done one way or the other. Keep in mind that none of needs to be stable or exposed via any public APIs - the only real issue would be that <std::io::Error as std::fmt::Display> would behave differently in different contexts, which might end up being too bitter a pill to swallow.

thomcc commented 2 years ago

I'm going to reopen this since it's getting active discussion and seems plausible we could include it, and @Mark-Simulacrum (the person who closed it in the first place) indicated in an above comment that they've changed their mind somewhat.

I'll note that given the current io::Error implementation, this wouldn't be fully zero cost. That said it probably would be negligible compared to the cost of a system call, and the cost would be entirely in the error case -- the success path would be unpunished. It would be doable by conceptually storing (something similar to) ThinBox<(i32, Path)> as one of the variants, and expanding the pointer tagging to 3 bits (rather the current 2bit tags). This is effectively the approach used in the unfinished patchset linked earlier.

Doing so would incur a single extra allocation, but only in the case that the file operation failed. I'm unconvinced this matters or would show up in benchmarks, but perhaps it's not strictly zero cost.

CC @yaahc since this is related to errors.

Mark-Simulacrum commented 2 years ago

Doing so would incur a single extra allocation, but only in the case that the file operation failed. I'm unconvinced this matters or would show up in benchmarks, but perhaps it's not strictly zero cost.

IIRC, in many cases we're forced to allocate a CString for the path today anyway, which makes me wonder if we can upfront make a slightly larger allocation (by a few bytes) to avoid needing to copy/reallocate the path after the syscall fails. This would make this pretty much zero-cost, modulo a slightly larger path allocation, which in many cases is entirely unnoticeable as the allocator is probably putting us in a power-of-two bucket anyway.

yaahc commented 2 years ago

CC @yaahc since this is related to errors.

I have little to add to this one. I'm all for improving that situation because I'm aware that there are many unpleasant failure modes associated with error reporting for io::Errors where people omit crucial context that we don't include by default. Although I'd be happy to be surprised and have someone come up with a solution, I'm against anything that lowers the quality of existing error reporting, so I'm somewhat skeptic of anything that dynamically checks the context of the current stack to see if it's likely losing context or not, but I think improvements to the debug output seem promising.

thomcc commented 2 years ago

so I'm somewhat skeptic of anything that dynamically checks the context of the current stack to see if it's likely losing context or not...

Yes, I agree that's not a good solution either. I think that and the other "magic"ey solutions are probably a non-starter -- in all likelihood the choices probably are:

  1. Accepting displaying the path twice for programs that already print the path.
  2. Including it not in Display, output, only in Debug output, and perhaps via err.request_ref::<Path>() or similar.
  3. Not doing this, and closing the issue. This is probably reasonable, since this isn't complained about that often.

I'm in favor of the first or the third. I think the second is likely not worth the trouble, or the (admittedly small) overhead, since it would be very hidden. I'm most in favor of the first, as I don't think we consider the output of the io::Error's display implementation to be a stable detail, and don't think that including the path twice is that bad (it's a little goofy, but it seems decidedly better than not including it ever -- error messaging is often slightly redundant anyway). I could be mistaken about this, though.

mqudsi commented 2 years ago

I think option 1 is perfectly fair and would be my preference as well (despite having voiced aloud the other ideas as well), and like I mentioned, people will adapt and adjust their error reporting to account for the duplicated path in time.

paulzzy commented 2 years ago

I'd also prefer option 1. As a Rust beginner (admittedly not the most important demographic), it can be frustrating trying to figure out where the error is propagating from. I think having it built-in is a small price to pay for redundant printing. It'd certainly be helpful for new Rustaceans who don't know how to write custom error wrappers (this is a self-own).

For option 3, I'm not sure lack-of-complaints implies lack-of-trouble. Beginners probably don't know where to complain, while experienced Rustaceans already know how to deal with it. That said, this is speculation so I could be totally wrong.

Also, how much work is already done by https://github.com/rust-lang/rust/pull/86826?

thomcc commented 2 years ago

A significant amount of it would have to be redone, because the underlying error impl has been rewritten, and the approach must change. That said, I did part of that work in https://github.com/thomcc/rust/tree/io-error-path, but stopped because of being unsure of its utility.

joshtriplett commented 1 year ago

@thomcc wrote:

That said it probably would be negligible compared to the cost of a system call, and the cost would be entirely in the error case

The error case can still be on the common critical path: some programs try opening many files, most of which won't exist. For instance, consider the case of looking for files along a search path; most open operations will fail. Linux has a "negative dentry" cache specifically to remember "this file doesn't exist", to optimize these kinds of cases.

@Mark-Simulacrum wrote:

IIRC, in many cases we're forced to allocate a CString for the path today anyway

https://github.com/rust-lang/rust/pull/93668 fixes that.

Kixunil commented 1 year ago

@joshtriplett that changes the perspective quite a bit but we ran into an unfortunate situation. Without the path, error messages coming from io::Error suck for all non-open-performance-critical applications (vast majority?) and error handling sucks for those who want to fix it themselves. With the path we would damage performance of some applications.

Maybe the correct approach is really making a separate io::DetailedError type (not proposing this exact name) containing additional information and separate methods to wrap the errors.

stellarpower commented 1 year ago

I don't know any Rust - arrived at this page because I'm trying to modify some existing code. And precisely the same as the comments here - as an end user, "no such file or directory" is in effect useless to me, I can't help craft an MRE for the authors of the original code if I don't know what the cause of the error actually is, so it sounds like I will need to dig down deeper and return a different type with this on top. But I don't know Rust's design goals, I'm a C++ programmer really, just thought I'd add:

In C/C++ you must attach such contextual information manually.

Yes, that has traditionally been the case, we want to provide information using the what string. However, we probably would insert the paths in this string, and:

https://en.cppreference.com/w/cpp/filesystem/filesystem_error/path

now we have filesystem stuff in the std library, std::filesystem_error does in fact provide the paths that threw the exception, so it's not actually zero overhead on top of the unix error code from a C++ perspective. Again, know nothing about Rust or its object lifecycles, but if the paths have already been referenced in order to try to e.g. open a file, presumably theoretically it could be as straightforward as the size of a pointer to keep that hanging around in the exception itself.

blueforesticarus commented 1 year ago

Is there a best practice for dealing with this issue in the meantime? Perhaps a crate which wraps std io calls with something that preserves path in the Error?

It's easy enough if you use anyhow everywhere, but that has downsides.

Kixunil commented 1 year ago

@blueforesticarus I was working on such crate (with a few more bells and whistles) but didn't finish it. Would be happy to setup some repo for other people to help me if there's enough interest.

dan-da commented 1 year ago

Not doing this, and closing the issue. This is probably reasonable, since this isn't complained about that often.

In that case I will complain. It has always bothered me that file errors do not include the path. It is not just a matter of the display or debug message, but the path is not even available to query pro grammatically from the error. This was so surprising to me that originally I figured I was just using it wrong somehow. Surely the std lib wouldn't make such an oversight. So tonight I've spent about 3 hours tracking all this down, and now it seems I will have to spend more time creating a wrapper Error just to include info that was already passed to the write() call.

blueforesticarus commented 1 year ago

There is certainly a high cost to not including the path. It makes proper, useful error reporting harder, and therefore also makes it less likely to be done. It's better if you don't have to "remember the boilerplate".

blueforesticarus commented 1 year ago

@Kixunil Yeah, if you make a repo drop the link here. I'd be interested in taking a look.

kraktus commented 1 year ago

Hello,

I’m wondering about the state of this issue, now that https://github.com/rust-lang/rust/pull/93668 has landed, since the argument ‘we already allocate a bunch of Cstring’ is not longer valid.

I’d ready to have a look at this issue if there’s a green light.

xosxos commented 1 year ago

I am in support of this. At the moment I am forced to add context to every single ? error handling that concerns I/O. It feels bad.

CryZe commented 1 year ago

I am in support of this. At the moment I am forced to add context to every single ? error handling that concerns I/O. It feels bad.

You really only need to add it to the top level one, i.e. a single one, which honestly you need to do even if io::Error contains the path, cause parsing a file can result in many errors that are unrelated to the IO operation itself, such as the encoding being corrupt.

c-git commented 1 year ago

This is definitely a stumbling point for beginners, I for one didn't expect the path to not be included.

I'm not sure how practical it would be but for the use cases where it is on the critical path of execution I'm wondering if it could be better served by an alternate call that "anticipates" an error case and separate it from "regular" calls that at least for beginners expect the error message to include the path.

With regard to requiring an allocator, is it possible to check if one is available and only include the path if one is? If this only happens in the error case I think that's a worthwhile cost because it will save a lot of debug time trying to figure out what is not available, especially if you are new and don't even really know where to start. Googling error: No such file or directory (os error 2) is not really an option, no one can help you with just that information.

And with regard to double printing the error messages I think the edition system is a fine time to make that kind of change as it provides a point where people would have to opt in. I personally feel the double printing is less bad than never printing. And this will get less bad with time as people expect it to include the path.

xosxos commented 1 year ago

I am in support of this. At the moment I am forced to add context to every single ? error handling that concerns I/O. It feels bad.

You really only need to add it to the top level one, i.e. a single one, which honestly you need to do even if io::Error contains the path, cause parsing a file can result in many errors that are unrelated to the IO operation itself, such as the encoding being corrupt.

Yes well, trying to figure it out is how I ended up here. Like probably most users, I don't use Rust full-time and suffer quite large mental overhead from wrapping errors, so it's easier to leave it for now. Also as the Error Handling Project Group has gone stale (?), nobody's there to hold my hand :( The point about encodings is good, however, I feel like typoed filenames are much more common than corrupted files. Weirdly, I often feel like error handling is simultaneously one of the best and worst parts about Rust.

c-git commented 1 year ago

Yes well, trying to figure it out is how I ended up here. Like probably most users, I don't use Rust full-time and suffer quite large mental overhead from wrapping errors, so it's easier to leave it for now. Also as the Error Handling Project Group has gone stale (?), nobody's there to hold my hand :( The point about encodings is good, however, I feel like typoed filenames are much more common than corrupted files. Weirdly, I often feel like error handling is simultaneously one of the best and worst parts about Rust.

I think you should take a look at the anyhow create, it might make your life easier. If you get stuck you can always get in touch on discord there are lots of people willing to help you on your journey. This issue imo is about long term ergonomics and practicality not short term solutions. But hopefully we can find a way to help ppl to join in the future.

xosxos commented 1 year ago

I think you should take a look at the anyhow create, it might make your life easier. If you get stuck you can always get in touch on discord there are lots of people willing to help you on your journey. This issue imo is about long term ergonomics and practicality not short term solutions. But hopefully we can find a way to help ppl to join in the future.

I don't mean to derail this discussion any further than this. I am building a library for bioinformatics that handles errors using thiserror. Then I'm building a CLI + egui application that uses color_eyre for error handling on the binary side. The binary effectively has to panic on all errors (and even though it's an unpopular opinon, if the path would be included in the std::io::Error, a pure .unwrap() would 100% suffice). However, instead I need to bubble the error with context using eyre, which leads to more boilerplate. An elegant top level implementation which would have the identical ? ergonomics, but with the path included, is not immediately obvious.

dpc commented 11 months ago

Isn't there any library handling it? Would it make sense to havie? It could mimick stc::io and add context on all higher-level IO operations: open / exec, etc. . That would be a big improvement already. One could also wrap reader/writers etc. to attach path to read/write errors as well.

I thought there was something like that already, and I'm trying to find it. :D

Kixunil commented 11 months ago

@dpc I started writing one a long time ago but didn't finish. I wanted (and still want) to restart it but life got in the way.

thomcc commented 11 months ago

https://crates.io/crates/fs-err is one. I think there are some more.