Open killercup opened 6 years ago
From the etherpad:
?
in main (https://github.com/rust-lang/rust/issues/43301)
failure
is looking good but does not yet have clear messaging/documentation about how to architect/structure error handling in your appGiving a main
template. I didn't find anything interesting when I decided to use the failure crate and thus reinvented a basic main with error handling:
fn main() {
let opt = Opt::from_args();
env_logger::init();
if let Err(err) = run(opt) {
for cause in err.causes() {
eprintln!("{}", cause);
}
std::process::exit(1);
}
}
fn run(opt: Opt) -> Result<()> {
// ...
Ok(())
}
Does "error handling" include non-fatal errors i.e. warnings?
@Screwtapello this is about runtime errors, not about compiletime errors, I suppose.
Yes, runtime non-fatal errors, like "timed out connecting to host, retry 1 of 3" or "first line of input contains invalid values, assuming they're headers and skipping them".
But there is no such concept as a "warning" in Rust. We are talking about errors in the sense of std::error::Error
or Result::Err(_)
and how to handle them. A "warning", as you describe it, would be a way to handle certain errors. So either I'm misunderstanding something or the answer is simply: Yes, "warnings" are part of this discussion, as they are basically errors handled in a special way.
I think @Screwtapello talk about warning as "an Err, but that I don't want to propagate, but I don't want to silently discard". I have this use case too. For the moment, I use the log crate for that.
I'd say yes, non-fatal errors, like "log the failure and ignore/try again", are something we should consider. (It's also something I'd like to improve this quicli guide which does something like that.)
I use logging, but also have a utility to capture any warnings during the execution of a function/closure (I'm using using slog-scope
); the caller then gets to decide whether to be strict (if any warnings then print/abort), lenient (ignore the warnings) or somewhere in between ("There were some issues during parsing; do you want to continue anyway?").
failure
provides with_context
and context
. These methods extend rusts Result
type and make it really easy to structure excellent error messages that chain together nicely when propagated upwards with ?
. There is also the bail!
and throw!
macros provided by failure that let you propagate up either custom error messages using strings or custom error types.
I have built a decently sized cli app using failure
and I would be interested in documenting how to structure a project using failure
so that you can propagate errors up and fail fatally or capture them where they happen and act accordingly in a non-fatal way. Would that be done as a PR here or somewhere else? Or would that be considered part of the failure
documentation?
Edit: I see that some of this is already underway in the failure repo. Ill look there.
"Exit status should reflect success or failure" from https://github.com/rust-lang-nursery/cli-wg/issues/21 reflects on this too; I've never been confident using process::exit
in some deeply nested function, and always wanted some way to unwind and exit, perhaps via Result, but haven't come up with a good pattern yet.
@Mark-Simulacrum I have found that having most of your functions return Result<T, SomeErrorTrait>
combined with heavy use of ?
gets you a safe predictable way to percolate errors up to main and exit cleanly.
It'd be good to document a way to propagate up an exit code to the main function, though I suppose arguably the main function should itself decide what error code to return? I'm not sure what the typical pattern in C/C++ or other languages is as to what code should be responsible for determining which exit status to set.
I'm not sure what the typical pattern in C/C++ or other languages is as to what code should be responsible for determining which exit status to set.
IME there's very little convention around this in the C/C++ space. Plenty of existing C programs just call exit(status)
in arbitrary places (if not abort()
!). C's dismal error handling likely contributes to this. The ? in main
RFC has a section with some background on exit status code conventions. I think following the conventions proposed there is totally sensible, such that all you need to do is chain Result<T, Error>
up to main
.
That RFC also discusses a print_diagnostics_for_error
method, so the best bet here might be to ensure that returning something that implements failure::Fail
from main
prints a useful error message so that doing the right thing is as simple as:
use failure::Error;
fn work() -> Result<(), Error> { ...}
fn main() -> Result<(), Error> {
work()
}
IMHO, error handling in CLI apps shouldn't be all that much different than error handling in web apps. In web app frameworks, you typically insert a middleware-type error handler in your web request call stack. The error handler makes sure stack traces and such don't leak out when the app is running in production mode, but then can give various diagnostic information (django even puts you into a live debugger session) when exceptions occur.
CLI apps can also have exception handlers. For example, if you were to create a custom Error type in your application somewhere, you could give that error type an exit_code
field that's respected by the error handler for you application.
Another use of error handlers in CLI apps is to determine what to output, what form, and where, when an exception occurs. Let's say my app is designed for other programs to subprocess to, and I support a --message-format=json
flag. Instead of writing the more human-friendly error message to stderr
, the flag would trigger a json-structured error written to stdout
.
Yet another use of an error handler is a CLI application is to let your application upload details of unexpected runtime errors to some web endpoint for automatic error reporting (once your users opt-in, of course).
To me a CLI app is not intended to be long-lived. It is supposed to do its job and then exit. So just outputting an error message to stderr and returning a non-zero exit status is enough for very many apps. Mostly you can count on the OS to do cleanup. However, for bigger and more complicated apps that perhaps require more complicated cleanup, then yes it makes sense to propagate errors up to main.
Note that there are some valid reasons for using exit statuses apart from 0 and 1. An example might be differentiating soft and hard failures, which can be useful to recognise easily from a script, i.e. to check $? instead of parsing output.
Totally agree with https://github.com/rust-lang-nursery/cli-wg/issues/12#issuecomment-374811986 and few other comments pro-error propagating
Maybe we can document reasoning of why functions/methods other than fn main
or ones used as helper functions in fn main
shouldn't directly exit program so people naturally decide to do that:
Possible solutions:
Error/exit can be popped to the caller function as a Result
:
fn parse_args(rawArgs) -> Result<ParsedArgs, Error> {
if args.bad {
Error(…)
} else {
ParsedArgs(…)
}
}
Object-oriented inversion can be used like that:
fn parse_args(rawArgs, &mut parsedArgs, exiter) {
if rawArgs.bad {
exiter.exitWithError(code, message))
} else {
parsedArgs = …
}
}
Of course, I'd vote for functional-style with Result
as it keeps most of functions pure. But in some cases like logging a warning from deeply nested function can be easier with object-oriented style.
@uazu
So just outputting an error message to stderr and returning a non-zero exit status is enough for very many apps. … However, for bigger and more complicated apps that perhaps require more complicated cleanup, then yes it makes sense to propagate errors up to main.
Hope unit testing is a good reason to switch to error propagation even in small CLI apps :)
Hope unit testing is a good reason to switch to error propagation even in small CLI apps :)
No, it isn't a good reason. Really it would be best to test CLI apps as they are intended to be run, i.e. as separate processes. So make unit tests do fork/exec. There is a lot you'd have to fake up to run main() in the same process and not have the app-under-test notice, and to survive everything that might go wrong and not crash your test-runner.
However there are other reasons such as making valgrind output useful, which I could agree with. Sometimes, though, tearing down all your memory structures and closing everything is just completely unnecessary and a waste of processing time, and you just want to exit().
Edit: I hope I didn't put that too strongly. Yes, tearing down everything and returning to main does have its uses, but exit() does too. I've seen people massively overengineer stuff that should be ever so simple to code. So allow simple things to be simple. However if you're writing a huge app, then structure and rules are definitely going to be your friends rather than your enemy.
Why not both 😸
Really it would be best to test CLI apps as they are intended to be run, i.e. as separate processes.
Well, you can say that for all the code, even a library is intended to eventually be run as part of some process :) Doesn't mean it shouldn't have unit-tests.
Splitting code into small pure/-ish functions and unit-testing them greatly helps in development and maintaining quality.
Moreover, you can treat 99% of the code of a CLI app as a library(s) and fn main()
as a user of that library(s).
However I totally agree that CLI apps should ideally have Integration/Functional tests as well 👍. As you said it is the original purpose of them — to be run as separate process.
Here is an example of CLI Rust project with both Unit and Integration/Functional tests: Mainframer
Okay, got it. Yes, anything that's coded like a library needs to follow library-like rules (like passing back errors, and having independent tests).
The scenario I'm working with is small (2000-line) CLI-style apps that run on the end of pipes that we use to isolate our main server process from hardware driver issues. On failure, they should report to their output and wait to be killed. If they are unable to report the error or anything else goes wrong they must die immediately, which will be recognised by the parent process. Doing a full teardown is risky. A close may block indefinitely. We want the OS to do the cleanup. In this case, whilst we may have some parts structured like libraries, other parts clearly aren't, and for very good reason will call exit() on failure.
This to me is closer to the UNIX concept of CLI apps as disposable components, e.g. you ^C a whole pipeline without ever thinking about how that is dealt with. Most will just exit immediately and the OS will do cleanup. This is instantaneous. An app that doesn't die immediately is the (irritating) exception to the rule.
@BurntSushi
CLI tools should exit gracefully when a "broken pipe" error occurs.
From https://github.com/rust-lang-nursery/cli-wg/issues/21
Is there anything we need to do here to help with that?
@epage It doesn't happen by default. If you treat it like any other error, then you're likely to exit with the wrong code. If you ignore it completely (which will happen if you're using println!
), then your program will happily march along even though the consumer of your process's output has hung up and is no longer listening.
I don't know how to fix this necessarily, but at the very least, it could be documented somewhere that it is a footgun.
I think something as simple as a library function that returns Read
/ Write
wrappers that treat broken pipe as EOF could work?
For output to a broken pipe on UNIX, you'd normally get a SIGPIPE signal. Or does something disable that? (See: man 7 pipe)
Yes. I believe std disables/masks it.
@BurntSushi Okay, I've found it: https://github.com/rust-lang/rust/blob/f5631d9ac7745dd6eaea2bc6c236d5f8e54e9a18/src/libstd/sys/unix/mod.rs#L71
Perhaps the CLI WG needs to make a decision about this, and other UNIX signals too. (I don't know how things work on Windows.) There are two cases as far as I can see:
CLI tools that don't need to clean up anything special. These can handle SIGINT, SIGTERM, SIGPIPE, etc by just letting the OS cleanup the process. So the OS will close files, pipes, streams etc and release all memory -- but unflushed data in software buffers won't get written. This is fine for very many CLI apps.
CLI tools that need to do a little bit of network communication before terminating, or that need to flush to files or databases, or do other cleanup. These need to catch and handle SIGINT, SIGTERM, etc, and recognise and handle errors (e.g. errno=EPIPE) everywhere they may occur
I can imagine that for consistency, you might want to recommend using the second option everywhere, for example for those people who want to see all errors propagate back to main
. This causes a bit more work. In that case handling signals and EPIPE needs to be made consistent (e.g. println!
mentioned above). Also you need to consider how to handle SIGINT (^C). The signal handler could set an internal flag that some loop could recognise, and then propagate an error back to main. However probably you'd need to make ^C^C do an exit()
to give the user a way out just in case nothing notices that the flag has been set. So really trying to catch the signals makes more work, but if you need it to do cleanup then it is necessary, and if you want control to always come back to main
it is also necessary.
Does anyone have any thoughts or ideas about this?
I had a quick look at errors getting silently dropped. Testing here on Linux, println!
doesn't drop the EPIPE, but panics instead, which is better than ignoring it. But there's at least one other case of silently dropping errors: FileDesc::drop()
. Perhaps there are other examples. (Would it be worth having a ticket to build up a list of these?)
So there may be various sources of errors which are outside the normal error propagation path:
kill
to our PID)FileDesc::drop
So these could be handled by a single mechanism:
This isn't very nice really, but what other mechanism do we have to deal with stuff like this?
(Also, you're quite limited in what you can call in a signal handler (see man signal-safety
on Linux). So maybe a global flag may be the only way for SIGINT and SIGTERM, rather than trying to lock an unsafe global array.)
I had something simpler in mind. Basically, either of these behaviors are undesirable:
The problem is that (2) happens by default if you use println!
, and (1) is an easy thing to slide into if you replace println!(...);
with let _ = writeln!(io::stdout(), ...);
. I think it is really unfortunate that println!
does the wrong thing here, which basically means that one almost never wants to use println!
in a CLI tool. You might be able to get away with it if you're only printing a small amount of output, but otherwise, it's a footgun that can appear any time you pipe the output of your tool into head
, for example.
There are two "simple" solutions to this:
SIGPIPE
, e.g., libc::signal(libc::SIGPIPE, libc::SIG_DFL)
. This will cause the process to terminate immediately upon receipt of a SIGPIPE
. No actual signal handler is needed.Either of these two solutions resolve the actual footgun here. Going beyond that to clean-up logic and all that is a totally separate issue. And yeah, I do think signal handlers deserve their own issue as well, because the state of those in Rust isn't great at the moment.
FWIW, I can't remember the motivation for changing the default signal handler for SIGPIPE
. @alexcrichton Do you know?
FWIW, I can't remember the motivation for changing the default signal handler for SIGPIPE. @alexcrichton Do you know?
This was initially done long long ago for a reason that may or may not still be relevant. I think that if you do a syscall and would otherwise get EPIPE
a program will raise SIGPIPE
first. For example if you write to a closed socket it'll by default kill the program. We weren't too happy about that (we'd rather socket writes returned an error on errors), so we changed the default signal handler.
Nowdays though for sockets at least we use things like MSG_NOSIGNAL
and other techinques to work even when SIGPIPE
is still raised by default (aka writing to a socket should always return an error). This may still affect things like writing to child pipes of processes, though.
This also mirrored what libuv did long long ago but it may no longer be true today
For the pipe issue, this program only prints one error because of SIGPIPE rather than two
The topic here is error-handling in general. Perhaps it needs splitting out into separate issues. I've added #27 and #28 for signal handling and silently dropped errors. (Please someone close these if they're inappropriate, but it seems necessary to split things.)
What options are there for fixing println!
to return errors, or is this completely impossible due to backwards compatibility? If all errors were caught and returned, then that seems much better than using a handler for SIGPIPE, because you get more information and it's local.
There seem to be various not-directly-reportable errors: SIGINT, SIGTERM, and FileDesc::drop() for a start. Maybe also println!
failures if it's not possible to change the return value or common usage of println!
and we want to avoid panics on EPIPE. A general mechanism to queue these errors internally within std
and then query them from user code would help with all of these cases. I'm bringing this idea back here because #28 doesn't seem the right place for it.
In the first meeting, we've discussed that we should have a best practice for error handling in CLI apps.