mirage / functoria

A DSL to invoke otherworldly functors
ISC License
63 stars 21 forks source link

use different exit codes in argv parsing #179

Closed hannesm closed 4 years ago

hannesm commented 4 years ago

Functoria_runtime.with_argv is parsing the argument vector. This is used by all MirageOS unikernels. It has a three-valued return: success (returns unit), `Help | `Version - used to be 0), Error (exception, OCaml runtime returns 2).

As it looks now when we start hello with an argument it cannot handle (--fpoo=bar):

console data 2019-10-11T23:18:42-00:00: hello: unknown option `--fpoo'.
console data 2019-10-11T23:18:42-00:00: Usage: hello [OPTION]... 
console data 2019-10-11T23:18:42-00:00: Try `hello --help' for more information.
console data 2019-10-11T23:18:42-00:00: Fatal error: exception Failure("Key initialization failed")
console data 2019-10-11T23:18:42-00:00: Raised at file "stdlib.ml", line 29, characters 17-33
console data 2019-10-11T23:18:42-00:00: Called from file "main.ml", line 29, characters 9-95
console data 2019-10-11T23:18:42-00:00: Called from file "camlinternalLazy.ml", line 31, characters 17-27
console data 2019-10-11T23:18:42-00:00: Re-raised at file "camlinternalLazy.ml", line 36, characters 4-11
console data 2019-10-11T23:18:42-00:00: Solo5: solo5_exit(2) called

Now, I intended to implement a restart-on-failure feature in albatross. The only information we have at exit is the exit code, and this can be divided into "those which should trigger a restart" (i.e. out of memory in most unikernels, unhandled exception, ..) and "those which should not trigger a restart" (i.e. solo5 tender can't process spt/hvt image, manifest mismatch, boot argument parser error).

FWIW, I figured the following exit codes are used:

Now, while restart-on-failure can be configured by a user with a whitelist of exit code which should trigger a restart, there is need for a sensible default, which is everything besides 1 and the (arbitrary) range 64..70.

(this is a PR against 2.2 which I tested locally, if this is accepted I'll of course PR as well against master)

hannesm commented 4 years ago

what I forgot to include in the comment above is the output with these changes in:

with --foop=bar

console data 2019-10-13T10:40:42-00:00: hello: unknown option `--foop'.
console data 2019-10-13T10:40:42-00:00: Usage: hello [OPTION]... 
console data 2019-10-13T10:40:42-00:00: Try `hello --help' for more information.
console data 2019-10-13T10:40:42-00:00: Solo5: solo5_exit(64) called

with --help

console data 2019-10-13T10:41:56-00:00: NAME
console data 2019-10-13T10:41:56-00:00:        hello
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: SYNOPSIS
console data 2019-10-13T10:41:56-00:00:        hello [OPTION]... 
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: UNIKERNEL PARAMETERS
console data 2019-10-13T10:41:56-00:00:        -l LEVEL, --logs=LEVEL (absent MIRAGE_LOGS env)
console data 2019-10-13T10:41:56-00:00:            Be more or less verbose. LEVEL must be of the form
console data 2019-10-13T10:41:56-00:00:            *:info,foo:debug means that that the log threshold is set to info
console data 2019-10-13T10:41:56-00:00:            for every log sources but the foo which is set to debug. 
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: OPTIONS
console data 2019-10-13T10:41:56-00:00:        --help[=FMT] (default=auto)
console data 2019-10-13T10:41:56-00:00:            Show this help in format FMT. The value FMT must be one of `auto',
console data 2019-10-13T10:41:56-00:00:            `pager', `groff' or `plain'. With `auto', the format is `pager` or
console data 2019-10-13T10:41:56-00:00:            `plain' whenever the TERM env var is `dumb' or undefined.
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: ENVIRONMENT
console data 2019-10-13T10:41:56-00:00:        These environment variables affect the execution of hello:
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00:        MIRAGE_LOGS
console data 2019-10-13T10:41:56-00:00:            See option --logs.
console data 2019-10-13T10:41:56-00:00: 
console data 2019-10-13T10:41:56-00:00: Solo5: solo5_exit(65) called

anyone (@mirage/core) has an opinion on this? I'd like to merge and release this monday (oct 14th) evening.

avsm commented 4 years ago

I'm in favour of this. Although it's a bigger change, it may be worth us standardising on sysexits.h for some of the returns, and not overlapping with them: https://www.freebsd.org/cgi/man.cgi?query=sysexits&apropos=0&sektion=0&manpath=FreeBSD+4.3-RELEASE&format=html

hannesm commented 4 years ago

NB: yes, I'm aware of sysexits. that's why I chose 64 / EX_USAGE for argument parsing error I re-read sysexits man page, and would appreciate any insight which other error codes a MirageOS unikernel could exit with? EX_DATAERR, EX_UNAVAILABLE, EX_SOFTWARE (which is rather vague, and in our case likely 2 -- an unhandled exception), EX_IOERR (once someone does block device storage seriously), EX_PROTOCOL (though that's pretty vague as well), EX_CONFIG (which is pretty close to EX_DATAERR).

the help / version is debatable (it seems as well valid to have them return 0 -- i.e. successful program termination -- but imho it is more useful to exit with a specific code (out of my own use case -- restart-on-failure, and restarting a --help unikernel is not that satisfying ;), happy to use 63 for help&version, which won't collide with sysexits).

~> maybe we should have a small package "exit-codes" that define a mapping between constructor and numeric value? this could then serve as a global registry, to-be-used in the various libraries that should deal with errors?

hannesm commented 4 years ago

I found http://www.tldp.org/LDP/abs/html/exitcodes.html which covers some more exit codes (all non-overlapping with either sysexits nor the ones proposed here), mainly for shell programming.

mato commented 4 years ago

FWIW, I figured the following exit codes are used:

solo5 returns whatever exit code the guest passed to void exit(int status).

  • 0 normal termination
  • 1 solo5 internal failure (i.e. bad image / mismatch between manifest and command line arguments)
  • 2 unhandled OCaml exception (out of memory, ...) -- to be changed to abort() in 4.10.0, see ocaml/ocaml#8630
  • 255 solo5_abort

If we're going to interpret exit codes, I should go through the Solo5 code and at least distinguish between:

  1. Start-up failure, e.g. target mismatch, bad command line, i.e. anything where the guest did not even start.
  2. Run-time failure, e.g. unhandled hypercall, VCPU exception for hvt, seccomp failure for spt i.e. cases where the guest did start but then failed while running.

Both of these cases currently return 1 for hvt. Case (2) returns 1 for hvt and 159 (returned by the shell due to process exiting with SIGSYS) for spt. There's nothing I can do about spt for (2), but at least for hvt, what codes would you like the tender to return? I'm guessing something high rather than low.

(related: https://github.com/Solo5/solo5/issues/424)

hannesm commented 4 years ago

@mato thanks for your explanation. it would be nice to document (and maybe adjust/unify) exit codes, but this is really not urgent.

hannesm commented 4 years ago

the list from sysexits and bash tutorial is atm as a comment at https://github.com/hannesm/albatross/blob/solo5-6/src/vmm_core.ml#L333-L358, @mato yes, something high would be good!

CI is green (after using dune.1.5.3 as test-dependency), merging and releasing