Open rade opened 9 years ago
See #956 for a special case.
Why not just show what it died on instead of having to go to the docker logs. Anyway, I thought you wanted to get rid of those messages, so I noted it.
Why not just show what it died on instead of having to go to the docker logs.
That's what we are trying to do here. It's not easy.
Perhaps just tail a few stderr lines from the logs.
Alternatively, make sure all common startup errors get logged with some easily recognisable pattern that we can grep for in the script, falling back on the existing generic error message if the grep fails.
If you were to assume the most common errors are cmd line parsing, we could do a dummy run of weave (add a --dummy) which exits 0 if the command line parses and is sane, 1 otherwise. If we ran it -ti then the output would be seen by the user.
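A minimal sketch of that pre-flight idea, as it might look in the launch script. Note that `preflight` and `--dummy` are hypothetical names here, not existing weave options:

```shell
# Hypothetical pre-flight helper for the launch script: run a command
# (e.g. an interactive dry run of the router with a proposed --dummy
# flag, which is NOT a real weave option) and abort the launch if it
# fails, so parse errors reach the user's terminal instead of being
# buried in the container logs.
preflight() {
    if ! "$@"; then
        echo "weave: startup arguments rejected; see output above" >&2
        return 1
    fi
}

# In the launch script this might wrap something like:
#   preflight docker run --rm -ti weaveworks/weave --dummy "$@"
```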
If you were to assume the most common errors are cmd line parsing
It's more than that. The referenced #956 is an extreme case, since that is raised very late during startup, after the router is running.
Is #956 due to unresolvable hostnames? We could resolve them before returning with --dummy. Probably not that simple though.
It would actually be quite trivial to log all the errors with a grep-able pattern. We'd simply need to replace all the `Log.Fatal` invocations in main.go with a function that invokes `Log.Fatal` with a suitable prefix.
To extend the logging idea, `weave status` could look for relevant log lines from the last run, as an improvement on saying "weave is not running".
It can't be that hard to find all the places where the router decides to quit, and the panic log is also recognisable.
To extend the logging idea
Separate issue; let's not pile extra features into this one.
The "grep the logs" idea is flawed since docker logging can be configured such that container logs go elsewhere and are not available with `docker logs`. (Which, after several users ran into that, prompted us to make the error message the more generic "Consult the container logs for further details.")
I suppose we could try `docker logs` grep-ing, and if that fails revert to the generic error.
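A rough sketch of that fallback, assuming fatal errors carry a grep-able `FATA:` prefix. The function name is hypothetical; it reads log text on stdin so it could be fed from `docker logs weave 2>&1`:

```shell
# fatal_or_generic: print any FATA:-prefixed lines found in the log text
# on stdin; if none match (e.g. because the logging driver routed the
# logs elsewhere), fall back to the existing generic message.
fatal_or_generic() {
    local fatal
    fatal=$(grep '^FATA:')
    if [ -n "$fatal" ]; then
        echo "$fatal"
    else
        echo "The weave container has died. Consult the container logs for further details."
    fi
}

# Usage sketch:
#   docker logs weave 2>&1 | fatal_or_generic >&2
```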
It would actually be quite trivial to log all the errors with a grep-able pattern. We'd simply need to replace all the `Log.Fatal` invocations in main.go with a function that invokes `Log.Fatal` with a suitable prefix.
The `Log.Fatal` output is already quite grep-able...
```
$ weave launch --iface=foo
The weave container has died. Consult the container logs for further details.
$ docker logs weave |& grep "^FATA:"
FATA: 2016/04/12 22:29:23.399524 At most one of --datapath and --iface must be specified.
FATA: 2016/04/12 22:29:57.706913 At most one of --datapath and --iface must be specified.
```
(NB: there are two errors here because these days we recycle containers; so this is something we need to watch out for. We could just take the last line, no matter what)
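The "just take the last line" variant could be as simple as the following sketch (hypothetical helper name; reads log text on stdin, e.g. from `docker logs weave 2>&1`):

```shell
# last_fatal: recycled containers leave FATA: lines from earlier runs in
# the log, so keep only the most recent match.
last_fatal() {
    grep '^FATA:' | tail -n 1
}
```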
The errors from the flag parser come out differently though. The logging there is configurable, but in strange ways that actually alter the behaviour.
How about writing the cause of death to a file, which we could `docker cp` out of the container and then cat to stderr?
Seems overkill. `docker logs --tail=1 weave` will do the right thing in most cases.
Kubernetes has "termination reason": basically you write the thing we've been discussing to a file, `/dev/termination-log`, and Kubernetes pulls it out.
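A sketch of what that could look like from the container's side. `die_with_reason` is a hypothetical helper; `TERMINATION_LOG` is made overridable purely for illustration, the real path inside a Kubernetes container is `/dev/termination-log`:

```shell
# die_with_reason: record the cause of death in a well-known file before
# exiting, so a supervisor (or a wrapper script doing `docker cp`) can
# surface it, then also echo it to stderr and exit non-zero.
die_with_reason() {
    echo "$1" > "${TERMINATION_LOG:-/dev/termination-log}"
    echo "$1" >&2
    exit 1
}
```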
The weave and weaveproxy containers can die on startup, e.g. when invalid options are specified. Currently this shows up as a
The weave container has died. Consult the logs with 'docker logs weave' for further details.
type error. It would be rather more helpful to show the actual error. Perhaps just tail a few stderr lines from the logs.