I often do something like "pssh -h myhosts -t 5 echo hi" for this purpose. I
believe that this would meet the needs that you describe; is there anything
that it's missing? Let me know what you think.
Original comment by amcna...@gmail.com
on 20 Dec 2010 at 9:18
That doesn't really address my problem because if the machines are down, pssh
still exits with 0 so the caller can't determine if all the machines are up.
Normally it makes sense for pssh et al to exit(0) even if some commands fail,
but not always.
The more I think about it now, the more I think all the tools need an extra
option; something like "--exit-one-on-failure" that if passed will cause pssh
et al to exit(1) if any of the requests fail.
That would solve my immediate problem by allowing
"pssh -h hosts -t 10 --exit-one-on-failure exit 0" || doFailureCode()
Original comment by mdennis%...@gtempaccount.com
on 21 Dec 2010 at 7:26
Hmm. Shouldn't pssh always exit with an error if there's a single failure? I
had thought that this was already happening. The current behavior sounds like
a bug to me; can you think of any particular reason that it should exit(0) even
if some commands fail?
Original comment by amcna...@gmail.com
on 21 Dec 2010 at 8:34
That certainly isn't happening right now.
My argument for it returning 0 is to be able to distinguish between pssh having
a problem and the remote servers and/or ssh having a problem. This could also
be accomplished by using different return codes for each: for example, 1 for a
pssh failure (couldn't allocate memory, bad args, etc.) and 2 for a remote/ssh
failure (timeout, key rejected, connection refused, remote command exited with
a non-zero status, etc.). This is similar to how grep et al. work: if grep
matches anything, it exits 0; if it doesn't match anything, it exits 1.
I'm not against making it exit(somethingNotZero) by default if an ssh command
failed, but I figured that was pretty explicit functionality to have in there,
so I assumed it was done on purpose.
Original comment by mdennis%...@gtempaccount.com
on 22 Dec 2010 at 12:47
I like the idea of having different error codes to discern between different
problems. Do you have any suggestions about what the error codes should mean?
One possibility would be to return the number of hosts that failed, perhaps
with a "-1" if it's some fatal early error (such as an invalid hosts file).
Any thoughts?
Original comment by amcna...@gmail.com
on 9 Jan 2011 at 6:57
Negative return values can be a problem on most systems, as can numbers
above 255.
As examples try:
python -c 'import sys; sys.exit(-1)'; echo $?
and
python -c 'import sys; sys.exit(256)'; echo $?
http://www.gnu.org/software/libc/manual/html_node/Exit-Status.html may be
helpful to you here.
Since reporting a failure count above 255 isn't possible, I don't think
that's a workable solution: it would limit the use of pssh to fewer than
256 nodes, which would be a real problem.
I would just do something simple like:
0: OK
1: pssh failure (couldn't execute a subprocess for one or more hosts for some
reason)
2: ssh and/or remote failure of one or more hosts (subprocess was executed but
returned non-zero)
Personally, I don't think anything more is all that useful, as in most cases
there is nothing an automated caller could do to fix it and an interactive
caller can read the output.
Original comment by mdennis%...@gtempaccount.com
on 9 Jan 2011 at 7:46
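The point about negative values and values above 255 can be checked from Python as well as from the shell. The following is a minimal sketch, assuming a POSIX system (where only the low 8 bits of an exit status reach the parent); the helper name observed_status is made up for this illustration and is not part of pssh.

```python
import subprocess
import sys


def observed_status(requested: int) -> int:
    """Run a child Python that calls sys.exit(requested) and
    return the exit status the parent process actually sees."""
    child = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({requested})"]
    )
    return child.returncode


if __name__ == "__main__":
    # Assumption: POSIX semantics, where the status is truncated to 8 bits.
    for requested in (-1, 2, 255, 256):
        print(requested, "->", observed_status(requested))
```

Under that assumption, a requested status of -1 is seen as 255 and 256 is seen as 0, so a raw failure count cannot be reported reliably once it reaches 256.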
Doesn't a return code of -1 turn into 255? We could return the number of
failed hosts up to 250 or something, with -1 being a pssh failure.
Or more in line with your proposal, it might make sense to have a different
return code if all ssh commands fail than if only some of the ssh commands fail.
I suppose either of these would be better than what we're doing right now, but
at the moment I don't have a strong preference.
Original comment by amcna...@gmail.com
on 10 Jan 2011 at 2:24
Hmm. In addition to whether one or more processes failed, there is also the
issue of whether a process returned a non-0 exit status. I need to think about
this a bit more, but I think there are several different values of exit status
that we might want to provide. Here's what I'm thinking right now:
0: all commands successful and returned 0
1: at least one remote command returned a non-0 value (but all commands ran)
2: at least one ssh command returned 255 (connection error, bad password, etc.)
3: at least one ssh process timed out or was killed by a signal
4: internal pssh error
Analogous exit statuses would be used for prsync, pscp, etc. (although some
might not exit with a value of 1). Any thoughts? Is there anything else
missing from this list? I'll send an email to the mailing list to solicit
additional input.
Original comment by amcna...@gmail.com
on 18 Jan 2011 at 11:59
The errors you mention are not necessarily mutually exclusive. Use a bitfield;
that is, assign powers of two to them and add them up.
Original comment by mark.d.k...@gmail.com
on 19 Jan 2011 at 3:13
Indeed they aren't mutually exclusive--my thought was to return the max (most
severe). The bitfield idea is clever, but I'm not sure if I've come across it
in this context. Is there any precedent for using bitfields for exit status
codes? I know that bash provides an arithmetic operator for bitwise AND, but
overall it seems like there isn't much shell-level support for this. What do
you think?
Original comment by amcna...@gmail.com
on 19 Jan 2011 at 5:39
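The "return the max (most severe)" idea, applied to the five-value list proposed earlier in the thread, might look roughly like the sketch below. This is only an illustration and not the code that went into pssh: the constant names, classify, and overall_status are hypothetical, and the per-host return code is assumed to follow Python's subprocess convention (a negative value means the process was killed by a signal).

```python
from typing import Iterable

# Severity levels, numbered as in the proposed list (names are hypothetical).
OK = 0              # all commands successful and returned 0
REMOTE_NONZERO = 1  # a remote command returned a non-0 value
SSH_ERROR = 2       # ssh itself returned 255
KILLED = 3          # ssh process timed out or was killed by a signal
INTERNAL = 4        # internal error; raised by the tool itself, not per host


def classify(returncode: int, timed_out: bool = False) -> int:
    """Map one host's outcome to a severity level."""
    if timed_out or returncode < 0:
        return KILLED
    if returncode == 255:
        return SSH_ERROR
    if returncode != 0:
        return REMOTE_NONZERO
    return OK


def overall_status(levels: Iterable[int]) -> int:
    """Report the most severe level seen across all hosts."""
    return max(levels, default=OK)


if __name__ == "__main__":
    per_host = [classify(0), classify(1), classify(255)]
    print(overall_status(per_host))
```

With this approach the caller learns only the worst thing that happened, which is the trade-off against the bitfield suggestion: a less severe failure on another host is hidden by a more severe one.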
I've looked into this, and so far I haven't been able to find any other
programs that use bit fields for exit status. Combined with the fact that the
"test" command doesn't have any bitwise operators, I'm edging towards the
scheme from comment #8, with the plan to make the semantics clear in the man
page.
Original comment by amcna...@gmail.com
on 19 Jan 2011 at 8:27
Meaningful exit status codes were added to pssh in commit 4ef1fea. The pssh
man page includes documentation on the subject. I still need to fix the other
commands. Please let me know if you see any problems or if you have any
last-minute feedback.
Original comment by amcna...@gmail.com
on 21 Jan 2011 at 10:30
Okay, this is done for the others as well (although we still need to add man
pages for these). I'm going to mark this as closed, but please reopen it if
you see any concrete or subjective problems with the implementation. Thanks.
Original comment by amcna...@gmail.com
on 21 Jan 2011 at 10:37
Works in bash:
> bash -c 'bash -c "exit 5"; xit=$?; if (( $xit & 1 )); then echo "1 bit set";
fi; if (( $xit & 2 )); then echo "2 bit set"; fi; if (( $xit & 4 )); then echo
"4 bit set"; fi;'
1 bit set
4 bit set
Works in tcsh:
> /bin/tcsh -c '
> /bin/tcsh -c "exit 5"
> set xit=$?
> if ( ( $xit & 1 ) != 0 ) then
> echo "1 bit set"
> endif
> if ( ( $xit & 2 ) != 0 ) then
> echo "2 bit set"
> endif
> if ( ( $xit & 4 ) != 0 ) then
> echo "4 bit set"
> endif
> '
1 bit set
4 bit set
Generating the errors themselves does not require bitwise operators, just
addition.
Original comment by mark.d.k...@gmail.com
on 24 Jan 2011 at 5:18
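For comparison with the shell examples above, here is a minimal Python sketch of the bitfield idea. The flag values and names are assumptions made for illustration; the thread never assigns specific bits to specific errors. The sketch combines flags with bitwise OR rather than addition, since plain addition only gives the right answer if each flag is added at most once.

```python
from typing import Iterable, List

# Hypothetical flag assignments: one power of two per kind of failure.
REMOTE_NONZERO = 1
SSH_ERROR = 2
KILLED = 4
INTERNAL = 8

FLAG_NAMES = {
    REMOTE_NONZERO: "remote_nonzero",
    SSH_ERROR: "ssh_error",
    KILLED: "killed",
    INTERNAL: "internal",
}


def combine(flags: Iterable[int]) -> int:
    """Fold per-host flags into one exit status.

    OR is idempotent, so the same flag seen on many hosts is counted once.
    """
    status = 0
    for flag in flags:
        status |= flag
    return status


def decode(status: int) -> List[str]:
    """List the names of the flags set in an exit status."""
    return [name for bit, name in FLAG_NAMES.items() if status & bit]


if __name__ == "__main__":
    status = combine([REMOTE_NONZERO, KILLED, REMOTE_NONZERO])
    print(status, decode(status))
```

With four flags the largest possible status is 15, comfortably below the 255 limit discussed earlier in the thread.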
Original issue reported on code.google.com by
mdennis%...@gtempaccount.com
on 17 Dec 2010 at 6:43