Closed adrelanos closed 9 months ago
I am pretty sure we need set -e
.
I think REPORT_TRAP_ERR='yes' and FAIL_TRAP_ERR='yes' should be enabled by default anyhow. Even without any options needed. That's a good style. More reliable, secure. What do you think?
Otherwise conditionally enabling set -e
when FAIL_TRAP_ERR='yes' is possible too but a less clean.
Could you advice please so I prepare a pull request?
for i in format_efi_partition prepare_vm mkfs tunefs \
mount_target mountpoint_to_blockdevice debootstrap_system \
preparechroot execute_pre_scripts chrootscript execute_post_scripts \
remove_configs umount_chroot grub_install umount_target fscktool ; do
if stage "${i}" ; then
if "$i" ; then
stage "${i}" 'done' && rm -f "${STAGES}/${i}"
else
bailout 2 "$i"
fi
fi
done
Pretty sure the if "$i" ; then
prevents errors being caught.
That helps.
+ mkfs.fat -F32 -n EFI /dev/mapper/loop0p1
/usr/sbin/grml-debootstrap: line 1358: mkfs.fat: command not found
++ error_handler
++ last_exit_code=127
++ last_bash_command='mkfs.fat -F32 -n "EFI" "$EFI_TARGET"'
++ '[' yes = yes ']'
++ echo 'Unexpected non-zero exit code 127 in /usr/sbin/grml-debootstrap /usr/sbin/grml-debootstrap /usr/sbin/grml-debootstrap at line 1358 2181 0 detected!
last bash command: mkfs.fat -F32 -n "EFI" "$EFI_TARGET"'
Unexpected non-zero exit code 127 in /usr/sbin/grml-debootstrap /usr/sbin/grml-debootstrap /usr/sbin/grml-debootstrap at line 1358 2181 0 detected!
last bash command: mkfs.fat -F32 -n "EFI" "$EFI_TARGET"
++ '[' '!' yes = yes ']'
++ command -v bailout
++ bailout 1
++ cleanup
Error was caught.
(Created https://github.com/grml/grml-debootstrap/issues/225 for the underlying error.)
I am pretty sure we need
set -e
.I think REPORT_TRAP_ERR='yes' and FAIL_TRAP_ERR='yes' should be enabled by default anyhow. Even without any options needed. That's a good style. More reliable, secure. What do you think?
I'm aware that there are two kind of folks: one that always prefer set -e
, and one that hates set -e
and avoid it.
Nowadays I personally tend to write scripts with set -e
and properly handle failure situations, explicitly.
But adding / enabling set -e
by default afterwards tends to be harder, and I'm not sure about its result in current situation of grml-debootstrap, though if you look into this and we get grml-debootstrap working properly with it, I'd also like that.
Otherwise conditionally enabling
set -e
when FAIL_TRAP_ERR='yes' is possible too but a less clean.Could you advice please so I prepare a pull request?
AFAICS this is what you implemented in https://github.com/grml/grml-debootstrap/pull/226
I'm aware that there are two kind of folks: one that always prefer
set -e
, and one that hatesset -e
and avoid it.
Interesting.
Nowadays I personally tend to write scripts with
set -e
and properly handle failure situations, explicitly.
Lucky we're on the same page here.
I also remorse that I haven't written all my scripts with pipefail and nounset from the beginning. nounset could be considered for this repository too at a later time, one step at a time, if progress is made here, which I am considering to contribute.
But adding / enabling
set -e
by default afterwards tends to be harder, and I'm not sure about its result in current situation of grml-debootstrap, though if you look into this and we get grml-debootstrap working properly with it, I'd also like that.
I think strict error handling and failing early is more important and doubt there will be many failures. And the only way to fix these issues that I can see is to let it fail and fix it.
The uses of eend $?
style error checking in grml-debootstrap are inconsistent, not everywhere. The inconstant error checking has then lead to follow-up issues which were caught but took me longer to debug.
For the Kicksecure, Whonix build script we've been using grml-debootstrap for raw VM image builds with FAIL_TRAP_ERR='yes'
for years. And will continue to do so with the even stricter error handling that is upcoming in https://github.com/grml/grml-debootstrap/pull/226. That's how I conclude that there cannot be that many issues. If nothing went wrong, Linux utilities don't terminate with non-zero exit codes.
However, my blind spot here is that I am not using grml-debootstrap for non-VM builds and not in interactive mode. Related topic:
So what I could suggest is that I attempt to rework the error handling, make sure that VMs build. Would it be feasible for you to test the "other" builds and fix these new issues? Any failing command would probably be easily fixed with appending || true
as a quick solution.
[...]
Nowadays I personally tend to write scripts with
set -e
and properly handle failure situations, explicitly.Lucky we're on the same page here.
I also remorse that I haven't written all my scripts with pipefail and nounset from the beginning. nounset could be considered for this repository too at a later time, one step at a time, if progress is made here, which I am considering to contribute.
ACK, I try to use set -e
, set -E
, set -u
+ set -o pipefail
in bash scripts.
But adding / enabling
set -e
by default afterwards tends to be harder, and I'm not sure about its result in current situation of grml-debootstrap, though if you look into this and we get grml-debootstrap working properly with it, I'd also like that.I think strict error handling and failing early is more important and doubt there will be many failures. And the only way to fix these issues that I can see is to let it fail and fix it.
Fine for me, as long it's not only myself having to do the heavy lifting :)
The uses of
eend $?
style error checking in grml-debootstrap are inconsistent, not everywhere. The inconstant error checking has then lead to follow-up issues which were caught but took me longer to debug.
ACK
For the Kicksecure, Whonix build script we've been using grml-debootstrap for raw VM image builds with
FAIL_TRAP_ERR='yes'
for years. And will continue to do so with the even stricter error handling that is upcoming in #226. That's how I conclude that there cannot be that many issues. If nothing went wrong, Linux utilities don't terminate with non-zero exit codes.
Ah, that's very interesting feedback, which certainly removes much of my reluctant feelings towards it and makes me feel positive towards enabling set -e
by default! :)
However, my blind spot here is that I am not using grml-debootstrap for non-VM builds and not in interactive mode.
I personally don't use the interactive mode neither that often, though care very much about non-VM builds. A test bed for those would be really useful to have.
Related topic:
* [automated CI testing #228](https://github.com/grml/grml-debootstrap/issues/228)
nod
So what I could suggest is that I attempt to rework the error handling, make sure that VMs build. Would it be feasible for you to test the "other" builds and fix these new issues? Any failing command would probably be easily fixed with appending
|| true
as a quick solution.
Yes, that sounds a like a good plan to me! Count me in :)
I am a bit unsure about the cleanup
function.
If is in two cases, either:
bailout
function.In case of A), one would probably want "strict error handling". I suppose, if some command unexpectedly exits non-zero, one would want function to abort and the error_handler
to be triggered. The error_handler will then call bailout
, which then itself calls cleanup
.
In case of B), I suppose one probably wants "relaxed error handling". That is, if some command exits non-zero, one would want to see that command failing, however the cleanup
function should not be interrupted and continue to completion.
Need some feedback on this one. Happy to do something simpler. It will might become more clear once I push the pull request.
As for execution of the stages... The following would be a bug...
"$i" || bailout 2 "$i"
For deminstration:
#!/bin/bash
set -x
set -e
trap errorhandler ERR
errorhandler() {
true ERROR
}
xxx() {
true hi
false
true This message we do not want to see...
}
true START
xxx || exit 1
true STOP
output:
+ set -e
+ trap errorhandler ERR
+ true START
+ xxx
+ true hi
+ false
+ true This message we do not want to see...
+ true STOP
If we want the error handler to be triggered and abort as quickly as possible, we need to avoid function || bailout
because that delays the part after the ||
.
Why this is happening is I need to investigate. Could be a localized issue. However, I can see REPORT_TRAP_ERR and FAIL_TRAP_ERR being
yes
. Yet, the error handler does not fire. The error is only caught later but not thorugh theerror_handler
.