grml / grml-debootstrap

wrapper around debootstrap
59 stars 27 forks source link

error not caught #224

Closed adrelanos closed 9 months ago

adrelanos commented 11 months ago
+ mount -t vfat /dev/mapper/loop0p1 /mnt/derivative-maker-grml-debootstrap.42265/boot/efi
mount: /mnt/derivative-maker-grml-debootstrap.42265/boot/efi: wrong fs type, bad option, bad superblock on /dev/mapper/loop0p1, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.
+ chroot /mnt/derivative-maker-grml-debootstrap.42265 dpkg --list shim-signed
+ chroot /mnt/derivative-maker-grml-debootstrap.42265 grub-install --target=x86_64-efi --efi-directory=/boot/efi --uefi-secure-boot --removable /dev/loop0
Installing for x86_64-efi platform.
grub-install: error: /boot/efi doesn't look like an EFI partition.
+ chroot /mnt/derivative-maker-grml-debootstrap.42265 grub-install --target=i386-pc /dev/loop0
Installing for i386-pc platform.
Installation finished. No error reported.

Why this is happening is I need to investigate. Could be a localized issue. However, I can see REPORT_TRAP_ERR and FAIL_TRAP_ERR being yes. Yet, the error handler does not fire. The error is only caught later but not thorugh the error_handler.

+ umount /mnt/derivative-maker-grml-debootstrap.42265/boot/efi
umount: /mnt/derivative-maker-grml-debootstrap.42265/boot/efi: not mounted.
+ bailout 2 grub_install
+ cleanup
adrelanos commented 11 months ago

I am pretty sure we need set -e.

I think REPORT_TRAP_ERR='yes' and FAIL_TRAP_ERR='yes' should be enabled by default anyhow. Even without any options needed. That's a good style. More reliable, secure. What do you think?

Otherwise conditionally enabling set -e when FAIL_TRAP_ERR='yes' is possible too but a less clean.

Could you advice please so I prepare a pull request?

adrelanos commented 11 months ago
for i in format_efi_partition prepare_vm mkfs tunefs \
         mount_target mountpoint_to_blockdevice debootstrap_system \
         preparechroot execute_pre_scripts chrootscript execute_post_scripts \
         remove_configs umount_chroot grub_install umount_target fscktool ; do
    if stage "${i}" ; then
      if "$i" ; then
        stage "${i}" 'done' && rm -f "${STAGES}/${i}"
      else
        bailout 2 "$i"
      fi
    fi
done

Pretty sure the if "$i" ; then prevents errors being caught.

adrelanos commented 11 months ago

That helps.

+ mkfs.fat -F32 -n EFI /dev/mapper/loop0p1
/usr/sbin/grml-debootstrap: line 1358: mkfs.fat: command not found
++ error_handler
++ last_exit_code=127
++ last_bash_command='mkfs.fat -F32 -n "EFI" "$EFI_TARGET"'
++ '[' yes = yes ']'
++ echo 'Unexpected non-zero exit code 127 in /usr/sbin/grml-debootstrap /usr/sbin/grml-debootstrap /usr/sbin/grml-debootstrap at line 1358 2181 0 detected!
last bash command: mkfs.fat -F32 -n "EFI" "$EFI_TARGET"'
Unexpected non-zero exit code 127 in /usr/sbin/grml-debootstrap /usr/sbin/grml-debootstrap /usr/sbin/grml-debootstrap at line 1358 2181 0 detected!
last bash command: mkfs.fat -F32 -n "EFI" "$EFI_TARGET"
++ '[' '!' yes = yes ']'
++ command -v bailout
++ bailout 1
++ cleanup

Error was caught.

(Created https://github.com/grml/grml-debootstrap/issues/225 for the underlying error.)

mika commented 11 months ago

I am pretty sure we need set -e.

I think REPORT_TRAP_ERR='yes' and FAIL_TRAP_ERR='yes' should be enabled by default anyhow. Even without any options needed. That's a good style. More reliable, secure. What do you think?

I'm aware that there are two kind of folks: one that always prefer set -e, and one that hates set -e and avoid it.

Nowadays I personally tend to write scripts with set -e and properly handle failure situations, explicitly.

But adding / enabling set -e by default afterwards tends to be harder, and I'm not sure about its result in current situation of grml-debootstrap, though if you look into this and we get grml-debootstrap working properly with it, I'd also like that.

Otherwise conditionally enabling set -e when FAIL_TRAP_ERR='yes' is possible too but a less clean.

Could you advice please so I prepare a pull request?

AFAICS this is what you implemented in https://github.com/grml/grml-debootstrap/pull/226

adrelanos commented 11 months ago

I'm aware that there are two kind of folks: one that always prefer set -e, and one that hates set -e and avoid it.

Interesting.

Nowadays I personally tend to write scripts with set -e and properly handle failure situations, explicitly.

Lucky we're on the same page here.

I also remorse that I haven't written all my scripts with pipefail and nounset from the beginning. nounset could be considered for this repository too at a later time, one step at a time, if progress is made here, which I am considering to contribute.

But adding / enabling set -e by default afterwards tends to be harder, and I'm not sure about its result in current situation of grml-debootstrap, though if you look into this and we get grml-debootstrap working properly with it, I'd also like that.

I think strict error handling and failing early is more important and doubt there will be many failures. And the only way to fix these issues that I can see is to let it fail and fix it.

The uses of eend $? style error checking in grml-debootstrap are inconsistent, not everywhere. The inconstant error checking has then lead to follow-up issues which were caught but took me longer to debug.

For the Kicksecure, Whonix build script we've been using grml-debootstrap for raw VM image builds with FAIL_TRAP_ERR='yes' for years. And will continue to do so with the even stricter error handling that is upcoming in https://github.com/grml/grml-debootstrap/pull/226. That's how I conclude that there cannot be that many issues. If nothing went wrong, Linux utilities don't terminate with non-zero exit codes.

However, my blind spot here is that I am not using grml-debootstrap for non-VM builds and not in interactive mode. Related topic:

So what I could suggest is that I attempt to rework the error handling, make sure that VMs build. Would it be feasible for you to test the "other" builds and fix these new issues? Any failing command would probably be easily fixed with appending || true as a quick solution.

mika commented 10 months ago

[...]

Nowadays I personally tend to write scripts with set -e and properly handle failure situations, explicitly.

Lucky we're on the same page here.

I also remorse that I haven't written all my scripts with pipefail and nounset from the beginning. nounset could be considered for this repository too at a later time, one step at a time, if progress is made here, which I am considering to contribute.

ACK, I try to use set -e, set -E, set -u + set -o pipefail in bash scripts.

But adding / enabling set -e by default afterwards tends to be harder, and I'm not sure about its result in current situation of grml-debootstrap, though if you look into this and we get grml-debootstrap working properly with it, I'd also like that.

I think strict error handling and failing early is more important and doubt there will be many failures. And the only way to fix these issues that I can see is to let it fail and fix it.

Fine for me, as long it's not only myself having to do the heavy lifting :)

The uses of eend $? style error checking in grml-debootstrap are inconsistent, not everywhere. The inconstant error checking has then lead to follow-up issues which were caught but took me longer to debug.

ACK

For the Kicksecure, Whonix build script we've been using grml-debootstrap for raw VM image builds with FAIL_TRAP_ERR='yes' for years. And will continue to do so with the even stricter error handling that is upcoming in #226. That's how I conclude that there cannot be that many issues. If nothing went wrong, Linux utilities don't terminate with non-zero exit codes.

Ah, that's very interesting feedback, which certainly removes much of my reluctant feelings towards it and makes me feel positive towards enabling set -e by default! :)

However, my blind spot here is that I am not using grml-debootstrap for non-VM builds and not in interactive mode.

I personally don't use the interactive mode neither that often, though care very much about non-VM builds. A test bed for those would be really useful to have.

Related topic:

* [automated CI testing #228](https://github.com/grml/grml-debootstrap/issues/228)

nod

So what I could suggest is that I attempt to rework the error handling, make sure that VMs build. Would it be feasible for you to test the "other" builds and fix these new issues? Any failing command would probably be easily fixed with appending || true as a quick solution.

Yes, that sounds a like a good plan to me! Count me in :)

adrelanos commented 10 months ago

I am a bit unsure about the cleanup function.

If is in two cases, either:

In case of A), one would probably want "strict error handling". I suppose, if some command unexpectedly exits non-zero, one would want function to abort and the error_handler to be triggered. The error_handler will then call bailout, which then itself calls cleanup.

In case of B), I suppose one probably wants "relaxed error handling". That is, if some command exits non-zero, one would want to see that command failing, however the cleanup function should not be interrupted and continue to completion.

Need some feedback on this one. Happy to do something simpler. It will might become more clear once I push the pull request.


As for execution of the stages... The following would be a bug...

  "$i" || bailout 2 "$i"

For deminstration:

#!/bin/bash

set -x
set -e

trap errorhandler ERR

errorhandler() {
  true ERROR
}

xxx() {
  true hi
  false
  true This message we do not want to see...
}

true START
xxx || exit 1
true STOP

output:

+ set -e
+ trap errorhandler ERR
+ true START
+ xxx
+ true hi
+ false
+ true This message we do not want to see...
+ true STOP

If we want the error handler to be triggered and abort as quickly as possible, we need to avoid function || bailout because that delays the part after the ||.