grml / grml-debootstrap

wrapper around debootstrap

VM loop device not cleaned up in CI #252

Open zeha opened 9 months ago

zeha commented 9 months ago
 * Removing loopback mount of file /code/qemu-1.img.
previous state:
loop3p1 (253:0)
/dev/loop1: [2065]:72205 (/var/lib/snapd/snaps/core20_2015.snap)
/dev/loop2: [2065]:72206 (/var/lib/snapd/snaps/snapd_20290.snap)
/dev/loop0: [2065]:72204 (/var/lib/snapd/snaps/lxd_24322.snap)
/dev/loop3: [2065]:282883 (/code/qemu-1.img)
after kpartx-d
loop3p1 (253:0)
/dev/loop1: [2065]:72205 (/var/lib/snapd/snaps/core20_2015.snap)
/dev/loop2: [2065]:72206 (/var/lib/snapd/snaps/snapd_20290.snap)
/dev/loop0: [2065]:72204 (/var/lib/snapd/snaps/lxd_24322.snap)
/dev/loop3: [2065]:282883 (/code/qemu-1.img)
loop_part is: loop3p1
loop3p1 (253:0)
/dev/loop1: [2065]:72205 (/var/lib/snapd/snaps/core20_2015.snap)
/dev/loop2: [2065]:72206 (/var/lib/snapd/snaps/snapd_20290.snap)
/dev/loop0: [2065]:72204 (/var/lib/snapd/snaps/lxd_24322.snap)
/dev/loop3: [2065]:282883 (/code/qemu-1.img)
 * Finished execution of grml-debootstrap. Enjoy your Debian system.

At least in GitHub Actions the cleanup of the loop device doesn't seem to work properly.

adrelanos commented 9 months ago

Also, modprobe loop is failing, as I mentioned in https://github.com/grml/grml-debootstrap/pull/248#issuecomment-1817382866. Is that the same issue or a separate one?

zeha commented 9 months ago

Separate issue, I'd think. The loop device generally works there.

adrelanos commented 9 months ago

Got any (CI) log where this can be seen?

Maybe an upstream GitHub Actions bug?

Do you think you could come up with minimal code for reproduction? Then this could be reported to GitHub Actions.

zeha commented 9 months ago

Here:

https://github.com/grml/grml-debootstrap/actions/runs/6922515270/job/18829335284?pr=250#step:5:35

adrelanos commented 9 months ago

I don't fully understand that code. However, to report this bug to GitHub Actions we'd need a tiny script, as minimal and simple as possible. Surely not using docker if avoidable, and certainly not mentioning grml-debootstrap.

qemu-img, parted, kpartx, losetup, mount... Which are the minimal steps required to reproduce this on GitHub CI?
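
A minimal sketch of what such a reproduction script might look like, assuming the problem can be triggered with only truncate, parted, kpartx and losetup. The function names, the 64M image size and the DRY_RUN switch are made up for illustration and are not taken from grml-debootstrap or its tests. By default it only prints the commands; with DRY_RUN=0 it runs them and needs root.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; not taken from grml-debootstrap.
# DRY_RUN=1 (the default) only prints each command.
# DRY_RUN=0 executes them and requires root plus parted and kpartx.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_steps() {
    img=$1
    run truncate -s 64M "$img"      # create an empty raw image
    run parted -s "$img" mklabel msdos mkpart primary ext4 1MiB 100%
    run kpartx -asv "$img"          # attach loop device, map partitions
    run losetup -a                  # state before cleanup
    run kpartx -dv "$img"           # should remove mappings and loop device
    run losetup -j "$img"           # prints nothing if the image is detached
}
```

Running `DRY_RUN=0 repro_steps /tmp/repro.img` as root on a runner and checking whether the last command still prints a loop device would show whether the cleanup failure happens without docker and without grml-debootstrap.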

Maybe there's already an open bug report: https://github.com/actions/runner/issues

adrelanos commented 9 months ago

Maybe not a GitHub Actions bug.

People here had a similar issue:

Someone indicated using losetup with -P --partscan might help.

-P, --partscan

Force the kernel to scan the partition table on a newly created loop device. Note that the partition table parsing depends on sector sizes. The default sector size is 512 bytes; otherwise you need to use the option --sector-size together with --partscan.

A more important takeaway might be that one cannot (easily) mount the "same" image twice. Does your code attempt to mount both images at the same time?

It's not the same file but the images created by your scripts might look confusingly similar to the Linux coreutils.

Here is how others fixed a similar issue by using mount with sizelimit but I think this might not be applicable here. https://github.com/ryankurte/docker-rpi-emu/commit/a66a9667bdf0745379e2fbe221ecbed309669441

Would it be an option for you to modify your PR to mount only 1 image at a time to work around this bug?

In the forum topic above, a user suggested:

You don't need to create a loop device; using the "loop" parameter in the mount command suffices.

mount -o loop,offset=$((98304*512)),sizelimit=1753219072 /srv/raspi/current/2019-04-08-raspbian-stretch-lite.img /mnt

Not sure whether grml-debootstrap could do something similar, i.e. avoid kpartx / losetup. Using offset might be more complicated and error-prone.
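
To illustrate where the numbers in that mount command come from: the offset is the partition's start sector times the sector size, and sizelimit is the partition length in bytes. A small sketch, assuming 512-byte sectors. The helper name is made up, and the sector count 3424256 is only derived here by dividing the quoted sizelimit by 512.

```shell
#!/bin/sh
# Hypothetical helper: build the mount -o string for one partition.
# Arguments: start sector, length in sectors, sector size (default 512).
loop_mount_opts() {
    start=$1
    sectors=$2
    ssize=${3:-512}
    echo "loop,offset=$((start * ssize)),sizelimit=$((sectors * ssize))"
}

# Reproduces the options from the quoted command:
loop_mount_opts 98304 3424256
```

This prints loop,offset=50331648,sizelimit=1753219072. Both values have to be read from the partition table first, which is the error-prone part.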

zeha commented 9 months ago

No, the problem here is like this:

  1. grml-debootstrap puts the img file onto a loop device, so it can modify the partitions in the image. And it really wants the loop device with partitions, so it can modify the EFI partition and the root filesystem, and delegate placement of everything to fdisk etc.
  2. When grml-debootstrap is done, the image should not be attached to a loop device. This fails for unknown reasons.
  3. Later the CI scripts try to mount the image again, and this "obviously" fails because step 2 failed.

If grml-debootstrap weren't a shell script I'd try replacing losetup/(k)partx/... with syscalls, but alas...
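
Step 2 could at least be checked explicitly from shell. A rough sketch, assuming losetup -a output in the format shown in the log at the top of this issue and image paths without spaces. Both function names are made up and are not part of grml-debootstrap.

```shell
#!/bin/sh
# Hypothetical check: which loop devices still point at a given file?
# Reads "losetup -a" style lines on stdin, e.g.
#   /dev/loop3: [2065]:282883 (/code/qemu-1.img)
loops_for_file() {
    awk -v f="$1" '$NF == "(" f ")" { sub(/:$/, "", $1); print $1 }'
}

# Fail loudly if the image is still attached after cleanup.
assert_detached() {
    left=$(losetup -a | loops_for_file "$1")
    if [ -n "$left" ]; then
        echo "still attached: $left" >&2
        return 1
    fi
}
```

With the log from the first comment, loops_for_file /code/qemu-1.img prints /dev/loop3 both before and after kpartx -d. On a live system, losetup -j FILE gives the same information directly.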

adrelanos commented 9 months ago

Syscalls might help with debugging and finding out what the issue is, but generally I think it's better to stick with the Linux coreutils.

There was a mysterious kpartx bug in the past that might still not be fully / cleanly fixed: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734794

If there's anything similar, it would be good to get that reported upstream.

Are you sure about the offset? I don't know where the number 4194304 is coming from.

Maybe replace the mount using offset with the usual way of doing this?

Could you add additional debug output please?

zeha commented 9 months ago

> There was a mysterious kpartx bug in the past that might still not be fully / cleanly fixed.

Yeah, I was generally thinking we could switch from kpartx to partx, as that's in util-linux. But I haven't investigated this option.

> Are you sure about the offset? I don't know where the number 4194304 is coming from.

The offset is correct for the specific configuration tested; but this is exactly why I don't want to deal with offsets. (k)partx does this calculation, and I don't want to write code for parsing partition tables... (Comment above the number explains where it comes from.)
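
For illustration only, a sketch of how a test script could look the offset up instead of hardcoding it, assuming partx from util-linux is available and reports START in 512-byte sectors. The function name is made up. Whether 4194304 was actually derived this way is an assumption; it happens to equal start sector 8192 times 512.

```shell
#!/bin/sh
# Hypothetical helper: byte offset of partition NR inside an image file.
# Arguments: image path, partition number (default 1), sector size (default 512).
partition_offset() {
    img=$1
    nr=${2:-1}
    ssize=${3:-512}
    # partx prints the start sector; strip the column padding.
    start=$(partx --show --noheadings --output START --nr "$nr" "$img" | tr -d ' ')
    echo $((start * ssize))
}
```

Something like `mount -o loop,offset="$(partition_offset qemu-1.img)" qemu-1.img /mnt` would then follow the partition table, though it still leaves the calculation in the test script rather than in (k)partx.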

zeha commented 9 months ago

https://github.com/grml/grml-debootstrap/actions/runs/7172550946/job/19529980137?pr=250#step:4:3166

This is from a run with more -v. You can see how kpartx -d apparently did nothing.

adrelanos commented 9 months ago

./tests/docker-test-b2b.sh: line 19: dmsetup: command not found

zeha commented 9 months ago

> ./tests/docker-test-b2b.sh: line 19: dmsetup: command not found

Sure, but this is a long time after the problem occurred.