`qemu: uncaught target signal 11 (Segmentation fault) - core dumped`

hrueger commented 1 year ago

Hi @guysoft, first of all, thanks for this project. It has been working great so far. However, I'm now stuck with the following error. One of my modules is an update module: it runs `sudo apt-get update` followed by `sudo apt-get upgrade -y`.

On my local machine, this works just fine. I'm running Windows 10 with Docker using WSL2.

On GitHub Actions, though, that fails with the following error:

Preparing to unpack .../libc-bin_2.31-13+rpt2+rpi1+deb11u5_arm64.deb ...
Unpacking libc-bin (2.31-13+rpt2+rpi1+deb11u5) over (2.31-13+rpt2+rpi1+deb11u4) ...
Setting up libc-bin (2.31-13+rpt2+rpi1+deb11u5) ...
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)
dpkg: error processing package libc-bin (--configure):
 installed libc-bin package post-installation script subprocess returned error exit status 139
Errors were encountered while processing:
 libc-bin
E: Sub-process /usr/bin/dpkg returned an error code (1)
++++ echo_red 'build failed, unmounting image...'
++++ echo -e -n '\e[91m'
++++ echo build failed, unmounting image...

This is reproducible and always fails at the exact same package.
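For reference, the `exit status 139` in the dpkg output is the shell's way of reporting a process killed by a signal: 128 plus the signal number. Here 139 - 128 = 11, which is SIGSEGV and matches the qemu message. A minimal sketch of decoding it (the variable names are only for illustration):

```shell
# Decode an exit status above 128 into the signal that killed the process.
status=139                       # value reported by dpkg in the log above
signal_number=$((status - 128))  # shells encode "killed by signal N" as 128 + N
signal_name=$(kill -l "$signal_number")
echo "exit status $status = signal $signal_number ($signal_name)"
```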

Other info that might be important:

The GH Actions runner uses ubuntu-latest
My base image is the latest Pi OS Lite 64-bit

Do you have any idea what I'm doing wrong?

guysoft commented 1 year ago

Looks like it's related to: https://github.com/moby/qemu/issues/19

Long talk about it here: https://github.com/docker/buildx/issues/314

The first issue suggests updating the version of qemu; it looks like a qemu bug.

hrueger commented 1 year ago

Thanks for the hint, somehow I didn't find this when googling yesterday. I'll try updating qemu and close this issue if it works 👍

hrueger commented 1 year ago

Uh - that might be a stupid question, but how do I try that with a newer version of qemu? I hadn't thought about that when I wrote the other comment earlier, but if I understand correctly, the qemu running inside your guysoft/custompios:devel docker image would have to be updated?

guysoft commented 1 year ago

OK, it looks like we were using old buster, and the right qemu is only available in Debian bullseye backports. I pushed that change to devel, so the new custompios container should build with qemu 7.1+. It would be great if you could pull that and validate that it fixes this issue :)
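One way to confirm which qemu a container actually ships is to compare its reported version against 7.1. This is a rough sketch only: the binary name `qemu-aarch64-static` and the helper names `version_at_least` and `extract_qemu_version` are assumptions for illustration, not something taken from the CustomPiOS scripts, and the comparison relies on GNU `sort -V`.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU `sort -V` (version sort).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract "X.Y.Z" from a line such as "qemu-aarch64 version 7.1.0 (...)".
extract_qemu_version() {
    sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Assumed binary name; adjust to whatever the container actually provides.
qemu_bin=qemu-aarch64-static

if command -v "$qemu_bin" >/dev/null 2>&1; then
    found=$("$qemu_bin" --version | extract_qemu_version)
    if version_at_least "$found" 7.1; then
        echo "qemu $found is new enough"
    else
        echo "qemu $found is older than 7.1" >&2
    fi
else
    echo "$qemu_bin not found in PATH" >&2
fi
```

Running this inside the build container, rather than on the host, is what matters here, since the emulator used for the chroot is the one in the image.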

hrueger commented 1 year ago

Thanks for updating. However, now I get a different error:

+++ echo 'Adding 1000 MB to partition 2 of 2022-09-22-raspios-bullseye-arm64-lite.img'
Adding 1000 MB to partition 2 of 2022-09-22-raspios-bullseye-arm64-lite.img
++++ awk '{print $4-0}'
++++ grep 2022-09-22-raspios-bullseye-arm64-lite.img2
++++ sfdisk -d 2022-09-22-raspios-bullseye-arm64-lite.img
/CustomPiOS/common.sh: line 282: sfdisk: command not found
+++ start=
/CustomPiOS/common.sh: line 283: *512: syntax error: operand expected (error token is "*512")
+ exit 1
Error: Process completed with exit code 1.

I don't have more info available right now as I'm just on my phone. I can add additional details later today.
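The `*512` syntax error in this log is a follow-on failure: `sfdisk` was missing, so `start` ended up empty and the arithmetic had no left operand. A defensive pattern is to check for the tool and validate the value before doing any arithmetic. The sketch below is illustrative only; the function names are hypothetical and not taken from `common.sh`.

```shell
# Fail early with a clear message if a required tool is missing.
require_tool() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "required tool not found: $1" >&2
        return 1
    }
}

# Succeed only if the argument is a non-empty string of digits.
is_unsigned_integer() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Convert a start sector to a byte offset, refusing empty or malformed input.
sector_to_offset() {
    is_unsigned_integer "$1" || {
        echo "invalid start sector: '$1'" >&2
        return 1
    }
    echo $(( $1 * 512 ))
}

require_tool sfdisk || echo "sfdisk is missing; install it before building"
sector_to_offset 8192
sector_to_offset "" || echo "rejected empty value"
```

With a check like this, the build would stop at the missing tool instead of at a confusing arithmetic error one line later.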

tampe125 commented 1 year ago

I'm sorry to jump on this issue, but I'm having the same error:

label-id: 0x63ee4f38
device: 2022-09-22-raspios-bullseye-arm64.img
unit: sectors
sector-size: 512

2022-09-22-raspios-bullseye-arm64.img1 : start=        8192, size=      524288, type=c
2022-09-22-raspios-bullseye-arm64.img2 : start=      532480, size=     8208384, type=83'
CustomPiOS/src/common.sh: line 191: 0
8192
532480 * 512: syntax error in expression (error token is "8192
532480 * 512")
+ exit 1

maybe there's a trim missing somewhere? Should I open a new issue for this?
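The multi-line value `0`, `8192`, `532480` in this error suggests the filter matched three lines of the dump (plausibly the `device:` line plus both partition lines) instead of exactly one. That is a reading of the log, not something confirmed in the thread. A minimal sketch of selecting a single partition by comparing the whole first field, with `partition_start` as a hypothetical helper name:

```shell
# Sample `sfdisk -d` output, copied from the log above.
dump='label-id: 0x63ee4f38
device: 2022-09-22-raspios-bullseye-arm64.img
unit: sectors
sector-size: 512

2022-09-22-raspios-bullseye-arm64.img1 : start=        8192, size=      524288, type=c
2022-09-22-raspios-bullseye-arm64.img2 : start=      532480, size=     8208384, type=83'

# Print the start sector of exactly one partition node.
# Comparing the whole first field avoids matching the "device:" line
# or a sibling partition whose name shares the same prefix.
partition_start() {
    printf '%s\n' "$1" | awk -v node="$2" '
        $1 == node {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^start=/) {
                    value = $i
                    if (value == "start=") value = $(i + 1)  # number is in the next field
                    gsub(/[^0-9]/, "", value)                # drop "start=" and the comma
                    print value
                    exit
                }
            }
        }'
}

start=$(partition_start "$dump" "2022-09-22-raspios-bullseye-arm64.img2")
echo "start=$start offset=$((start * 512))"
```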

hrueger commented 1 year ago

@guysoft with your latest commit, the sfdisk not found error is now gone. However, the second error (which @tampe125 is also experiencing) still exists.

guysoft commented 1 year ago

Hey, I fixed that too and tested it; it should work now.

guysoft commented 1 year ago

@tampe125 Yes, please open a second issue and provide a full log

guysoft commented 1 year ago

Hang on @tampe125, something indeed broke, so no need to open an issue. Working on it.

guysoft commented 1 year ago

OK, so the issues were:

qemu was too old, so the Docker image had to be updated to bullseye with backports to get qemu 7.1+.
The default output of the sfdisk packaged in bullseye changed, which made the awk parsing fail, so I switched to sfdisk's --json output option, which should stay consistent.
I also added jq as a requirement, to parse the --json output.

It's still building here, but it's looking good.

My local tests failed to spot this earlier because they didn't pull my own git repository to the build server correctly; I fixed that too.
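To illustrate the `--json` approach described above, here is a rough sketch of reading a partition's fields with jq. The JSON is a hand-written sample in the shape of `sfdisk --json` output, with shortened `disk.img` node names and the partition values from the dump earlier in this thread. The helper name `partition_field` is hypothetical and this is not the actual code in `common.sh`.

```shell
# Hand-written sample in the shape of `sfdisk --json <image>` output.
sample_json='{
  "partitiontable": {
    "label": "dos",
    "unit": "sectors",
    "partitions": [
      {"node": "disk.img1", "start": 8192,   "size": 524288,  "type": "c"},
      {"node": "disk.img2", "start": 532480, "size": 8208384, "type": "83"}
    ]
  }
}'

# Print one field (for example start or size) of one partition node.
partition_field() {
    printf '%s\n' "$1" | jq -r --arg node "$2" --arg field "$3" \
        '.partitiontable.partitions[] | select(.node == $node) | .[$field]'
}

if command -v jq >/dev/null 2>&1; then
    start=$(partition_field "$sample_json" disk.img2 start)
    size=$(partition_field "$sample_json" disk.img2 size)
    echo "start=$start size=$size offset=$((start * 512))"
else
    echo "jq is required to parse sfdisk --json output" >&2
fi
```

Selecting by the exact `node` value means each lookup returns a single number, which avoids the multi-line arithmetic error seen with the text-based parsing.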

ada-phillips commented 1 year ago

@guysoft Not to pile on here, but I had the second error above as well, and now that's fixed but I'm getting an exciting third error. I assume it's related, given the timing and all, but should I make a new issue?

+++ e2fsck -fy /dev/loop3
e2fsck 1.46.5 (30-Dec-2021)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/loop3

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

hrueger commented 1 year ago

I now get the same error @ada-phillips gets.

++++ jq '.partitiontable.partitions[] | select(.node == "2022-09-22-raspios-bullseye-arm64-lite.img1").start'
+++ start=8192
++++ jq '.partitiontable.partitions[] | select(.node == "2022-09-22-raspios-bullseye-arm64-lite.img1").size'
+++ e2fsize_blocks=524288
+++ offset=4194304
+++ detach_all_loopback 2022-09-22-raspios-bullseye-arm64-lite.img
++++ grep 2022-09-22-raspios-bullseye-arm64-lite.img
++++ losetup
++++ awk '{ print $1 }'
+++ test_for_image 2022-09-22-raspios-bullseye-arm64-lite.img
+++ '[' '!' -f 2022-09-22-raspios-bullseye-arm64-lite.img ']'
++++ losetup -f --show -o 4194304 2022-09-22-raspios-bullseye-arm64-lite.img
+++ LODEV=/dev/loop3
+++ trap 'losetup -d $LODEV' EXIT
+++ e2fsck -fy /dev/loop3
e2fsck 1.46.2 (28-Feb-2021)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/loop3

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

/dev/loop3 contains a vfat file system labelled 'boot'
++ losetup -d /dev/loop3
+ exit 8
Error: Process completed with exit code 8.
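A note on reading this trace: the jq filter selects the `...img1` node, and the resulting offset 4194304 equals 8192 * 512, the start of partition 1. The last line of the log confirms that the loop device holds the vfat `boot` filesystem, which is why e2fsck finds no ext superblock. This is consistent with a mix-up in the partition number, though the thread does not spell out the exact cause. The sketch below only redoes the arithmetic; it assumes partition 2 starts directly after partition 1, which matches the values in the sfdisk dump earlier in the thread.

```shell
sector_size=512
boot_start=8192       # start of partition 1, from the trace
boot_size=524288      # size of partition 1, from the trace
root_start=$((boot_start + boot_size))  # assumption: partition 2 follows directly

boot_offset=$((boot_start * sector_size))
root_offset=$((root_start * sector_size))
traced_offset=4194304 # value passed to `losetup -o` in the trace

if [ "$traced_offset" -eq "$boot_offset" ]; then
    echo "loop device points at partition 1 (boot), not the ext4 root"
elif [ "$traced_offset" -eq "$root_offset" ]; then
    echo "loop device points at partition 2 (root)"
fi
```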

guysoft commented 1 year ago

OK, the last fix, for a typo, works on my end. It builds and produces an image.

hrueger commented 1 year ago

I can confirm that everything is fixed now, thanks a lot for your work 👍

guysoft commented 1 year ago

Great. If you are making a distribution, feel free to add it to the list in the README file.

guysoft / CustomPiOS

`qemu: uncaught target signal 11 (Segmentation fault) - core dumped` #188