openzfs / zfs-buildbot

The OpenZFS Buildbot Configuration
http://build.zfsonlinux.org
BSD 2-Clause "Simplified" License

Try to work around Debian kernel weirdness #238

Closed rincebrain closed 2 years ago

rincebrain commented 2 years ago

Looking at this, the build fails later because zlib1g-dev was never installed: the one huge apt install failed as a whole because of this:

+ sudo -E apt-get --yes install linux-headers-5.10.0-0.bpo.8-cloud-arm64 zlib1g-dev uuid-dev libblkid-dev libselinux-dev xfslibs-dev libattr1-dev libacl1-dev libudev-dev libdevmapper-dev libssl-dev libaio-dev libffi-dev libelf-dev libmount-dev libpam0g-dev pamtester python-dev python-setuptools python-cffi python-packaging python3 python3-dev python3-setuptools python3-cffi libcurl4-openssl-dev python3-packaging python-distlib python3-distlib
Reading package lists...
Building dependency tree...
Reading state information...
Package linux-headers-5.10.0-0.bpo.8-cloud-arm64 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'linux-headers-5.10.0-0.bpo.8-cloud-arm64' has no installation candidate

Which is curious, because the buildslave AMI itself is running 4.19.0-11-arm64, and the x86_64 testslave is getting a non-bpo kernel...

Unfortunately, I can't really see into the AMI instance that's spawned, so I can't tell what AMI it was running or why that has a BPO kernel and the x86_64 does not, just that in the ones that succeeded before it really is using those headers and not, say, the kernel version the buildbot has. Since the buildslave's AMIs are the only ones I see mentioned in the repo, I can't see where it would be getting one that is flawed like that...
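As a rough illustration of the failure mode above, here is a minimal sketch of checking whether the headers package has an installation candidate before asking apt for it, and of installing it separately so a missing headers package cannot block zlib1g-dev and the other build dependencies. The function names are hypothetical and are not taken from bb-bootstrap.sh; the only assumption about apt is that `apt-cache policy` prints a `Candidate:` line, with `(none)` when nothing is installable.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not taken from bb-bootstrap.sh.

# Read `apt-cache policy <pkg>` output on stdin and print the candidate
# version, or print nothing when apt reports "Candidate: (none)".
policy_candidate() {
    awk '$1 == "Candidate:" { if ($2 != "(none)") print $2; exit }'
}

# Succeed only if headers for the given kernel release can be installed.
headers_installable() {
    [ -n "$(apt-cache policy "linux-headers-$1" 2>/dev/null | policy_candidate)" ]
}

# Install headers on their own so a missing headers package cannot take
# the unrelated build dependencies down with it.
install_headers() {
    if headers_installable "$1"; then
        sudo -E apt-get --yes install "linux-headers-$1"
    else
        echo "no installation candidate for linux-headers-$1" >&2
        return 1
    fi
}
```

A caller might run `install_headers "$(uname -r)"` first and then install the remaining packages in a second `apt-get` call, so the two failures are reported separately.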

So here's a workaround. Note that this should not be merged as-is - if this is going to happen, the relevant package files should be rehosted somewhere and that location pointed to instead, because otherwise the snapshot "mirror" is probably going to start banning hosts for hitting it too much.

behlendorf commented 2 years ago

When the buildbot spins up an AMI it calls the bb-bootstrap.sh script which installs and configures buildbot, does some minimal additional configuration, and then installs the latest kernel. For the Debian aarch64 AMIs it also switches to using the linux-image-cloud-arm64 repository for the newer kernel then reboots on to it.

This sure looks like a problem with the linux-image-cloud-arm64 repository on Buster not providing a matching linux-image and linux-headers package. It's quite easy to reproduce using the latest AMI for Buster.

One option which would probably work is to move to Debian Bullseye now that it's been released. There doesn't appear to be any issue with the repositories there. However, we may run into other minor issues. For example, it does look like some of the names of packages we need to install have changed.

Presumably the Debian linux-image-cloud-arm64 repository on Buster will get sorted out at some point, which would resolve this.

rincebrain commented 2 years ago

That's curious. (Why does it need the newer kernel?)

Sigh. I've filed a bug against Debian.

In the interim, we could just fetch the old package, as I suggested, or hop to Debian 11, which would also work.

behlendorf commented 2 years ago

My recollection is the newer kernel was primarily to get better performance in ec2. We could probably do without.

Let me try and roll things forward to Debian 11. A test build went fine, the CI is idle, and we should move forward anyway.

behlendorf commented 2 years ago

Moving forward didn't go as smoothly as I'd hoped. In the end I opted to make two CI changes to resolve these failures. I switched us back to the default kernel for aarch64, and I updated the bootstrap script so we always reboot on to the newest kernel. In practice we were already doing this for almost all of the builders anyway.

5b320ac Always reboot Linux builders on to the latest kernel
2f2927f Revert "bb-bootstrap, install a newer kernel on buster arm64"
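For readers who want to see what "reboot on to the newest kernel" can look like in practice, the following is a minimal sketch and not the content of commit 5b320ac, which may simply reboot unconditionally. It assumes that `/lib/modules` holds one directory per installed kernel and that GNU `sort -V` is available; all function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not from bb-bootstrap.sh.

# Read kernel release strings on stdin, one per line, and print the
# newest according to GNU version sort.
newest_kernel() {
    sort -V | tail -n 1
}

# Succeed when the running kernel ($1) differs from the newest installed ($2).
needs_reboot() {
    [ "$1" != "$2" ]
}

# Reboot if the newest installed kernel is not the one currently running.
# Assumes /lib/modules contains one directory per installed kernel.
reboot_if_stale() {
    newest=$(ls /lib/modules | newest_kernel)
    if needs_reboot "$(uname -r)" "$newest"; then
        sudo reboot
    fi
}
```

Because the headers package name is derived from the running kernel release, rebooting before the dependency install keeps `linux-headers-$(uname -r)` aligned with the kernel the builder actually uses.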

behlendorf commented 2 years ago

Closing. Switching to the default kernel worked as expected and resolved this issue.