kubevirt / kubevirtci

Contains cluster definitions and client tools to quickly spin up and destroy ephemeral and scalable k8s and ocp clusters for testing
Apache License 2.0

Build multi-arch fedora-realtime and fedora-with-test-tooling image #1004

Closed zhlhahaha closed 11 months ago

zhlhahaha commented 1 year ago

To get well-rounded e2e test coverage on the Arm64 platform, we need to build multi-arch fedora-realtime and fedora-with-test-tooling images, which are used in many e2e tests.

I have submitted a patch series to make this work. There is some discussion of this in https://github.com/kubevirt/project-infra/pull/2630. To make the build process work, here are the steps and the corresponding PR links:

  1. Build a multi-arch vm-image-builder image, which allows us to do both cross builds and native builds of VM images: https://github.com/kubevirt/project-infra/pull/2630 and https://github.com/kubevirt/project-infra/pull/2631
  2. Enable building the fedora-realtime and fedora-with-test-tooling VM images for different CPU architectures: https://github.com/kubevirt/kubevirtci/pull/732
  3. Add prow jobs that build multi-arch VM images in vm-image-builder: https://github.com/kubevirt/project-infra/pull/2760

zhlhahaha commented 1 year ago

cc: @andreabolognani @rmohr @dhiller @xpivarc @brianmcarey

zhlhahaha commented 1 year ago

There is a problem building the fedora-with-test-tooling image. The build currently fails with the following error message:

virt-sysprep: error: libguestfs error: inspect_os: mount exited with status 32: mount: /tmp/btrfsJWefGz: unknown filesystem type 'btrfs'.

The root cause appears to be that the host kernel lacks btrfs support, while virt-sysprep needs to mount the guest's btrfs filesystems.

During the preparation of the VM image using virt-sysprep, the tool examines the guest and utilizes libguestfs to mount all volumes of the guest VM. Here is the filesystem within the guest VM:

[root@localhost fedora]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1  366K  0 rom  
zram0  251:0    0  1.9G  0 disk [SWAP]
vda    252:0    0    5G  0 disk 
├─vda1 252:1    0    1M  0 part 
├─vda2 252:2    0 1000M  0 part /boot
├─vda3 252:3    0  100M  0 part /boot/efi
├─vda4 252:4    0    4M  0 part 
└─vda5 252:5    0  3.9G  0 part /home

[root@localhost fedora]# cat /etc/fstab
UUID=a280b604-6023-4ba5-bb9e-80d612f84b0d /                       btrfs   subvol=root,compress=zstd:1 0 0
UUID=c2457f56-74ee-4fb3-9748-b79bb5f6c1bc /boot                   ext4    defaults        1 2
UUID=6C81-19BE          /boot/efi               vfat    defaults,uid=0,gid=0,umask=077,shortname=winnt 0 2
UUID=a280b604-6023-4ba5-bb9e-80d612f84b0d /home                   btrfs   subvol=home,compress=zstd:1 0 0

As you can observe, the filesystem of some volumes is btrfs. However, it appears that btrfs is not included in the list of /proc/filesystems within the builder container, which is supposed to be the same as the host's list. For more information, please refer to the build log https://prow.ci.kubevirt.io/view/gs/kubevirt-prow/pr-logs/pull/kubevirt_kubevirtci/1060/check-provision-fedora-with-test-tooling/1683766811591446528
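The check described above can be sketched as a small script. This is only a rough illustration, not something taken from the kubevirtci scripts: the function names are made up, and it assumes you have a copy of the guest's fstab at hand. It lists the filesystem types an fstab-style file refers to and reports the ones that are missing from a /proc/filesystems-style list.

```shell
#!/usr/bin/env bash
# Sketch: report which filesystem types named in an fstab-style file are
# absent from a /proc/filesystems-style list. All function names here are
# hypothetical helpers, not part of kubevirtci or libguestfs.

# Print the unique filesystem types (third field) found in fstab file $1,
# skipping comment lines and lines with fewer than three fields.
fstab_fs_types() {
    awk '!/^[[:space:]]*#/ && NF >= 3 { print $3 }' "$1" | sort -u
}

# Succeed if filesystem type $1 is listed in file $2
# (default: /proc/filesystems). The type name is the last field on each
# line; some lines carry a leading "nodev" marker.
fs_supported() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}"
}

# Print every type used in fstab file $1 that is not listed in file $2.
# Note: pseudo types such as "swap" would also be reported as missing.
missing_fs_types() {
    local fs
    for fs in $(fstab_fs_types "$1"); do
        fs_supported "$fs" "${2:-/proc/filesystems}" || echo "$fs"
    done
}

# Example (run where the build happens, with a copy of the guest fstab):
#   missing_fs_types ./guest-fstab /proc/filesystems
```

With the fstab shown above and a kernel that has no btrfs support, the expectation is that only btrfs is reported, since ext4 and vfat are normally available.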

We may need to update the hosts or add a btrfs kernel module to them to make the script work. Do you have any suggestions, @andreabolognani?

andreabolognani commented 1 year ago

@zhlhahaha I'm not familiar with the actual hardware behind prow, but it sure looks like the job might be running on a host that doesn't have btrfs support.

If that's the case, I don't think there's much we can do except trying to get it reprovisioned with some OS that includes btrfs support.

I think we're only running into this now because the cloud images that we've been using so far are for Fedora 32, when btrfs was not yet the default.

zhlhahaha commented 1 year ago

> I think we're only running into this now because the cloud images that we've been using so far are for Fedora 32, when btrfs was not yet the default.

Yes, you are right. I do not hit this issue when building fedora-realtime, which is based on Fedora 32. The issue happens with both Fedora 35 and Fedora 38.

Hi @brianmcarey, is it possible to add the btrfs module to the host OS behind prow? Or do you have any other suggestions?

zhlhahaha commented 1 year ago

Hi @brianmcarey, here is the script I run on my local system to build and publish the images. Hopefully this helps.

$ docker run --privileged --rm -d --name builder -e CONSOLE=true -e DEBUG=false quay.io/kubevirtci/vm-image-builder:v20230607-9021afd sleep infinity
$ docker exec -it builder /bin/bash
$ git clone https://github.com/kubevirt/kubevirtci.git
$ cd kubevirtci/cluster-provision/images/vm-image-builder/

# The current image URL for x86_64 is no longer available, so the URL needs updating.
# Here I use the patch from https://github.com/kubevirt/kubevirtci/pull/1060;
# alternatively, you can update the base image to Fedora 38.
$ wget https://github.com/kubevirt/kubevirtci/pull/1060/commits/f1477b5cf00fca6139007d6f84c8a671ac727ee7.patch

# git am needs git user.name and user.email to be configured first
$ git am f1477b5cf00fca6139007d6f84c8a671ac727ee7.patch

# log in to the container image registry
$ podman login docker.io/zhlhahaha

# build and push the multi-arch image
$ start_libvirtd.sh
$ ./publish-multiarch-containerdisk.sh example zhlhahaha docker.io

brianmcarey commented 1 year ago

> I think we're only running into this now because the cloud images that we've been using so far are for Fedora 32, when btrfs was not yet the default.
>
> Yes, you are right. I do not hit this issue when building fedora-realtime, which is based on Fedora 32. The issue happens with both Fedora 35 and Fedora 38.
>
> Hi @brianmcarey, is it possible to add the btrfs module to the host OS behind prow? Or do you have any other suggestions?

@zhlhahaha sorry for only getting back to you now; I was looking into this during the day. The workloads cluster is OpenShift-based, and it doesn't have btrfs support.

To get these published in the immediate term, I can publish them locally from here.

We may have to look at moving these images to CentOS Stream going forward.

brianmcarey commented 1 year ago

@zhlhahaha The following images have been built and published:

quay.io/kubevirtci/fedora-with-test-tooling:v20230726-3a66690
quay.io/kubevirtci/fedora-realtime:v20230726-3a66690

zhlhahaha commented 1 year ago

> @zhlhahaha The following images have been built and published:
>
> quay.io/kubevirtci/fedora-with-test-tooling:v20230726-3a66690
> quay.io/kubevirtci/fedora-realtime:v20230726-3a66690

Thanks Brian!

andreabolognani commented 1 year ago

@brianmcarey does OpenShift not support btrfs at all, or is that a limitation that could be addressed by e.g. upgrading to a newer release?

Either way, let's make sure we don't lose track of this. Uploading locally-built images obviously works fine in a pinch, but I think we want to move away from that as much as possible and build everything in a controlled environment as part of a formally defined CI job.

kubevirt-bot commented 11 months ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

andreabolognani commented 11 months ago

@zhlhahaha does anything still need to happen here, or can we close the issue?

zhlhahaha commented 11 months ago

I think we can close this issue. Thanks, @andreabolognani!

/close

kubevirt-bot commented 11 months ago

@zhlhahaha: Closing this issue.

In response to [this](https://github.com/kubevirt/kubevirtci/issues/1004#issuecomment-1786840727):

> I think we can close this issue, thank @andreabolognani
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

brianmcarey commented 11 months ago

> @brianmcarey does OpenShift not support btrfs at all, or is that a limitation that could be addressed by e.g. upgrading to a newer release?

@andreabolognani Sorry, I meant to get back to you on this but let it slip. I looked into it at the time, and I don't believe RHCOS supports btrfs, so I think that is where the limitation is.

> Either way, let's make sure we don't lose track of this. Uploading locally-built images obviously works fine in a pinch, but I think we want to move away from that as much as possible and build everything in a controlled environment as part of a formally defined CI job.

I will add a task to our backlog to look at where else we could run these builds, or whether there is some way of working around this issue.

andreabolognani commented 11 months ago

@brianmcarey don't worry about it :)

The CentOS Stream 9 cloud images are using xfs instead of btrfs, so maybe switching over could be an alternative way of handling things? I'm not sure whether all software that we want to be in the test images is available in CentOS Stream / EPEL though.

Another approach could be to ditch the cloud images and create our own from scratch using virt-install and a kickstart file. That way we'd have full control over the contents, including the filesystem used. That'd require a non-trivial amount of work though.
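To give a rough idea of what the kickstart approach might look like, here is a minimal sketch. Everything in it is an assumption for illustration: the helper name write_kickstart, the file names, the package set and the sizes are placeholders, and the result has not been tried against the kubevirtci build scripts. It only writes a kickstart file that asks the installer for a plain (non-LVM, non-btrfs) layout with the chosen filesystem; the virt-install call is left as a comment.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal kickstart file that forces a specific root
# filesystem type. The helper name and all values are illustrative only.

# write_kickstart OUTPUT_FILE [FSTYPE]
# FSTYPE defaults to xfs.
write_kickstart() {
    local out="$1" fstype="${2:-xfs}"
    cat > "$out" <<EOF
# Minimal kickstart: text-mode install, plain partitions formatted as $fstype
text
lang en_US.UTF-8
keyboard us
timezone UTC
rootpw --lock
clearpart --all --initlabel
autopart --type=plain --fstype=$fstype
reboot

%packages
@core
%end
EOF
}

# Possible usage (not executed here; the install tree URL is a placeholder
# that has to be filled in, and names and sizes are made up):
#   write_kickstart ks.cfg xfs
#   virt-install --name test-image --memory 2048 --vcpus 2 \
#     --disk path=test-image.qcow2,size=5,format=qcow2 \
#     --location <install-tree-URL> \
#     --initrd-inject ks.cfg \
#     --extra-args "inst.ks=file:/ks.cfg console=ttyS0" \
#     --graphics none
```

The point of the sketch is just that the filesystem becomes an explicit choice in the kickstart file rather than something inherited from the upstream cloud image; a real file would also need the test tooling packages and whatever post-install steps the images require.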

If we could make the problem go away by just changing the host OS, that would of course be a lot more convenient ;)