kubernetes-sigs / image-builder

Tools for building Kubernetes disk images
https://image-builder.sigs.k8s.io/
Apache License 2.0

build-qemu-ubuntu-2204 stuck in "Waiting for SSH to become available..." #1076

Status: Open. s-mansouri opened this issue 1 year ago.

s-mansouri commented 1 year ago

Hi, I installed image-builder following this doc. Then, to build an image for OpenStack, I followed this doc. But with the command make build-qemu-ubuntu-2204, the build gets stuck at the SSH step. My operating system is Ubuntu 22.04. This is the log of the command:

hack/ensure-ansible.sh
fatal: not a git repository (or any of the parent directories): .git
Starting galaxy collection install process
Nothing to do. All requested collections are already installed. If you want to reinstall them, consider using `--force`.
hack/ensure-packer.sh
hack/ensure-goss.sh
Right version of binary present
packer build -var-file="/root/image-builder/images/capi/packer/config/kubernetes.json"  -var-file="/root/image-builder/images/capi/packer/config/cni.json"  -var-file="/root/image-builder/images/capi/packer/config/containerd.json"  -var-file="/root/image-builder/images/capi/packer/config/wasm-shims.json"  -var-file="/root/image-builder/images/capi/packer/config/ansible-args.json"  -var-file="/root/image-builder/images/capi/packer/config/goss-args.json"  -var-file="/root/image-builder/images/capi/packer/config/common.json"  -var-file="/root/image-builder/images/capi/packer/config/additional_components.json"  -color=true -var-file="/root/image-builder/images/capi/packer/qemu/qemu-ubuntu-2204.json"  packer/qemu/packer.json
fatal: not a git repository (or any of the parent directories): .git
qemu: output will be in this color.

==> qemu: Retrieving ISO
==> qemu: Trying https://releases.ubuntu.com/22.04/ubuntu-22.04.1-live-server-amd64.iso
==> qemu: Trying https://releases.ubuntu.com/22.04/ubuntu-22.04.1-live-server-amd64.iso?checksum=sha256%3A10f19c5b2b8d6db711582e0e27f5116296c34fe4b313ba45f9b201a5007056cb
    qemu: ubuntu-22.04.1-live-server-amd64.iso 1.37 GiB / 1.37 GiB [==================================================================================================================] 100.00% 1m15s
==> qemu: https://releases.ubuntu.com/22.04/ubuntu-22.04.1-live-server-amd64.iso?checksum=sha256%3A10f19c5b2b8d6db711582e0e27f5116296c34fe4b313ba45f9b201a5007056cb => /root/.cache/packer/281aa9855752339063385b35198e73db74cd61ba.iso
==> qemu: Starting HTTP server on port 8247
==> qemu: Found port for communicator (SSH, WinRM, etc): 2769.
==> qemu: Looking for available port between 5900 and 6000 on 127.0.0.1
==> qemu: Starting VM, booting from CD-ROM
    qemu: The VM will be run headless, without a GUI. If you want to
    qemu: view the screen of the VM, connect via VNC without a password to
    qemu: vnc://127.0.0.1:5952
==> qemu: Waiting 10s for boot...
==> qemu: Connecting to VM via VNC (127.0.0.1:5952)
==> qemu: Typing the boot commands over VNC...
    qemu: Not using a NetBridge -- skipping StepWaitGuestAddress
==> qemu: Using SSH communicator to connect: 127.0.0.1
==> qemu: Waiting for SSH to become available...

/kind bug

wwentland commented 1 year ago

Could you try running the build with FOREGROUND=1 to observe where the process gets stuck?

s-mansouri commented 1 year ago

@wwentland Thanks for your response. With the command make build-qemu-ubuntu-2204 FOREGROUND=1 PACKER_LOG=1, I got this error:

==> qemu: Starting VM, booting from CD-ROM
2023/02/20 08:20:59 packer-builder-qemu plugin: Qemu Builder has no floppy files, not attaching a floppy.
2023/02/20 08:20:59 packer-builder-qemu plugin: Executing /usr/bin/qemu-system-x86_64: []string{"-device", "virtio-scsi-pci,id=scsi0", "-device", "scsi-hd,bus=scsi0.0,drive=drive0", "-device", "virtio-net,netdev=user.0", "-name", "ubuntu-2204-kube-v1.23.15", "-drive", "if=none,file=output/ubuntu-2204-kube-v1.23.15/ubuntu-2204-kube-v1.23.15,id=drive0,cache=writeback,discard=unmap,format=qcow2", "-drive", "file=/root/.cache/packer/281aa9855752339063385b35198e73db74cd61ba.iso,media=cdrom", "-netdev", "user,id=user.0,hostfwd=tcp::3163-:22", "-m", "2048M", "-smp", "1", "-boot", "once=d", "-machine", "type=pc,accel=kvm", "-display", "gtk", "-vnc", "127.0.0.1:91"}
2023/02/20 08:20:59 packer-builder-qemu plugin: Started Qemu. Pid: 16836
2023/02/20 08:20:59 packer-builder-qemu plugin: Qemu stderr: qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
2023/02/20 08:20:59 packer-builder-qemu plugin: Qemu stderr: gtk initialization failed
2023/02/20 08:20:59 packer-builder-qemu plugin: failed to unlock port lockfile: close tcp 127.0.0.1:5991: use of closed network connection
2023/02/20 08:20:59 packer-builder-qemu plugin: failed to unlock port lockfile: close tcp 127.0.0.1:3163: use of closed network connection
==> qemu: Error launching VM: Qemu failed to start. Please run with PACKER_LOG=1 to get more info.
==> qemu: Deleting output directory...
2023/02/20 08:20:59 [INFO] (telemetry) ending qemu
==> Wait completed after 11 seconds 963 milliseconds
2023/02/20 08:20:59 machine readable: error-count []string{"1"}
==> Some builds didn't complete successfully and had errors:
2023/02/20 08:20:59 machine readable: qemu,error []string{"Build was halted."}
==> Builds finished but no artifacts were created.
Build 'qemu' errored after 11 seconds 963 milliseconds: Build was halted.

==> Wait completed after 11 seconds 963 milliseconds

==> Some builds didn't complete successfully and had errors:
--> qemu: Build was halted.

wwentland commented 1 year ago

Thank you! Right, this is expected on a headless box: QEMU was started with -display gtk, and GTK cannot initialize without a display. You could try connecting via VNC instead, or reproduce the issue locally. For VNC to be reachable, I think you would have to change the address VNC binds to (127.0.0.1 in the log) to 0.0.0.0 or another appropriate IP (cf. https://developer.hashicorp.com/packer/plugins/builders/qemu#vnc_bind_address).
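The FOREGROUND=1 run above failed at "gtk initialization failed", i.e. QEMU was asked for a GTK window on a machine with no display. A small pre-flight check can avoid asking for a window that cannot open. This is only a sketch, assuming that DISPLAY or WAYLAND_DISPLAY being set is a reasonable proxy for "a GUI can open"; the function names are hypothetical and not part of image-builder.

```python
import os
from typing import Mapping, Optional


def has_graphical_display(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if an X11 or Wayland display appears to be available.

    On a headless CI runner neither variable is normally set, which matches
    the "gtk initialization failed" line in the FOREGROUND=1 log.
    """
    if env is None:
        env = os.environ
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def suggest_make_vars(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: only suggest FOREGROUND=1 when a window can open."""
    if has_graphical_display(env):
        return "FOREGROUND=1 PACKER_LOG=1"
    # Headless: keep the default headless run and watch the VM over VNC instead.
    return "PACKER_LOG=1"
```

On a headless box this yields plain PACKER_LOG=1, leaving VNC (with a suitable vnc_bind_address) as the way to watch the installer.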

Does it hang every time, or do some builds work while others hang? How often have you tried?

tibeer commented 1 year ago

I can also confirm this. We build our images on a headless machine within a CI/CD pipeline, and it happens every time for us. I will try to investigate following your suggestions.

tibeer commented 1 year ago

It seems to get stuck at this step: [screenshot]

tibeer commented 1 year ago

The problem is resolved, for us at least; CI/CD works again. As for the reason, I honestly cannot tell you. It seems it was just a hiccup.

wwentland commented 1 year ago

That's great to hear @tibeer. I only ever ran into a stuck build once, but it failed much earlier in the process (error while entering the boot command).

I'm not seeing anything obvious in the output you pasted, and it might just have been taking a long time to install the base system. This could very well be due to problems in the build environment (e.g. network issues) that only show up intermittently, rather than being caused directly by a misconfiguration of the build process.
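One way to tell a slow install apart from a guest that never brings up SSH is to probe the forwarded port yourself while Packer waits. In the first log, Packer picked port 2769 on 127.0.0.1 for the SSH communicator. The helpers below are hypothetical and use only the Python standard library. They look for the SSH identification string rather than relying on a bare TCP connect, on the assumption that QEMU user-mode networking (hostfwd) accepts connections on the forwarded host port even before sshd in the guest is running.

```python
import socket
import time
from typing import Optional


def ssh_banner_seen(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with an SSH identification string."""
    try:
        # create_connection applies `timeout` to the connect and later recv calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(255)
    except OSError:  # refused, reset, or timed out
        return False
    # RFC 4253: the identification string starts with "SSH-".
    # An empty read means the connection was accepted and then closed.
    return banner.startswith(b"SSH-")


def wait_for_ssh(host: str, port: int, deadline_s: float,
                 interval_s: float = 10.0) -> Optional[float]:
    """Poll until an SSH banner appears; return seconds waited, or None on timeout."""
    start = time.monotonic()
    while True:
        if ssh_banner_seen(host, port):
            return time.monotonic() - start
        if time.monotonic() - start >= deadline_s:
            return None
        time.sleep(interval_s)
```

For example, wait_for_ssh("127.0.0.1", 2769, deadline_s=3600) returning a value after a long time points to a slow install, while None after an hour suggests the guest never reached the point of starting sshd.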

BarthV commented 1 year ago

Same issue here :( I'm running a simple make build-qemu-ubuntu-2204 and it gets stuck waiting for the SSH connection.

[screenshot]

@tibeer, if you can remember what solved your problem, that would be <3

BarthV commented 1 year ago

OK... so I just read the documentation ;-S

https://developer.hashicorp.com/packer/plugins/builders/qemu

This is an example only, and will time out waiting for SSH because we have not provided a kickstart file. You must add a valid kickstart file to the "http_directory" and then provide the file in the "boot_command" in order for this build to run. We recommend you check out the Community Templates for a practical usage example.

It seems that some kind of template is missing, and it must be a common mistake.
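The Packer text quoted above describes a generic example. For Ubuntu 22.04, the live-server installer uses autoinstall rather than kickstart; its configuration is usually delivered through cloud-init's NoCloud datasource, which fetches user-data and meta-data over HTTP from a URL passed on the kernel command line by the boot_command. Packer serves http_directory for this purpose (the "Starting HTTP server" step in the first log). A rough pre-flight check of such a directory might look like this; the helper is hypothetical, and it only checks the file layout, not the autoinstall schema.

```python
from pathlib import Path
from typing import List


def check_autoinstall_seed(http_dir: str) -> List[str]:
    """Return problems found in a NoCloud autoinstall seed directory.

    An empty list means the basic shape looks right.
    """
    root = Path(http_dir)
    problems: List[str] = []
    # NoCloud fetches both files; meta-data may be empty but must exist.
    for name in ("user-data", "meta-data"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    user_data = root / "user-data"
    if user_data.is_file():
        lines = user_data.read_text(encoding="utf-8").splitlines()
        # cloud-init only treats the file as cloud-config with this header.
        if not lines or lines[0].strip() != "#cloud-config":
            problems.append("user-data must start with '#cloud-config'")
        # The installer reads its settings from a top-level 'autoinstall:' key.
        if not any(line.startswith("autoinstall:") for line in lines):
            problems.append("user-data has no top-level 'autoinstall:' key")
    return problems
```

If the seed files are missing or malformed, the installer typically falls back to interactive mode, which would also leave Packer waiting for SSH.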

xinity commented 1 year ago

Having the same issue :( Sadly, I'm falling back to 20.04 for my demo for now.

Hoping someone finds what's missing :(

mikejoh commented 1 year ago

@xinity I'm also hitting this at the moment, locally on my computer. The build eventually completed:

Build 'qemu' finished after 25 minutes 31 seconds.

==> Wait completed after 25 minutes 31 seconds

==> Builds finished. The artifacts of successful builds are:
--> qemu: VM files in directory: ./output/ubuntu-2204-kube-v1.24.11
--> qemu: VM files in directory: ./output/ubuntu-2204-kube-v1.24.11
--> qemu: VM files in directory: ./output/ubuntu-2204-kube-v1.24.11

Yikes! But I don't know what build times to expect either. Did you ever wait to see whether it completed?

I made the following changes to cut off ~10 min:

diff --git a/images/capi/packer/qemu/qemu-ubuntu-2204.json b/images/capi/packer/qemu/qemu-ubuntu-2204.json
index 65efe6be0..1f620abc6 100644
--- a/images/capi/packer/qemu/qemu-ubuntu-2204.json
+++ b/images/capi/packer/qemu/qemu-ubuntu-2204.json
@@ -3,9 +3,9 @@
   "build_name": "ubuntu-2204",
   "distro_name": "ubuntu",
   "guest_os_type": "ubuntu-64",
-  "iso_checksum": "10f19c5b2b8d6db711582e0e27f5116296c34fe4b313ba45f9b201a5007056cb",
+  "iso_checksum": "5e38b55d57d94ff029719342357325ed3bda38fa80054f9330dc789cd2d43931",
   "iso_checksum_type": "sha256",
-  "iso_url": "https://old-releases.ubuntu.com/releases/jammy/ubuntu-22.04.1-live-server-amd64.iso",
+  "iso_url": "https://releases.ubuntu.com/jammy/ubuntu-22.04.2-live-server-amd64.iso",
   "os_display_name": "Ubuntu 22.04",
   "shutdown_command": "shutdown -P now",
   "unmount_iso": "true"

This does not directly solve any issue here, but it uses a newer Ubuntu 22.04 base ISO so that the package upgrade steps take less time. The original Ubuntu ISO is around a year old now, so it is missing quite a lot of package upgrades.
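When iso_url is bumped as in the diff above, iso_checksum has to change with it, otherwise Packer rejects the downloaded ISO. Ubuntu publishes a SHA256SUMS file alongside each release (for example under https://releases.ubuntu.com/jammy/). A minimal sketch of picking the right digest out of that file's contents, assuming the standard sha256sum output format; downloading the file is left out, and checksum_for is a hypothetical name.

```python
from typing import Optional


def checksum_for(sha256sums: str, filename: str) -> Optional[str]:
    """Return the SHA-256 hex digest listed for `filename`, or None.

    Lines look like "<64 hex chars> *<name>" (binary mode) or
    "<64 hex chars>  <name>" (text mode), as written by sha256sum.
    """
    for line in sha256sums.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue
        digest, name = parts
        # Drop the binary-mode marker if present.
        if name.startswith("*"):
            name = name[1:]
        if name == filename and len(digest) == 64:
            return digest.lower()
    return None
```

The returned digest is what goes into iso_checksum, with iso_checksum_type left at sha256 as in the diff.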

xinity commented 1 year ago

will try that out today and let you know 🤞

nikParasyr commented 1 year ago

Came across this issue as well. After 22 min I cancelled the first run because it seemed to be stuck on ==> qemu: Waiting for SSH to become available... That was only an assumption at that point.

After that I made the changes recommended by @mikejoh, which seems to have "solved it".

I speculate that because the default image is rather old, the package upgrade step takes too long and, depending on the environment, might even exceed the SSH timeout set by Packer, or the patience of the user (like me, who killed the first run after 22 min assuming it was stuck). Using the newer image made the package upgrade faster, and after ~10 min I get into the config phase.

Not sure what an appropriate fix would be. Bump the Packer SSH timeout, document it, and "periodically" update the base images to newer ones?

mikejoh commented 1 year ago

@nikParasyr 👍🏻 As a side note, I'm not sure whether the Ubuntu 22.04 image actually works and boots correctly. I'm evaluating CAPI + CAPO at the moment; I've only managed to build the image and haven't tested it yet!

nikParasyr commented 1 year ago

@mikejoh I ran into some issues as well that I couldn't troubleshoot. People in the CAPO Slack channel pointed out to me that they are running Ubuntu 22.04 images, but built with https://image-builder.sigs.k8s.io/capi/providers/openstack-remote.html rather than the QEMU builder. The openstack-remote provider worked for me as well. I've opened a ticket (#1137) with my findings for the QEMU build. I hope this helps.

BarthV commented 1 year ago

Maybe it's time to stop using the legacy Ubuntu live ISO image for the newest releases? I observed that it seems to be the main cause of all these problems. The legacy image is deprecated and is being replaced by the Ubuntu cloud images: https://cloud-images.ubuntu.com/

On my side, I'm currently replacing the Ubuntu image and scripts used by image-builder with this server cloud image, and everything works like a charm.
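If the live ISO is swapped for a cloud image, the same checksum discipline applies: cloud-images.ubuntu.com also publishes SHA256SUMS next to its images. To verify an image downloaded by hand before pointing a build at it, hashing in chunks avoids reading a multi-GiB file into memory. A sketch with hypothetical helper names, using only hashlib:

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly multi-GiB) image file without loading it into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MiB at a time until EOF (an empty bytes object).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare against a digest copied from SHA256SUMS (case-insensitive)."""
    return file_sha256(path) == expected_hex.strip().lower()
```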

mnaser commented 1 year ago

I think this is indeed an issue stemming from the fact that a very big apt upgrade happens during the build.

fad3t commented 1 year ago

Hi @BarthV, any chance you can share the config you're using to build from the cloudimg? Thx!

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

mboersma commented 5 months ago

/remove-lifecycle rotten

ygao-armada commented 4 months ago

For 22.04, I had a similar issue with the "vsphere" provider, using the command: image-builder build --os ubuntu --os-version 22.04 --hypervisor vsphere --release-channel 1-28 --vsphere-config vsphere.json --firmware efi

It turns out the VM IP changes due to a reboot (I worked around it by forcing the VM IP back with netplan apply). Not sure whether the root cause of this ticket is related to the vSphere one.

mnaser commented 2 months ago

Just a warning for those who are using the latest image: there have been some changes that break things.

https://github.com/vexxhost/magnum-cluster-api/issues/378

So you would end up with non-functional images.

abrahamhwj commented 2 months ago

I had a similar issue with “make build-proxmox-ubuntu-2204”:

[screenshot]

It just hangs there until it dies...

justinas-b commented 1 month ago

Hey @abrahamhwj, have you found any workaround for this? For me, the Proxmox build is stuck in the same place. If I check the terminal, I see that the new VM is stuck on the language selection screen.

justinas-b commented 1 month ago

@BarthV could you please share more details on what exactly needs to be updated so that the cloud image would work?

justinas-b commented 1 month ago

OK, it took me a while, but it seems I have figured it out. In my case I was building a Proxmox template, and I got Waiting for SSH to become available... due to multiple factors: