osism / cloud-in-a-box

Cloud in a box
https://osism.github.io/docs/guides/deploy-guide/examples/cloud-in-a-box
Apache License 2.0

Test Cloud in a Box @ Libvirt/KVM (ISO installation) #263

Open berendt opened 4 months ago

snowmaster007 commented 1 month ago

I get the below output during setup when trying to run an automated installation on a Libvirt/KVM guest:

2024-07-23 06:18:35,002 ERROR root:30 finish: subiquity/Filesystem/apply_autoinstall_config: FAIL: autoinstall config did not create needed bootloader partition
2024-07-23 06:18:35,002 ERROR root:30 finish: subiquity/apply_autoinstall_config: FAIL: autoinstall config did not create needed bootloader partition
2024-07-23 06:18:35,002 ERROR subiquity.server.server:415 top level error
Traceback (most recent call last):
  File "/snap/subiquity/5495/lib/python3.10/site-packages/subiquity/server/server.py", line 696, in start
    await self.apply_autoinstall_config()
  File "/snap/subiquity/5495/lib/python3.10/site-packages/subiquitycore/context.py", line 149, in decorated_async
    return await meth(self, **kw)
  File "/snap/subiquity/5495/lib/python3.10/site-packages/subiquity/server/server.py", line 466, in apply_autoinstall_config
    await controller.apply_autoinstall_config()
  File "/snap/subiquity/5495/lib/python3.10/site-packages/subiquitycore/context.py", line 149, in decorated_async
    return await meth(self, **kw)
  File "/snap/subiquity/5495/lib/python3.10/site-packages/subiquity/server/controllers/filesystem.py", line 496, in apply_autoinstall_config
    raise Exception(
Exception: autoinstall config did not create needed bootloader partition

There seems to be an issue with creating the bootloader partition. For testing I used a 130 GB virtual disk (disk bus = SCSI).

Booting the same ISO on a physical machine works fine. Used ISO: https://swift.services.a.regiocloud.tech/swift/v1/AUTH_b182637428444b9aa302bb8d5a5a418c/osism-node-image/ubuntu-autoinstall-cloud-in-a-box-1.iso

berendt commented 1 month ago

This is the source for the ISO file: https://github.com/osism/node-image. Probably we have to add another flavor that works for Libvirt/KVM.

berendt commented 1 month ago

Or maybe we have to work with a pre-installed machine image instead of an ISO image? We do it this way for the CI with https://github.com/osism/ci-image.

snowmaster007 commented 1 month ago

I need to debug further why no filesystems are created on a Libvirt/KVM guest. I will also try the manual installation.

snowmaster007 commented 1 month ago

Following the guide for manual installation, I tried on several VMs, but I always get the output below when running the bootstrap.sh script (step 7, "sudo /opt/cloud-in-a-box/bootstrap.sh sandbox"):

bootstrap | PLAY [Make ssh pipelining working] *********************************************
bootstrap |
bootstrap | TASK [Do not require tty for all users] ****************************************
bootstrap | fatal: [manager.systems.in-a-box.cloud]: UNREACHABLE! => {"changed": false, "msg": "Invalid/incorrect password: Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.\r\nPermission denied, please try again.", "unreachable": true}
bootstrap |
bootstrap | PLAY RECAP *********************************************************************
bootstrap | manager.systems.in-a-box.cloud : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
bootstrap |
bootstrap | Ubuntu 22.04.4 LTS \n \l
bootstrap |
bootstrap | ESC[5;41;1mERROR: BOOTSTRAP FAILEDESC[0m

snowmaster007 commented 1 month ago

Update:

With the manual installation method, the bootstrap.sh script ran properly after I changed the password of the osism user to "password". I will update the documentation accordingly. I am still working on finishing the manual installation.

snowmaster007 commented 1 month ago

Update:

The manual installation completed successfully on Libvirt/KVM.

snowmaster007 commented 4 weeks ago

Update:

The bootloader partition error at the start of the automated installation does not occur when the firmware of the VM is changed from BIOS to UEFI in virt-manager.
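
For anyone who creates the test guest from the command line instead of virt-manager, here is a minimal sketch that assembles a virt-install call with UEFI firmware. The helper name, guest name, ISO path, sizes, and the "ubuntu22.04" OS variant are assumptions for illustration and not taken from the project; only the switch to UEFI and the SCSI disk bus come from this thread.

```python
import shlex


def build_virt_install_cmd(name, iso_path, memory_mib, vcpus, disk_gb):
    """Return a virt-install argument list for a UEFI guest booting from an ISO.

    All values are placeholders supplied by the caller (hypothetical helper).
    """
    return [
        "virt-install",
        "--name", name,
        "--memory", str(memory_mib),           # guest RAM in MiB
        "--vcpus", str(vcpus),
        "--disk", f"size={disk_gb},bus=scsi",  # new disk, size in GiB, SCSI bus
        "--cdrom", iso_path,
        "--os-variant", "ubuntu22.04",         # assumed OS variant
        "--boot", "uefi",                      # UEFI firmware instead of BIOS
    ]


if __name__ == "__main__":
    cmd = build_virt_install_cmd(
        name="ciab-test",  # hypothetical guest name
        iso_path="/var/lib/libvirt/images/ubuntu-autoinstall-cloud-in-a-box-1.iso",
        memory_mib=20480,
        vcpus=8,
        disk_gb=500,
    )
    # Print the command for review instead of running it directly.
    print(shlex.join(cmd))
```

The script only prints the command so it can be checked before it is run on the hypervisor.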

snowmaster007 commented 4 weeks ago

Update:

After some failed attempts, the automated installation almost completed on my virt-manager VM. First, the phpmyadmin service ran into a timeout during the installation. I assume the image download did not finish within the expected time:

/var/log/install-cloud-in-a-box.log:

...
deploy | TASK [osism.services.phpmyadmin : Manage phpmyadmin service] *******************
deploy | Friday 09 August 2024  13:00:56 +0000 (0:00:01.531)       0:00:05.885 ********* 
deploy | ESC[0;31mfatal: [manager.systems.in-a-box.cloud]: FAILED! => {"msg": "The conditional check 'result.status.ActiveState == \"active\"' failed. The error was: error while evaluating conditional (result.status.ActiveState == \"active\"): 'dict object' has no attribute 'status'"}ESC[0m
...

/var/log/syslog:

...
Aug  9 13:02:27 manager systemd[1]: docker-compose@phpmyadmin.service: start-pre operation timed out. Terminating.
Aug  9 13:02:27 manager docker[355495]: canceled
Aug  9 13:02:27 manager dockerd[755]: time="2024-08-09T13:02:27.973896527Z" level=error msg="Not continuing with pull after error: context canceled"
Aug  9 13:02:27 manager systemd[1]: docker-compose@phpmyadmin.service: Control process exited, code=exited, status=130/n/a
Aug  9 13:02:27 manager systemd[1]: docker-compose@phpmyadmin.service: Failed with result 'timeout'.
Aug  9 13:02:27 manager systemd[1]: Failed to start phpmyadmin service managed by Docker Compose.
...

Therefore I pulled the Docker image manually by running "sudo docker compose -f /opt/phpmyadmin/docker-compose.yml pull". After that I continued the automated installation by starting the deploy script manually (sudo /opt/cloud-in-a-box/deploy.sh sandbox).
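
On a slow link the manual pull may itself need more than one attempt. A minimal sketch of a retry wrapper around the same pull command follows; the helper name, the number of attempts, and the delay are assumptions and not part of the cloud-in-a-box scripts.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=10.0):
    """Run cmd until it exits with status 0; return the number of attempts used.

    Hypothetical helper: raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # wait before the next try
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")


if __name__ == "__main__":
    # Same pull command as in the comment above.
    run_with_retries(
        ["sudo", "docker", "compose", "-f", "/opt/phpmyadmin/docker-compose.yml", "pull"]
    )
```

Once the pull has succeeded, the deploy script can be started again as described above.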

Later in the process the installation stopped again with the below error:

...
deploy | TASK [k3s_server_post : Install Cilium] ****************************************
deploy | Friday 09 August 2024  14:50:23 +0000 (0:00:00.763)       0:05:21.200 ********* 
deploy | ESC[0;31mfatal: [manager.systems.in-a-box.cloud]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'cluster_cidr' is undefined\n\nThe error appears to be in '/ansible/roles/k3s_server_post/tasks/cilium.yml': line 159, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Install Cilium\n      ^ here\n"}ESC[0m
...
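
To find failures like the two above without scrolling through the whole log, here is a minimal sketch that lists every "fatal:" line in /var/log/install-cloud-in-a-box.log together with the task that preceded it. The function name and the regular expressions are assumptions based only on the log excerpts in this thread. The colour codes are matched both as a real escape byte and as the literal text "ESC" shown in the excerpts.

```python
import re

# A real ESC byte or the literal text "ESC", followed by a colour
# sequence such as "[0;31m".
ANSI_RE = re.compile(r"(?:\x1b|ESC)\[[0-9;]*m")
# Ansible task header, e.g. "TASK [k3s_server_post : Install Cilium] ****".
TASK_RE = re.compile(r"TASK \[(.+?)\]")


def find_failures(lines):
    """Return (task, message) pairs for every 'fatal:' line in the log."""
    failures = []
    current_task = None
    for raw in lines:
        line = ANSI_RE.sub("", raw).rstrip()
        match = TASK_RE.search(line)
        if match:
            current_task = match.group(1)
        elif "fatal:" in line:
            message = line.split("fatal:", 1)[1].strip()
            failures.append((current_task, message))
    return failures


if __name__ == "__main__":
    with open("/var/log/install-cloud-in-a-box.log", errors="replace") as log:
        for task, message in find_failures(log):
            print(f"{task}\n  {message}\n")
```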

VM specs:

RAM: 20 GB
CPU: 8 cores
Disk: 500 GB (SCSI)
CD-ROM: SCSI, mounted with ubuntu-autoinstall-cloud-in-a-box-1.iso

From what I can see, 20 GB of RAM is not enough and should be increased. I will now continue the automated installation on a more powerful VM.

snowmaster007 commented 3 weeks ago

Update:

I tried again on a new KVM VM with 48 GB RAM and 16 vCPUs, but without success. The automated installation stopped with the same error as on the smaller VM:

...
deploy | TASK [k3s_server_post : Install Cilium] ****************************************
deploy | Monday 12 August 2024  17:24:51 +0000 (0:00:00.517)       0:05:51.271 ********* 
deploy | ESC[0;31mfatal: [manager.systems.in-a-box.cloud]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'cluster_cidr' is undefined\n\nThe error appears to be in '/ansible/roles/k3s_server_post/tasks/cilium.yml': line 159, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Install Cilium\n      ^ here\n"}ESC[0m
...

berendt commented 3 weeks ago

The Cilium issue was fixed with https://github.com/osism/cloud-in-a-box/commit/d8e6b912e434b2e3d11ec1dabb8526a6c993a454.