Open amizeranschi opened 1 year ago
@sj213 can you have a look at this?
@amizeranschi which version of Rocky do you need? The latest version of VGCN here is built with Rocky 8.6, afaik.
@bgruening thanks for the reply. The reason I tried to build my own image is that I intend to customize it for my needs. More precisely, I want to add SLURM to it, either alongside HTCondor (preferably), or replacing it, if the two won't be able to coexist.
I found this Ansible role for Slurm mentioned in the Galaxy tutorials and I want to try to include it in VGCN. Any advice on whether this is a good idea and how I could go about it would be much appreciated.
That is a very nice idea. They can coexist and I think we should get this into our images as well. Thanks for working on this. I hope @sj213 can help us here.
It's been years since I last dealt with Packer, and even back then I never really dug deep into it, so I'm afraid my guesses are not as informed as you would like. Basically, I have no idea what could be going wrong. Error no. 1 is EPERM ("Operation not permitted"), but I wouldn't be particularly concerned about that one: the Makefile deliberately ignores it, so it is probably harmless and always occurs. Apart from this error, the logs indicate that everything is going according to plan.
I have no idea why the generated image won't boot in QEMU. Actually, I'm not even sure the generated image is supposed to be bootable outside of the OpenStack cloud. Maybe OpenStack extracts the kernel and initrd files from the image, bypassing GRUB, in order to pass command-line options to the kernel; in that case the generated image would likely not have a valid boot sector in the first place. But that's just a wild guess, as I'm not familiar enough with OpenStack's inner workings. Sorry for the less-than-helpful response...
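The boot-sector guess can be checked directly. The following is a minimal sketch, assuming the image is available in raw format (a qcow2 image would first need converting, for example with `qemu-img convert -O raw`). The function name `has_boot_signature` is hypothetical and not part of VGCN or Packer. It only tests for the classic 0x55 0xAA signature at the end of the first sector, which is necessary for a BIOS boot but does not prove that a working boot loader is installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first 512-byte sector of a RAW disk
# image ends with the 0x55 0xAA boot signature (bytes 510 and 511).
has_boot_signature() {
  # Read exactly two bytes at offset 510 and render them as lowercase hex.
  sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "55aa" ]
}
```

Usage would look like `has_boot_signature image.raw && echo "signature present"`, where `image.raw` is a placeholder file name. A missing signature would be consistent with the "not a bootable disk" message, though it would not by itself explain why the instance is unreachable in OpenStack.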
Hi @sj213, thanks for your reply. I imported the resulting image into OpenStack and created a volume and then an instance from it. The resulting instance appears to have an Active status in OpenStack, but I'm unable to SSH into it, and it doesn't respond to ping either on its local IP or on the public one (floating IP) that I've assigned to the instance.
For comparison, using the public VGCN images posted here always produced usable machines that I could SSH into, in our OpenStack.
Would you or anyone else here be able to try reproducing this issue?
Time is the limiting factor currently. Can you create a PR with your changes? Maybe we can simply build an image for you. Would that help?
Hi, I wrote down some of the changes in the docs. There are also a few changes in the kickstart file for Rocky 9, and I am still trying to figure out how to solve the Ansible connection error in the second build step (rockylinux-9-x-86_64/bwcloud-...): https://github.com/usegalaxy-eu/operations/blob/main/cloud/vgcn.md
Regarding the file size: did you complete the second build step with all the Ansible roles? I would guess they account for a large share of the size.
This is where it currently stops working for me with Rocky 9:
==> qemu: Using SSH communicator to connect: 127.0.0.1
==> qemu: Waiting for SSH to become available...
==> qemu: Connected to SSH!
==> qemu: Provisioning with Ansible...
qemu: Setting up proxy adapter for Ansible....
==> qemu: Executing Ansible: ansible-playbook -e packer_build_name="qemu" -e packer_builder_type=qemu -e packer_http_addr=10.0.2.2:0 --ssh-extra-args '-o IdentitiesOnly=yes' -e ansible_ssh_private_key_file=/tmp/ansible-key3056215714 -i /tmp/packer-provisioner-ansible1337522397 /home/mira/repos/vgcn/ansible-roles/setup-vgcn-bwcloud.yml
qemu: [DEPRECATION WARNING]: "include" is deprecated, use include_tasks/import_tasks
qemu: instead. See https://docs.ansible.com/ansible-
qemu: core/2.14/user_guide/playbooks_reuse_includes.html for details. This feature
qemu: will be removed in version 2.16. Deprecation warnings can be disabled by
qemu: setting deprecation_warnings=False in ansible.cfg.
qemu:
qemu: PLAY [default] *****************************************************************
qemu:
qemu: TASK [Gathering Facts] *********************************************************
==> qemu: failed to handshake
qemu: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Unable to negotiate with 127.0.0.1 port 33237: no matching host key type found. Their offer: ssh-rsa", "unreachable": true}
qemu:
qemu: PLAY RECAP *********************************************************************
qemu: default : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
qemu:
==> qemu: Provisioning step had errors: Running the cleanup provisioner, if present...
==> qemu: Deleting output directory...
Build 'qemu' errored after 16 seconds 121 milliseconds: Error executing Ansible: Non-zero exit status: exit status 4
==> Wait completed after 16 seconds 121 milliseconds
==> Some builds didn't complete successfully and had errors:
--> qemu: Error executing Ansible: Non-zero exit status: exit status 4
==> Builds finished but no artifacts were created.
make: *** [Makefile:66: rockylinux-9.x-x86_64/vgcn-bwcloud] Error 1
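The log above points to an SSH negotiation failure rather than a problem in the playbook itself: the client refuses the only host key type on offer (`ssh-rsa`), which OpenSSH 8.8 and later disable by default. Below is a minimal sketch for pulling the offered key type out of a saved build log and printing client options that re-enable it for one connection. Both function names are hypothetical, and whether these options actually resolve this particular build is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of VGCN or Packer): read a build log on stdin
# and print the host key type(s) the SSH server offered, one match per line.
offered_host_keys() {
  sed -n 's/.*no matching host key type found\. Their offer: \([A-Za-z0-9@.,-]*\).*/\1/p'
}

# Hypothetical helper: print OpenSSH client options that re-enable the given
# key type. HostKeyAlgorithms and PubkeyAcceptedKeyTypes are standard
# ssh_config options (newer releases also accept PubkeyAcceptedAlgorithms).
# That they fix this build is an assumption.
ssh_workaround_opts() {
  printf '%s\n' "-o HostKeyAlgorithms=+$1 -o PubkeyAcceptedKeyTypes=+$1"
}
```

With the log saved to a file (say `build.log`, a placeholder name), `offered_host_keys < build.log` would print `ssh-rsa` for the failure shown here. The options from `ssh_workaround_opts ssh-rsa` could then be tried alongside `-o IdentitiesOnly=yes` in the `--ssh-extra-args` value, for instance through the Ansible provisioner's `extra_arguments` in the Packer template. Note that this re-enables a SHA-1 based algorithm, so it is probably only reasonable for a local build against 127.0.0.1.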
Hello,
I am trying to build a Rocky Linux VGCN image on Ubuntu using Packer v. 1.8.5 (installed via apt) and QEMU built from the latest sources.
TL;DR: the main issue is at the end, but I'm also reporting a couple of minor hurdles I experienced on the way.
When running make for the first time after cloning this repository, Packer complains about the JSON template. It also suggests how to fix it:
After fixing the JSON template, it looks like the Rocky Linux ISO location has changed:
The new location is mentioned here: https://ftp.fau.de/rockylinux/8.6/README.txt
After fixing the ISO location in the JSON template, the build process manages to start and complete, but it doesn't seem to finish successfully. This is the output:
Attempting to launch the image in QEMU fails, with messages like
Boot failed: not a bootable disk
and
No bootable device
In order to get debug information, I've set
export PACKER_LOG=1
and this was the (end of the) build process' output: