Open IanMoroney opened 4 weeks ago
@zhaofengli fyi
Same here. These are some of the log messages:
$ kubectl logs -n test-arc-repo-runners -c guest-console-log virt-launcher-runner-tgmlp -f
[ 1.546059] sgx: There are zero EPC sections.
<<< NixOS Stage 1 >>>
loading module virtio_balloon...
loading module virtio_console...
loading module virtio_rng...
loading module dm_mod...
running udev...
Starting systemd-udevd version 254.3
kbd_mode: KDSKBMODE: Inappropriate ioctl for device
starting device mapper and LVM...
checking /dev/disk/by-label/nixos...
fsck (busybox 1.36.1)
[fsck.ext4 (1) -- /mnt-root/] fsck.ext4 -a /dev/disk/by-label/nixos
nixos: clean, 114161/509040 files, 435699/2034432 blocks
mounting /dev/disk/by-label/nixos on /...
<<< NixOS Stage 2 >>>
running activation script...
setting up /etc...
starting systemd...
Welcome to NixOS 23.11 (Tapir)!
[ OK ] Created slice Slice /system/getty.
[ OK ] Created slice Slice /system/modprobe.
[ OK ] Created slice Slice /system/serial-getty.
[ OK ] Created slice User and Session Slice.
...
<<< Welcome to NixOS 23.11.20231117.c757e9b (x86_64) - ttyS0 >>>
Run 'nixos-help' for the NixOS manual.
runner login: root (automatic login)
[root@runner:~]# Stopping Session 1 of User root...
Stopping Session 2 of User root...
[ OK ] Removed slice Slice /system/modprobe.
[ OK ] Stopped target Multi-User System.
[ OK ] Stopped target Login Prompts.
[ OK ] Stopped target Containers.
[ OK ] Stopped target Host and Network Name Lookups.
[ OK ] Stopped target Timer Units.
...
Unmounting /run/keys...
Unmounting run-wrappers.mount...
Unmounting /runner-info...
[ OK ] Stopped Grow Root File System.
[ OK ] Stopped growpart.service.
[ OK ] Unmounted /run/keys.
[ OK ] Unmounted run-wrappers.mount.
[ OK ] Unmounted /runner-info.
[ OK ] Stopped target Preparation for Local File Systems.
[ OK ] Stopped target Swaps.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped Remount Root and Kernel File Systems.
[ OK ] Stopped Create Static Device Nodes in /dev.
[ OK ] Stopped Create Static Device Nodes in /dev gracefully.
[ OK ] Reached target System Shutdown.
[ OK ] Reached target Late Shutdown Services.
[ OK ] Finished System Power Off.
[ OK ] Reached target System Power Off.
[ 20.854376] reboot: Power down
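The log above ends with the guest powering off roughly 20 seconds after boot. When comparing several runs, it can help to pull that kernel timestamp out of a saved console log. The following is only a sketch: it assumes the log has been saved as plain text in the format shown above, and the function name `power_down_seconds` is made up for illustration.

```python
import re

# Matches the kernel line printed at shutdown, e.g. "[ 20.854376] reboot: Power down".
POWER_DOWN = re.compile(r"\[\s*(\d+\.\d+)\]\s+reboot: Power down")


def power_down_seconds(console_log):
    """Return the kernel timestamp (in seconds) of the power-down line, or None."""
    match = POWER_DOWN.search(console_log)
    return float(match.group(1)) if match else None
```

A VM that consistently powers down a few seconds after the automatic login suggests the runner service exits almost immediately rather than waiting for a job.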
Apparently the StartPre service requires the legacy application, while the runner-info only contains jitconfig information, so the solution could be to use the just-in-time syntax. I have used the following template to pass the tests:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-template
spec:
  runStrategy: Manual
  template:
    metadata:
      name: runner
    spec:
      architecture: amd64
      terminationGracePeriodSeconds: 30
      domain:
        devices:
          filesystems:
            - name: runner-info
              virtiofs: {}
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
        cpu:
          cores: 3
        resources:
          requests:
            memory: 14Gi
      networks:
        - name: default
          pod: {}
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
        - name: cloudinitdisk
          cloudInitNoCloud:
            userData: |-
              #cloud-config
              users:
                - name: runner
                  homedir: /home/runner
                  sudo: ["ALL=(ALL) NOPASSWD:ALL"]
              mounts:
                - [ runner-info, /runner-info/, virtiofs, "rw,relatime,user=fedora" ]
              packages:
                - jq
              bootcmd:
                - "sudo mkdir /opt/runner"
                - "curl -sL https://github.com/actions/runner/releases/download/v2.320.0/actions-runner-linux-x64-2.320.0.tar.gz | sudo tar -xz -C /opt/runner"
                - "sudo /opt/runner/bin/installdependencies.sh"
              runcmd:
                - "sudo chown -R runner: /opt/runner"
                - "sudo runuser -l runner -c '/opt/runner/run.sh --jitconfig $(jq -r '.jitconfig' /runner-info/runner-info.json)'"
                - "sudo poweroff"
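One caveat with the last runcmd step: `jq -r '.jitconfig'` prints the literal string `null` when the key is absent, so `run.sh` would be started with `--jitconfig null` and the VM would then power off without an obvious error. A guard along the following lines could make that failure visible. This is a minimal sketch, not part of the template: it assumes `runner-info.json` is a JSON object with a string field `jitconfig` (the only layout implied by the jq call), and the function name `read_jitconfig` is hypothetical.

```python
import json
from pathlib import Path


def read_jitconfig(path):
    """Return the non-empty 'jitconfig' string from a runner-info JSON file.

    Raises ValueError if the file is not a JSON object or the key is
    missing, not a string, or blank. Invalid JSON also raises ValueError
    (json.JSONDecodeError is a subclass of it).
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("runner-info is not a JSON object")
    value = data.get("jitconfig")
    if not isinstance(value, str) or not value.strip():
        raise ValueError("runner-info has no usable 'jitconfig' value")
    return value
```

Calling `read_jitconfig("/runner-info/runner-info.json")` inside the guest before launching the runner would fail loudly if the virtiofs mount is empty or the file has a different shape than assumed here.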
The runner-set starts the launcher-runner, which I can watch start up (KubeVirt VNC). I see that it automatically logs in as root, waits for a few seconds, and then shuts down again.
The runner never goes online in the GitHub runner list, and the workflow job never starts.
These are the output logs of the compute container from the KubeVirt launcher-runner pod:
The logs seem to indicate: event id 6 reason 1 received.
The VM in use by the template is using image: ghcr.io/zhaofengli/sample-vm-container-disk:latest
Everything else seems to be doing as it should:
- The listener detects a job that needs a runner
- The listener creates the runner-set
- The runner-set creates the launcher-runner
But the agent never goes online with GitHub, and the VM just seems to start, log in as root, and then do nothing.
Is the sample-vm-container-disk somehow incomplete? I'm not sure what I'm missing.