zhaofengli / kubevirt-actions-runner

Ephemeral VM runners with Actions Runner Controller

kubevirt vm starts, does nothing, then stops #4

Open IanMoroney opened 4 weeks ago

IanMoroney commented 4 weeks ago

According to the runner-set, it starts the launcher-runner. Watching it boot over KubeVirt VNC, I can see that it automatically logs in as root, waits a few seconds, and then shuts down again.

The runner never goes online in the github runner list, and the workflow job never starts.

These are the output logs of the compute container from the kubevirt launcher-runner pod:

{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"resumed\" detail=\"unpaused\" with event id 4 reason 0 received","pos":"client.go:470","timestamp":"2024-10-31T15:38:44.742760Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"started\" detail=\"booted\" with event id 2 reason 0 received","pos":"client.go:470","timestamp":"2024-10-31T15:38:44.745683Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain started.","name":"runner","namespace":"kvrunner","pos":"manager.go:1250","timestamp":"2024-10-31T15:38:44.748068Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"runner","namespace":"kvrunner","pos":"server.go:208","timestamp":"2024-10-31T15:38:44.751023Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(1)","pos":"client.go:297","timestamp":"2024-10-31T15:38:44.751189Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: kvrunner_runner","pos":"client.go:424","timestamp":"2024-10-31T15:38:44.752677Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(1)","pos":"client.go:297","timestamp":"2024-10-31T15:38:44.755771Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: kvrunner_runner","pos":"client.go:424","timestamp":"2024-10-31T15:38:44.757298Z"}
{"component":"virt-launcher","level":"info","msg":"Found PID for kvrunner_runner: 79","pos":"monitor.go:170","timestamp":"2024-10-31T15:38:45.101022Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"runner","namespace":"kvrunner","pos":"server.go:208","timestamp":"2024-10-31T15:38:45.326137Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"runner","namespace":"kvrunner","pos":"server.go:208","timestamp":"2024-10-31T15:38:45.347497Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"shutdown\" detail=\"unknown\" with event id 6 reason 1 received","pos":"client.go:470","timestamp":"2024-10-31T15:39:10.614443Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: ShuttingDown(4):Unknown(0)","pos":"client.go:297","timestamp":"2024-10-31T15:39:10.616872Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: kvrunner_runner","pos":"client.go:424","timestamp":"2024-10-31T15:39:10.618667Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"runner","namespace":"kvrunner","pos":"server.go:208","timestamp":"2024-10-31T15:39:10.623713Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 79 with status 0","pos":"virt-launcher-monitor.go:198","timestamp":"2024-10-31T15:39:10.775544Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"stopped\" detail=\"shutdown\" with event id 5 reason 0 received","pos":"client.go:470","timestamp":"2024-10-31T15:39:10.842640Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Shutoff(5):Shutdown(1)","pos":"client.go:297","timestamp":"2024-10-31T15:39:10.844993Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: kvrunner_runner","pos":"client.go:424","timestamp":"2024-10-31T15:39:10.846500Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain undefined.","name":"runner","namespace":"kvrunner","pos":"manager.go:1874","timestamp":"2024-10-31T15:39:10.883760Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"undefined\" detail=\"removed\" with event id 1 reason 0 received","pos":"client.go:470","timestamp":"2024-10-31T15:39:10.883852Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Signaled vmi deletion","name":"runner","namespace":"kvrunner","pos":"server.go:363","timestamp":"2024-10-31T15:39:10.883850Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: ","pos":"client.go:424","timestamp":"2024-10-31T15:39:10.884890Z"}
{"component":"virt-launcher","level":"info","msg":"Received signal terminated","pos":"virt-launcher.go:473","timestamp":"2024-10-31T15:39:10.974539Z"}
{"component":"virt-launcher","kind":"VirtualMachineInstance","level":"info","msg":"Signaled graceful shutdown","name":"runner","namespace":"kvrunner","pos":"virt-launcher.go:443","timestamp":"2024-10-31T15:39:10.974734Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"Process kvrunner_runner and pid 79 is gone!","pos":"monitor.go:179","timestamp":"2024-10-31T15:39:11.101598Z"}
{"component":"virt-launcher","level":"info","msg":"Waiting on final notifications to be sent to virt-handler.","pos":"virt-launcher.go:281","timestamp":"2024-10-31T15:39:11.101657Z"}
{"component":"virt-launcher","level":"info","msg":"Final Delete notification sent","pos":"virt-launcher.go:296","timestamp":"2024-10-31T15:39:11.101676Z"}
{"component":"virt-launcher","level":"info","msg":"stopping cmd server","pos":"server.go:608","timestamp":"2024-10-31T15:39:11.101749Z"}
{"component":"virt-launcher","level":"info","msg":"cmd server stopped","pos":"server.go:617","timestamp":"2024-10-31T15:39:11.101924Z"}
{"component":"virt-launcher","level":"info","msg":"Exiting...","pos":"virt-launcher.go:512","timestamp":"2024-10-31T15:39:11.101975Z"}
{"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 12 with status 0","pos":"virt-launcher-monitor.go:198","timestamp":"2024-10-31T15:39:11.108537Z"}

The logs show a shutdown event received with event id 6, reason 1, which appears to correspond to a guest-initiated shutdown in libvirt's event enums.
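For reference, the event id and reason in that log line come from libvirt's domain event enums. A small sketch decoding them (enum values copied from libvirt's virDomainEventType and virDomainEventShutdownDetailType; hard-coded here so it runs without libvirt installed — treat the mapping as an assumption to verify against the libvirt headers):

```shell
# Decode the "event id X reason Y" pair reported by virt-launcher.
# Values mirror libvirt's virDomainEventType / virDomainEventShutdownDetailType.
decode_event() {
  case "$1" in
    1) echo "UNDEFINED" ;;
    2) echo "STARTED" ;;
    4) echo "RESUMED" ;;
    5) echo "STOPPED" ;;
    6) case "$2" in
         0) echo "SHUTDOWN/FINISHED" ;;
         1) echo "SHUTDOWN/GUEST" ;;   # shutdown was initiated inside the guest
         2) echo "SHUTDOWN/HOST" ;;
       esac ;;
    *) echo "UNKNOWN" ;;
  esac
}

decode_event 6 1   # the event seen in the log above
```

Under this mapping, "event id 6 reason 1" means the guest itself powered off, which matches the console output in the later comment.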

The VM template uses the image ghcr.io/zhaofengli/sample-vm-container-disk:latest.

Everything else seems to work as it should:

- The listener detects a job that needs a runner
- The listener creates the runner-set
- The runner-set creates the launcher-runner

However, the agent never goes online with GitHub, and the VM just seems to start, log in as root, and then do nothing.

Is the sample-vm-container-disk somehow incomplete? I'm not sure what I'm missing.

IanMoroney commented 4 weeks ago

@zhaofengli fyi

electrocucaracha commented 2 weeks ago

Same here; these are some of the log messages:

$ kubectl logs -n test-arc-repo-runners -c guest-console-log virt-launcher-runner-tgmlp -f
[    1.546059] sgx: There are zero EPC sections.

<<< NixOS Stage 1 >>>

loading module virtio_balloon...
loading module virtio_console...
loading module virtio_rng...
loading module dm_mod...
running udev...
Starting systemd-udevd version 254.3
kbd_mode: KDSKBMODE: Inappropriate ioctl for device
starting device mapper and LVM...
checking /dev/disk/by-label/nixos...
fsck (busybox 1.36.1)
[fsck.ext4 (1) -- /mnt-root/] fsck.ext4 -a /dev/disk/by-label/nixos
nixos: clean, 114161/509040 files, 435699/2034432 blocks
mounting /dev/disk/by-label/nixos on /...

<<< NixOS Stage 2 >>>

running activation script...
setting up /etc...
starting systemd...

Welcome to NixOS 23.11 (Tapir)!

[  OK  ] Created slice Slice /system/getty.
[  OK  ] Created slice Slice /system/modprobe.
[  OK  ] Created slice Slice /system/serial-getty.
[  OK  ] Created slice User and Session Slice.
...
<<< Welcome to NixOS 23.11.20231117.c757e9b (x86_64) - ttyS0 >>>

Run 'nixos-help' for the NixOS manual.

runner login: root (automatic login)

[root@runner:~]#          Stopping Session 1 of User root...
         Stopping Session 2 of User root...
[  OK  ] Removed slice Slice /system/modprobe.
[  OK  ] Stopped target Multi-User System.
[  OK  ] Stopped target Login Prompts.
[  OK  ] Stopped target Containers.
[  OK  ] Stopped target Host and Network Name Lookups.
[  OK  ] Stopped target Timer Units.
...
         Unmounting /run/keys...
         Unmounting run-wrappers.mount...
         Unmounting /runner-info...
[  OK  ] Stopped Grow Root File System.
[  OK  ] Stopped growpart.service.
[  OK  ] Unmounted /run/keys.
[  OK  ] Unmounted run-wrappers.mount.
[  OK  ] Unmounted /runner-info.
[  OK  ] Stopped target Preparation for Local File Systems.
[  OK  ] Stopped target Swaps.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create Static Device Nodes in /dev gracefully.
[  OK  ] Reached target System Shutdown.
[  OK  ] Reached target Late Shutdown Services.
[  OK  ] Finished System Power Off.
[  OK  ] Reached target System Power Off.
[   20.854376] reboot: Power down

electrocucaracha commented 2 weeks ago

Apparently the StartPre service expects the legacy registration application, while runner-info only contains jitconfig information; the solution could be to use the just-in-time (--jitconfig) syntax instead. I have used the following template to pass the tests:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-template
spec:
  runStrategy: Manual
  template:
    metadata:
      name: runner
    spec:
      architecture: amd64
      terminationGracePeriodSeconds: 30
      domain:
        devices:
          filesystems:
            - name: runner-info
              virtiofs: {}
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
        cpu:
          cores: 3
        resources:
          requests:
            memory: 14Gi
      networks:
        - name: default
          pod: {}
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
        - name: cloudinitdisk
          cloudInitNoCloud:
            userData: |-
              #cloud-config
              users:
                - name: runner
                  homedir: /home/runner
                  sudo: ["ALL=(ALL) NOPASSWD:ALL"]
              mounts:
                - [ runner-info, /runner-info/, virtiofs, "rw,relatime,user=fedora" ]
              packages:
                - jq
              bootcmd:
                - "sudo mkdir /opt/runner"
                - "curl -sL https://github.com/actions/runner/releases/download/v2.320.0/actions-runner-linux-x64-2.320.0.tar.gz | sudo tar -xz -C /opt/runner"
                - "sudo /opt/runner/bin/installdependencies.sh"
              runcmd:
                - "sudo chown -R runner: /opt/runner"
                - 'sudo runuser -l runner -c "/opt/runner/run.sh --jitconfig $(jq -r .jitconfig /runner-info/runner-info.json)"'
                - "sudo poweroff"
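The nested quoting in that last runcmd entry is easy to get wrong: the jq filter must not be wrapped in its own single quotes inside an already single-quoted runuser command, or the shell string terminates early. A minimal local sketch of just the extraction step, assuming runner-info.json carries a top-level jitconfig key (the exact schema of the shared file is an assumption based on the discussion above; the value here is a placeholder):

```shell
# Simulate the runner-info.json that the controller shares into the VM
# over virtiofs (placeholder value, not a real JIT config).
printf '{"jitconfig":"ZXhhbXBsZQ=="}' > /tmp/runner-info.json

# Extract the JIT config; the filter needs no extra quoting when the
# surrounding command is wrapped in double quotes.
JITCONFIG=$(jq -r .jitconfig /tmp/runner-info.json)
echo "$JITCONFIG"

# Inside the VM, the runner would then be started with something like:
#   /opt/runner/run.sh --jitconfig "$JITCONFIG"
```

Letting the outer shell expand the `$(jq ...)` substitution before runuser runs also works, as long as the quotes around the inner command don't collide with the quotes around the filter.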