Open Adam-D-Lewis opened 3 years ago
I'm working on this in the spawnviewer branch and am currently getting the error below. I added libvirt, kvm, and qemu to the shell.nix file, but I'm not sure what's going wrong.
Bringing machine 'hpc01-test' up with 'libvirt' provider...
Bringing machine 'hpc02-test' up with 'libvirt' provider...
Bringing machine 'hpc03-test' up with 'libvirt' provider...
==> hpc03-test: An error occurred. The error will be shown after all tasks complete.
==> hpc01-test: An error occurred. The error will be shown after all tasks complete.
==> hpc02-test: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
An error occurred while executing the action on the 'hpc01-test'
machine. Please handle this error then try again:
Error while connecting to libvirt: Error making a connection to libvirt URI qemu:///system?no_verify=1&keyfile=/home/runnerx/.ssh/id_rsa:
Call to virConnectOpen failed: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
An error occurred while executing the action on the 'hpc02-test'
machine. Please handle this error then try again:
Error while connecting to libvirt: Error making a connection to libvirt URI qemu:///system?no_verify=1&keyfile=/home/runnerx/.ssh/id_rsa:
Call to virConnectOpen failed: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
An error occurred while executing the action on the 'hpc03-test'
machine. Please handle this error then try again:
Error while connecting to libvirt: Error making a connection to libvirt URI qemu:///system?no_verify=1&keyfile=/home/runnerx/.ssh/id_rsa:
Call to virConnectOpen failed: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
Error: Process completed with exit code 1.
Next steps may be to try running libvirt manually on an EC2 instance, or possibly to try a metal machine if nested virtualization is not supported on non-metal AWS EC2 instances.
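Before switching machines, it may help to confirm what is actually missing on the host. The error above only says that the socket at /var/run/libvirt/libvirt-sock does not exist, which could mean the libvirt daemon is not running rather than that nested virtualization is unavailable. Below is a rough sketch of such a check. The function names are hypothetical, the default socket path is taken from the log above, and the `/proc/cpuinfo` parsing assumes an x86 Linux host where virtualization support shows up as the `vmx` (Intel) or `svm` (AMD) flag.

```python
import os
import stat


def has_virt_flag(cpuinfo_text):
    """Return True if any 'flags' line lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def is_socket(path):
    """Return True if path exists and is a Unix socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def check_host(socket_path="/var/run/libvirt/libvirt-sock",
               kvm_path="/dev/kvm",
               cpuinfo_path="/proc/cpuinfo"):
    """Collect three independent hints about why libvirt/KVM may fail."""
    try:
        with open(cpuinfo_path) as f:
            cpuinfo = f.read()
    except OSError:
        cpuinfo = ""
    return {
        "libvirt_socket": is_socket(socket_path),  # is the daemon listening?
        "kvm_device": os.path.exists(kvm_path),    # is the KVM module usable?
        "cpu_virt_flag": has_virt_flag(cpuinfo),   # does the (v)CPU expose VT-x/AMD-V?
    }


if __name__ == "__main__":
    for name, ok in check_host().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If only `libvirt_socket` is missing, the daemon is the likelier culprit. If `cpu_virt_flag` is also missing, the instance probably does not expose hardware virtualization to the guest.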
AWS doesn't have great documentation on nested virtualization, but after extensive searching it appears to be available only on bare-metal EC2 instances (https://github.com/aws-samples/aws-bare-metal-kvm-demo). The cheapest metal machine that might be suitable is m5zn.metal, currently at $0.8113 per hour for a spot instance.
DigitalOcean doesn't recommend nested virtualization (https://www.digitalocean.com/community/questions/does-digitalocean-support-kvm-or-nested-virtulzation).
GCP does allow nested virtualization with some setup (https://cloud.google.com/compute/docs/instances/nested-virtualization/overview)
There are reports of unofficial support for nested virtualization with KVM on Azure (https://www.brianlinkletter.com/2018/06/create-a-nested-virtual-machine-in-a-microsoft-azure-linux-vm/ or https://blog.nillsf.com/index.php/2020/03/24/creating-nested-vm-using-kvm-on-azure/).
I'm trying to get libvirt using KVM running on Google Compute Engine. I followed this tutorial but was unsuccessful: https://joachim8675309.medium.com/devops-box-vagrant-with-kvm-d7344e79322c.
I recently went to do some work on this repo and found 2 unrelated issues that needed to be debugged and solved first. We should set up an automated test to run at least weekly to catch these issues as they occur for easier debugging, and also as part of good development practice.
As a first pass, I'd propose that the test simply be deploying the Vagrant VMs in the tests folder successfully. We'll likely need to use CIrun to get a machine large enough to deploy the VMs (they require 16 GB of RAM for the VMs alone, though we could likely lower that a bit).
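A minimal sketch of what that first-pass test could look like, assuming it is driven from a small Python script that the scheduled CI job calls. The function name `smoke_test`, the default `tests` directory, and the injectable `run` parameter are illustrative choices, not something already in the repo. `vagrant up --provider=libvirt` and `vagrant destroy -f` are standard Vagrant commands.

```python
import subprocess


def smoke_test(tests_dir="tests", provider="libvirt", run=subprocess.run):
    """Bring the Vagrant VMs up, then always tear them down.

    Returns True if 'vagrant up' exited with status 0.
    The 'run' parameter exists only so the function can be exercised
    without Vagrant installed.
    """
    try:
        up = run(["vagrant", "up", f"--provider={provider}"], cwd=tests_dir)
        return up.returncode == 0
    finally:
        # Clean up even if 'vagrant up' failed, so a rerun starts fresh.
        run(["vagrant", "destroy", "-f"], cwd=tests_dir)
```

A weekly job could call `smoke_test()` and exit non-zero when it returns False, which would surface breakage like the two unrelated issues mentioned above soon after it is introduced.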