nebari-dev / nebari-slurm

An opinionated open source deployment of JupyterHub based on the Slurm job scheduler.
BSD 3-Clause "New" or "Revised" License

Add automated tests for this repo #80

Open Adam-D-Lewis opened 3 years ago

Adam-D-Lewis commented 3 years ago

I recently went to do some work on this repo and found two unrelated issues that had to be debugged and fixed first. We should set up an automated test that runs at least weekly so that issues like these are caught as they occur. That makes debugging easier and is good development practice anyway.

As a first pass, I'd propose that the test simply be a successful deployment of the Vagrant VMs in the tests folder. We'll likely need to use Cirun to get a machine large enough to deploy the VMs (they require 16 GB of RAM on their own, though we could likely lower that a bit).
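A minimal sketch of what that first-pass test could look like as a script a CI job calls. It assumes the Vagrantfile lives in the tests folder and that the libvirt provider is used (as in the log later in this thread); the function name and the injectable `run` parameter are hypothetical and exist only so the logic can be exercised without Vagrant installed.

```python
import subprocess
from typing import Callable


def vagrant_smoke_test(
    workdir: str = "tests",          # assumed location of the Vagrantfile
    provider: str = "libvirt",       # provider seen in the CI log
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Bring the VMs up, report whether that succeeded, always tear down."""
    try:
        up = run(["vagrant", "up", f"--provider={provider}"], cwd=workdir)
        return up.returncode == 0
    finally:
        # Clean up even if 'vagrant up' failed part-way through,
        # so a weekly run does not leave VMs behind on the runner.
        run(["vagrant", "destroy", "-f"], cwd=workdir)
```

A CI step could then exit non-zero when `vagrant_smoke_test()` returns False, which is enough for a scheduled workflow to flag the failure.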

Adam-D-Lewis commented 3 years ago

I'm working on this in the spawnviewer branch and currently getting the error below. I added libvirt, kvm, and qemu to the shell.nix file, but I'm not sure what's going wrong.

Bringing machine 'hpc01-test' up with 'libvirt' provider...
Bringing machine 'hpc02-test' up with 'libvirt' provider...
Bringing machine 'hpc03-test' up with 'libvirt' provider...
==> hpc03-test: An error occurred. The error will be shown after all tasks complete.
==> hpc01-test: An error occurred. The error will be shown after all tasks complete.
==> hpc02-test: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'hpc01-test'
machine. Please handle this error then try again:

Error while connecting to libvirt: Error making a connection to libvirt URI qemu:///system?no_verify=1&keyfile=/home/runnerx/.ssh/id_rsa:
Call to virConnectOpen failed: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

An error occurred while executing the action on the 'hpc02-test'
machine. Please handle this error then try again:

Error while connecting to libvirt: Error making a connection to libvirt URI qemu:///system?no_verify=1&keyfile=/home/runnerx/.ssh/id_rsa:
Call to virConnectOpen failed: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

An error occurred while executing the action on the 'hpc03-test'
machine. Please handle this error then try again:

Error while connecting to libvirt: Error making a connection to libvirt URI qemu:///system?no_verify=1&keyfile=/home/runnerx/.ssh/id_rsa:
Call to virConnectOpen failed: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
Error: Process completed with exit code 1.

A next step may be to try running libvirt manually on an EC2 instance.
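The "No such file or directory" message means the socket file itself is absent, which usually points to the libvirt daemon not running on the host rather than to a Vagrant problem. A hedged sketch of a preflight check that could run before `vagrant up` follows. The socket path is copied from the error above; `/dev/kvm` is the standard KVM device node on Linux; the function names are hypothetical.

```python
import os
import stat
from typing import List

LIBVIRT_SOCKET = "/var/run/libvirt/libvirt-sock"  # path from the error message
KVM_DEVICE = "/dev/kvm"                           # standard KVM device node


def is_socket(path: str) -> bool:
    """True only if the path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def libvirt_preflight(
    socket_path: str = LIBVIRT_SOCKET, kvm_path: str = KVM_DEVICE
) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if not is_socket(socket_path):
        problems.append(f"libvirt socket missing: {socket_path} (is libvirtd running?)")
    if not os.path.exists(kvm_path):
        problems.append(f"{kvm_path} missing: KVM may be unavailable on this host")
    return problems
```

Printing the returned list at the start of the CI job would separate "the daemon is not running" from "the host has no KVM support", which are different fixes.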

Adam-D-Lewis commented 3 years ago

Or possibly try a bare-metal machine, if nested virtualization is not supported on non-metal AWS EC2 instances.

Adam-D-Lewis commented 3 years ago

AWS doesn't have great documentation on nested virtualization, but after extensive searching it appears to be available only on bare-metal EC2 instances (https://github.com/aws-samples/aws-bare-metal-kvm-demo). The cheapest metal machine that might be suitable is m5zn.metal, currently $0.8113 per hour as a spot instance.

DigitalOcean doesn't recommend nested virtualization (https://www.digitalocean.com/community/questions/does-digitalocean-support-kvm-or-nested-virtulzation).

GCP does allow nested virtualization with some setup (https://cloud.google.com/compute/docs/instances/nested-virtualization/overview).

There are reports of unofficial support for nested virtualization with KVM on Azure (https://www.brianlinkletter.com/2018/06/create-a-nested-virtual-machine-in-a-microsoft-azure-linux-vm/ or https://blog.nillsf.com/index.php/2020/03/24/creating-nested-vm-using-kvm-on-azure/).
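Whichever provider is tried, a quick way to tell whether a given instance exposes hardware virtualization to the guest is to look for the `vmx` (Intel VT-x) or `svm` (AMD-V) CPU flag in /proc/cpuinfo. A small sketch, with a hypothetical function name, that takes the file contents as text so it can be tried on any sample:

```python
def cpu_has_virt_flags(cpuinfo_text: str) -> bool:
    """True if any 'flags' line lists vmx (Intel VT-x) or svm (AMD-V)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first colon is the space-separated flag list.
            _, _, value = line.partition(":")
            flags = value.split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False
```

On a Linux instance this could be called as `cpu_has_virt_flags(open("/proc/cpuinfo").read())`. A False result suggests KVM will not work inside that VM, though a True result alone does not guarantee that libvirt is configured correctly.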

Adam-D-Lewis commented 3 years ago

I'm trying to get libvirt using KVM running on Google Compute Engine. I followed this tutorial but was unsuccessful: https://joachim8675309.medium.com/devops-box-vagrant-with-kvm-d7344e79322c.