hashicorp / nomad-driver-virt

Mozilla Public License 2.0

Nomad Virt Driver

The virt driver task plugin extends the types of workloads Nomad can run to include virtual machines. Built on libvirt, the virt driver lets users define virtual machine tasks using the Nomad job spec.

IMPORTANT: This plugin is an alpha version in tech preview and still under active development. There might be breaking changes in future releases.

Features

Ubuntu Example job

Here is an example job that runs a simple Python server on Ubuntu:

job "python-server" {

  group "virt-group" {
    count = 1

    network {
      mode = "host"
      port "http" {
        to = 8000
      }
    }

    task "virt-task" {

      driver = "nomad-driver-virt"

      artifact {
        source      = "http://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img"
        destination = "local/focal-server-cloudimg-amd64.img"
        mode        = "file"
      }

      config {
        image                 = "local/focal-server-cloudimg-amd64.img"
        primary_disk_size     = 10000
        use_thin_copy         = true
        default_user_password = "password"
        cmds                  = ["python3 -m http.server 8000"]

        network_interface {
          bridge {
            name  = "virbr0"
            ports = ["http"]
          }
        }
      }

      resources {
        cores  = 2
        memory = 4000
      }
    }
  }
}

$ nomad job run examples/python.nomad.hcl

==> 2024-09-10T13:01:22+02:00: Monitoring evaluation "c0424142"
    2024-09-10T13:01:22+02:00: Evaluation triggered by job "python-server"
    2024-09-10T13:01:23+02:00: Evaluation within deployment: "d546f16e"
    2024-09-10T13:01:23+02:00: Allocation "db146826" created: node "c20ee15a", group "virt-group"
    2024-09-10T13:01:23+02:00: Evaluation status changed: "pending" -> "complete"
==> 2024-09-10T13:01:23+02:00: Evaluation "c0424142" finished with status "complete"

$ virsh list

 Id   Name                 State
------------------------------------
 4    virt-task-5a6e215e   running

Building The Driver from source

Building the binary requires the libvirt development package (libvirt-dev on Debian and Ubuntu). Install it with your package manager, then clone and build the driver:

sudo apt install libvirt-dev
git clone git@github.com:hashicorp/nomad-driver-virt
cd nomad-driver-virt
make dev

The compiled binary will be located at ./build/nomad-driver-virt.

Runtime dependencies

Make sure the node where the client will run supports virtualization. On Linux, you can check this in a couple of ways:

  1. Reading the CPU flags:

    grep -E -o '(vmx|svm)' /proc/cpuinfo

  2. Reading the kernel modules and looking for the virtualization ones:

    lsmod | grep -E '(kvm_intel|kvm_amd)'

If either command returns nothing, the machine does not support virtualization and the Nomad client won't be able to run any virtualization workload.
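The two checks above can be combined into one small script. This is a minimal sketch and not part of the driver: the helper names has_virt_flags and has_kvm_module are hypothetical, and the second helper reads /proc/modules directly, which is the same data that lsmod prints.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with nomad-driver-virt).

# Succeeds if the cpuinfo file lists the Intel (vmx) or AMD (svm) flag.
# $1: cpuinfo-style file, defaults to /proc/cpuinfo.
has_virt_flags() {
  grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# Succeeds if a KVM vendor module (kvm_intel or kvm_amd) is loaded.
# $1: file in /proc/modules format, defaults to /proc/modules.
has_kvm_module() {
  grep -Eq '^(kvm_intel|kvm_amd)[[:space:]]' "${1:-/proc/modules}"
}

# Example usage:
#   has_virt_flags && has_kvm_module && echo "virtualization available"
```

Both helpers accept an optional file argument so they can be tried against a saved copy of /proc/cpuinfo or /proc/modules from another node.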

Nomad runs as root, so add the user root and the group root to the QEMU configuration to allow it to execute the workloads. Remember to start the libvirtd daemon if it is not running yet, or to restart it after adding the QEMU user/group configuration:

systemctl start libvirtd

or

systemctl restart libvirtd
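On a standard libvirt installation the user and group settings usually live in /etc/libvirt/qemu.conf as user = "root" and group = "root". That path and the check below are assumptions about a typical setup, not something this README specifies. A minimal sketch to confirm both settings are present and uncommented (qemu_conf_has_root is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper (not shipped with nomad-driver-virt).
# Succeeds only if the file has uncommented user = "root" and group = "root" lines.
# $1: path to the QEMU configuration, assumed to default to /etc/libvirt/qemu.conf.
qemu_conf_has_root() {
  conf="${1:-/etc/libvirt/qemu.conf}"
  grep -Eq '^[[:space:]]*user[[:space:]]*=[[:space:]]*"root"' "$conf" &&
    grep -Eq '^[[:space:]]*group[[:space:]]*=[[:space:]]*"root"' "$conf"
}

# Example usage:
#   qemu_conf_has_root || echo "set user and group to root, then restart libvirtd"
```

Lines that start with # are ignored, so a commented-out default does not count as configured.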

Ensure that Nomad can find the plugin; see the plugin_dir agent configuration option.

Driver Configuration

plugin "nomad-driver-virt" {
  config {
    emulator {
      uri = "qemu:///default"
    }
    data_dir    = "/opt/ubuntu/virt_temp"
    image_paths = ["/var/local/statics/images/"]
  }
}

Task Configuration

For resources, the driver currently supports cpuSets or cores, and memory. Every core is treated as a vCPU. Do not use resources.cpu; it will be ignored.

driver = "nomad-driver-virt"

artifact {
  source      = "http://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img"
  destination = "local/focal-server-cloudimg-amd64.img"
  mode        = "file"
}

config {
  image                           = "local/focal-server-cloudimg-amd64.img"
  primary_disk_size               = 9000
  use_thin_copy                   = true
  default_user_password           = "password"
  cmds                            = ["touch /home/ubuntu/file.txt"]
  default_user_authorized_ssh_key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIC31v1..."
}

Network Configuration

The following configuration options are available within the task's driver configuration block:

The example below shows the network configuration and task configuration required to expose and map ports 22 and 80:

group "virt-group" {

  network {
    mode = "host"
    port "ssh" {
      to = 22
    }
    port "http" {
      to = 80
    }
  }

  task "virt-task" {
    driver = "nomad-driver-virt"
    config {
      network_interface {
        bridge {
          name  = "virbr0"
          ports = ["ssh", "http"]
        }
      }
    }
  }
}

Exposed ports and services can make use of the existing service block, so that registrations can be performed using the specified backend provider.

Local Development

Make sure the node supports virtualization.

# Build the task driver plugin
make dev

# Copy the built nomad-driver-virt executable to the plugin dir
cp ./build/nomad-driver-virt /opt/nomad/plugins

# Start the Nomad server
nomad agent -config=examples/server.nomad.hcl > server.log 2>&1 &

# Run the client as sudo
sudo nomad agent -config=examples/client.nomad.hcl > client.log 2>&1 &

# Run a job
nomad job run examples/job.nomad.hcl

# Verify
nomad job status virt-example

virsh list
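The final verification step can be scripted by filtering the virsh list output. This is a rough sketch: running_domains_for_task is a hypothetical helper, and the <task-name>-<suffix> domain naming is inferred from the sample output in this README rather than from a documented guarantee.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with nomad-driver-virt).
# Reads `virsh list` output on stdin and prints the names of running
# domains whose name starts with "<task-name>-".
# $1: Nomad task name, for example virt-task.
running_domains_for_task() {
  awk -v prefix="$1-" '
    $1 ~ /^[0-9]+$/ && $3 == "running" && index($2, prefix) == 1 { print $2 }
  '
}

# Example usage:
#   virsh list | running_domains_for_task virt-task
```

Header and separator lines are skipped because their first field is not a numeric domain ID.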

Debugging a VM

Sometimes things don't go as planned and extra tools are necessary to find the problem. Here are some strategies to debug a failing VM:

Connecting to a VM

By default, cloud images are password protected. Setting default_user_password assigns a new password to the default user of the distribution in use (for example, ubuntu for Ubuntu, fedora for Fedora, or root for Alpine). Running virsh console [vm-name] then opens a terminal inside the VM, which allows it to be inspected from within.

$ virsh list
 Id   Name                 State
------------------------------------
 1    virt-task-8bc0a63f   running

$ virsh console virt-task-8bc0a63f 
Connected to domain 'virt-task-8bc0a63f'
Escape character is ^] (Ctrl + ])

nomad-virt-task-8bc0a63f login: ubuntu
Password:

If no login prompt shows up, it can mean the virtual machine is not booting and adding some extra space to the disk may solve the problem. Remember the disk has to fit the root image plus any other process running in the VM.

The virt driver relies heavily on cloud-init to execute the virtual machine's configuration. Once you have managed to connect to the terminal, the results of cloud-init can be found in two different places: /var/log/cloud-init.log and /var/log/cloud-init-output.log.

Looking into these files can give a better understanding of any possible execution errors.

If connecting to the terminal is not an option, it is possible to stop the job and mount the VM's disk to inspect it. If the use_thin_copy option is used, the driver creates the disk image at ${plugin_config.data_dir}/virt/vm-name.img:

# Find the virtual machine disk image
$ ls /var/lib/virt
virt-task-8bc0a63f.img

# Enable Network Block Devices on the Host
modprobe nbd max_part=8

# Connect the disk as network block device
qemu-nbd --connect=/dev/nbd0 '/var/lib/virt/virt-task-8bc0a63f.img'

# Find The Virtual Machine Partitions
fdisk /dev/nbd0 -l

# Mount the partition from the VM
mount /dev/nbd0p1 /mnt/somepoint/

Important: Don't forget to unmount the disk when you are finished:

umount /mnt/somepoint/
qemu-nbd --disconnect /dev/nbd0
rmmod nbd

Networking

For networking, the plugin uses libvirt's default network, which is named default:

$ virsh net-list
 Name      State    Autostart   Persistent
--------------------------------------------
 default   active   yes         yes

Under the hood, libvirt uses dnsmasq to lease IP addresses to the virtual machines. There are multiple ways to find the IP assigned to the Nomad task. Using virsh to find the leased IP:

$ virsh net-dhcp-leases default
 Expiry Time           MAC address         Protocol   IP address           Hostname                   Client ID or DUID
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 2024-10-07 18:48:09   52:54:00:b5:0b:d4   ipv4       192.168.122.211/24   nomad-virt-task-dc8187e3   ff:08:24:45:0e:00:02:00:00:ab:11:63:3c:26:5b:b7:fe:b3:13

Or using the MAC address to find the IP via ARP:

$ virsh dumpxml virt-task-dc8187e3 | grep "mac address" | awk -F\' '{ print $2}'
52:54:00:b5:0b:d4
$ arp -an | grep 52:54:00:b5:0b:d4
? (192.168.122.211) at 52:54:00:b5:0b:d4 [ether] on virbr0
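The lease lookup can also be scripted. This is a minimal sketch, assuming the column layout shown in the virsh net-dhcp-leases output above. lease_ip is a hypothetical helper name, not a virsh or Nomad command.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with nomad-driver-virt).
# Reads `virsh net-dhcp-leases <network>` output on stdin and prints the
# leased IP address (without the /prefix length) for the given hostname.
# $1: hostname as shown in the Hostname column.
lease_ip() {
  awk -v host="$1" '
    # Data rows split into: date, time, MAC, protocol, IP/prefix, hostname, client ID
    $6 == host { sub(/\/.*/, "", $5); print $5 }
  '
}

# Example usage:
#   virsh net-dhcp-leases default | lease_ip nomad-virt-task-dc8187e3
```

The header and separator rows never match because their sixth field is not a hostname, so no extra filtering is needed.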