reefland / ansible-zfs_on_root

Ansible ZFS on Root with device-based zpool rules. For bare-metal installs. ZFS Encryption, UEFI, and Dropbear included.

ZFS on Root For Ubuntu 22.04 LTS

This Ansible role is my standardized ZFS on Root installation that I use as a base for all my systems. Additional roles are applied on top of this to make the generic host a specialized Home Theater PC, Full Graphical Desktop, Kubernetes Cluster node, a headless Docker Server, etc...

NOTE: This Ansible role is not structured as rigidly as a typical Ansible role should be. Tips and suggestions on how to improve this are welcome.


Originally based on the OpenZFS ZFS on Root Guide, but no longer!! Now with many enhancements:


TL;DR


Environments Tested


Requirements

Caution


What are rEFInd and ZFSBootMenu?

rEFInd is a graphical boot manager used in place of GRUB, which is not ZFS-aware.

rEFInd Boot Menu Image

ZFSBootMenu allows you to select which "boot environment" to use; this can be a previous ZFS snapshot. See the Project Home Page for details.

ZFS Boot Menu Image


WHY use THIS instead of Ubuntu's Installation Wizard

My intention for this is to have a standardized and repeatable base install for my occasional one-off builds. However, being based on Ansible, it can also be used to build batches of servers or desktops needing ZFS on Root installations.

Configurable Rules

This provides a configurable way to define what the ZFS installation will look like, and allows for topologies that cannot be defined within the standard Ubuntu installation wizard.
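For illustration only, the rules amount to a mapping from the number of disk devices a host defines to a pool topology. The variable names below are hypothetical; the actual rule definitions live in vars/main.yml:

# Hypothetical illustration of device-count based topology rules.
# See vars/main.yml for the actual rule definitions this role uses.
root_pool_type_by_device_count:
  1: ""         # single device, no redundancy
  2: "mirror"   # two devices become a mirrored vdev
  3: "raidz1"   # three or more devices become a raidz vdev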

Optional ZFS Native Encryption

ZFS Native Encryption (aes-256-gcm) can be enabled for the root pool. If the create-swap-partition option is enabled, then the swap partition will also be encrypted. See the OpenZFS overview for background on native encryption.
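As a sketch, enabling it could look like the following. The variable names are assumptions; check defaults/main.yml for the options this role actually exposes:

# Hypothetical variable names shown for illustration only.
root_pool_encryption: true        # enable ZFS native encryption (aes-256-gcm)
root_pool_password: "change!me"   # passphrase you will enter at boot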

SSHD Configuration Settings

Some of the SSHD configuration options can be defined to help lock down your base server image. For example, password authentication will be disabled, public-key authentication will be enabled, and root login will only be permitted via public-key authentication.
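The resulting lockdown corresponds to sshd_config directives along these lines (a sketch of the behavior described above, not necessarily the role's exact template):

# Illustrative sshd_config directives matching the description above.
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin prohibit-password   # root may log in with a key only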

Dropbear Support

When a computer with ZFS native encryption enabled is rebooted, someone needs to be at the console to enter the passphrase for the root pool encryption. Dropbear provides a small SSH server within the boot environment so the passphrase can be entered remotely instead.
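As a minimal sketch, assuming Dropbear accepts the first user's authorized key as root and listens on the standard SSH port (both assumptions; check the role's Dropbear settings), remote unlocking looks like:

# Host name and port are examples; adjust to your Dropbear configuration.
ssh root@testlinux01
# ...enter the root pool passphrase at the prompt and boot continues.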


How do I set it up

Edit your inventory document

I use a YAML-format inventory file; you will have to adjust this to whatever format you use.


---
###[ Define all Hosts ]########################################################
all:
  hosts:
    ...

  children:
    ###[ ZFS on Root Installs ]################################################
    zfs_on_root_install_group:
      hosts:
        testlinux01.localdomain:
          host_name: "testlinux01"
          disk_devices: ["sda", "sdb", "sdc"]

        testlinux02.localdomain:
          host_name: "testlinux02"
          disk_devices: ["sda", "nvme0n1"]

        testlinux03.localdomain:
          host_name: "testlinux03"
          disk_devices: ["sda"]
          root_partition_size: "120G"

      vars:
        # Define the default domain these hosts will use
        domain_name: "localdomain"

        # Define URL for Apt-Cacher-NG Package Cache
        # apt_http_proxy: "http://192.168.0.243:3142"
        # apt_https_proxy: false

Inventory / Host Variables

All of these are optional; if not provided, you will be prompted to enter values when needed.


Edit defaults/main.yml to define the defaults

The defaults/main.yml contains most settings that can be changed.

You are defining reasonable defaults here. Individual hosts that need something a little different can be set in the inventory file, or you can use any other method that Ansible supports for defining variables.

Disable Logging Secrets

Ansible tasks that reference secrets (passwords or passphrases) may show them on the screen as well as in the settings summary screens. Showing secrets can be enabled if that would help troubleshooting.

# Don't log secrets such as the ZFS on Root password when running the playbook
no_log_secrets: true

Define Temporary Root Password

This temporary root password is only used during the build process. The ability for root to use a password will be disabled towards the final stages.

###############################################################################
# User Settings
###############################################################################

# This is a temporary root password used during the build process.  
# Root password will be disabled during the final stages.
# The non-root user account will have sudo privileges
default_root_password: "change!me"

Define the Non-Root Account(s)

Define your standard privileged account(s). The root account password will be disabled at the end, the privileged account(s) defined here must have sudo privilege to perform root activities. You will be forced to change this password upon first login. (Additional accounts can be defined).

NOTE: The ~/.ssh/authorized_keys of the first user will be allowed to connect to Dropbear (if enabled).

# Define non-root user account(s) to create (home drives will be its own dataset)
# Each user will be required to change password upon first login
regular_user_accounts: 
  - user_id: "rich"
    password: "change!me"
    full_name: "Richard Durso"
    groups: "adm,cdrom,dip,lpadmin,lxd,plugdev,sambashare,sudo"
    shell: "/bin/bash"

Additional Settings to Review

Additional Configuration Files

There should be no reason to alter the configuration file vars/main.yml, which defines all the details and flags used to construct partitions and root pools and how all the datasets will be created. If this type of information interests you, this is where you will find it... but don't change anything unless you understand what you are looking at.


How do I Run It

Prepare the Install Environment

  1. Boot the Ubuntu Live CD:

    • Select the Try Ubuntu option.
    • Connect your system to the internet as appropriate (e.g. join your Wi-Fi network).
    • Open a terminal within the Live CD environment - press Ctrl-Alt-T.
  2. Install and start the OpenSSH server in the Live CD environment (see helper script below):

Fetch Helper Script

The helper script will perform many steps for you, such as updating packages, creating an ansible user account, defining a password for that account, granting the ansible account sudo privileges, installing the SSH server, Python, etc.

Option 1 - Proper Way to Run Helper Script
wget https://raw.githubusercontent.com/reefland/ansible-zfs_on_root/master/files/do_ssh.sh

chmod +x do_ssh.sh

./do_ssh.sh
Option 2 - Lazy Way to Run Helper Script
wget -O - https://bit.ly/do_ssh | bash

sudo passwd ansible

The Live CD Install Environment is now ready. There is nothing else you need to do here. The rest is done from the Ansible Control Node.

If Helper Script is not Available

Push your Ansible Public Key to the Install Environment

From the Ansible Control Node, push your ansible public key to the Install Environment. You will be prompted for the ansible password created within the Ubuntu Live CD Install Environment:

ssh-copy-id -o "UserKnownHostsFile=/dev/null" -o "StrictHostKeyChecking=no" -i ~/.ssh/ansible.pub ansible@<remote_host_name>

# Expected output:
ansible@<remote_host_name> password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'ansible@<remote_host_name>'"
and check to make sure that only the key(s) you wanted were added.

Optionally, you can test connectivity to verify SSH has been configured correctly.

ansible -i inventory.yml -m ping <remote_host_name>

# Expect output to include:

remote_host_name | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

You are now ready to perform a ZFS on Root installation to this target machine.

Fire-up the Ansible Playbook

The most basic way to run the entire ZFS on Root process, limited to an individual host, is:

ansible-playbook -i inventory.yml ./zfs_on_root.yml -l <remote_host_name>

After a few minutes, if all goes well, you will have a reasonably decent standardized configuration ready to be used as a base system and modified for any other specific role.

The zfs_on_root.yml is a simple yaml file used to call the role, which can look like this:

---
- name: ZFS on Root Ubuntu Installation
  hosts: zfs_on_root_install_group
  become: true
  gather_facts: true

  roles:
    - role: zfs_on_root

The first thing I do once this Playbook completes is apply the Customized Message of the Day Ansible Playbook for a login screen with a bit of bling.


Alternative Step by Step Installation

As an alternative to running the entire playbook at one time, it can be run sections at a time using the Ansible tags defined below. This method can be used to troubleshoot issues and replay steps if you have a way of rolling back previous failures. Failures can be rolled back either manually or via snapshots in VirtualBox or equivalent.

To run just one step via tags, all of the Ansible Playbook examples can be used with the addition of --tags:

ansible-playbook -i inventory ./zfs_on_root.yml --extra-vars='{disk_devices: [sda, sdb], host_name: testlinux}' -l <remote_host_name> --tags="install-zfs-packages"

Multiple tags can be combined to run several tasks:

--tags="create_pools, create_filesystems, create_datasets"

This is the list and order of execution for all tags defined for this playbook:

    tags:
      - install-zfs-packages
      - clear_partition_tables_from_devices
      - create_partitions
      - create_pools
      - create_filesystems
      - create_datasets
      - config_system
      - install_zfs
      - config_boot_fs
      - install_dracut
      - install_refind
      - install_syslinux
      - install_zfsbootmenu
      - config_swap [not tested]
      - system_tweaks
      - first_boot_prep
      - fix_mount_order
      - unmount_chroot
      - reboot_remote
      - create_regular_users
      - copy_ssh_keys_notice
      - install_dropbear
      - final_setup
      - restart_remote_final

Helper tasks, basic sanity checks, and mandatory tasks are already marked as always and will be processed on every run to set up the base Ansible working environment (reading configuration files, setting variables, etc.); there is nothing special you need to do.

Grouping Tags

A reasonable way to build a system in stages using a group of tags instead of calling them individually:

--tags="install-zfs-packages, clear_partition_tables_from_devices, create_partitions, create_pools"
--tags="create_filesystems, create_datasets, config_system, install_zfs"
--tags="config_boot_fs, install_dracut, install_refind, install_syslinux"
--tags="install_zfsbootmenu, config_swap, system_tweaks, first_boot_prep"
--tags="fix_mount_order, unmount_chroot, reboot_remote"
--tags="create_regular_users, copy_ssh_keys_notice, install_dropbear, final_setup, restart_remote_final"

Skipping Tags

Specific tags can also be skipped. For example, if you do not wish to see the manual confirmation page each time and would rather the playbook execute directly, use:

--skip-tags="confirm_defaults"

Known Issues


More about Root Pool using Mirrored vdev Process

This topology is not available from the Ubuntu installer and is not covered by the OpenZFS HowTo method.

Here is a brief overview with additional information.
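For context, a root pool built on a mirrored vdev is equivalent to what a zpool create along these lines would produce (device paths are examples only; this role builds the pool for you):

# Illustrative only: a two-way mirrored root pool; partition paths are examples.
zpool create rpool mirror \
  /dev/disk/by-id/ata-disk0-part3 \
  /dev/disk/by-id/ata-disk1-part3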


Helper Scripts


Emergency chroot Recovery

If your system is unable to boot, then boot from the Ubuntu Live CD to create a chroot environment where you can decrypt and mount your ZFS pools, mount boot partitions, and have an interactive shell to inspect, troubleshoot, apply updates, etc. You should be comfortable with the Emergency chroot Recovery process.
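A condensed sketch of that process (pool and dataset names are examples; adjust to your layout):

# Illustrative outline only; pool and dataset names are examples.
zpool import -f -R /mnt rpool        # import the root pool under /mnt
zfs load-key -a                      # enter the passphrase if encrypted
zfs mount rpool/ROOT/ubuntu          # mount the root dataset
zfs mount -a                         # mount the remaining datasets
mount --rbind /dev  /mnt/dev
mount --rbind /proc /mnt/proc
mount --rbind /sys  /mnt/sys
chroot /mnt /bin/bash --login        # interactive shell inside the system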


Marking Swap Device as Failed

NOTE: mdadm is used to create mirrored or striped swap partitions. If you will be replacing a drive, you should mark the device as failed before removing it from the system; failure to do so will likely result in no swap being available. Marking the device as failed before removal allows the swap device to keep functioning, even if in a degraded state.
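A sketch of marking a member of the swap array as failed and removing it (the md device and partition names are examples; check /proc/mdstat for your actual array):

# Example device names; use cat /proc/mdstat to find the actual array members.
mdadm /dev/md0 --fail /dev/sdb2      # mark the failing member as faulty
mdadm /dev/md0 --remove /dev/sdb2    # remove it so the drive can be pulled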