opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

Cannot create a coherent snapshot on KVM via quiescing when Zenarmor is installed #7681

Open deajan opened 1 month ago

deajan commented 1 month ago

Describe the bug

Running OPNsense on KVM, I cannot create a quiesced snapshot via libvirt:

virsh snapshot-create opnsense.local --disk-only --atomic --quiesce

fails with the following error:

error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': failed to freeze /usr/local/zenarmor/output/active/temp: Resource deadlock avoided

Without --quiesce, the same libvirt command succeeds.
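
To check whether the failure comes from the guest agent freeze itself rather than from snapshot creation, something like the following could be run on the KVM host. This is only a sketch: `freeze_thaw_probe` is a hypothetical helper name, the `VIRSH` override exists only so the function can be dry-run with a stub, and it assumes that thawing a guest with nothing frozen is harmless.

```shell
#!/bin/sh
# Hypothetical helper (not part of libvirt or OPNsense).
# VIRSH can be overridden, e.g. VIRSH=echo, to dry-run without a hypervisor.
VIRSH="${VIRSH:-virsh}"

# freeze_thaw_probe DOMAIN
# Ask the guest agent to freeze the guest filesystems, then always thaw them
# again. Returns the exit code of the freeze step.
freeze_thaw_probe() {
    domain="$1"
    rc=0
    "$VIRSH" domfsfreeze "$domain" || rc=$?
    # Assumption: thawing when nothing is frozen does no harm.
    "$VIRSH" domfsthaw "$domain" >/dev/null 2>&1 || true
    return "$rc"
}
```

Calling `freeze_thaw_probe opnsense.local` on the host should then fail in the same way as the snapshot if the freeze step is the culprit.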

To Reproduce

Steps to reproduce the behavior:

  1. Have OPNsense installed with Zenarmor
  2. Create a quiesced snapshot via KVM

Expected behavior

Quiescing should run all necessary pre-snapshot freeze and post-snapshot thaw scripts.

Additional context

I've looked for the freeze/thaw scripts in OPNsense in order to exclude ramdisks (in our case /usr/local/zenarmor/output/active/temp) from quiescing, or alternatively to suspend the Zenarmor service until the snapshot is done.

I couldn't find any relevant information in the OS.
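
For reference, a quick way to see which mount points are backed by a memory disk (and would therefore be candidates for exclusion) could look like the sketch below. `list_md_mounts` is a hypothetical helper; it assumes fstab(5)-style input such as the output of `mount -p` on FreeBSD, with the device in the first column and the mount point in the second.

```shell
#!/bin/sh
# Hypothetical helper: read fstab(5)-style lines on stdin and print the
# mount point of every filesystem whose device is /dev/mdN.
list_md_mounts() {
    awk '$1 ~ /^\/dev\/md[0-9]+$/ { print $2 }'
}
```

On the firewall this would be used as `mount -p | list_md_mounts`.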

Running qemu-ga in OPNsense suggests that it will read the freeze/thaw script at /usr/local/bin/../etc/qemu/fsfreeze-hook (i.e. /usr/local/etc/qemu/fsfreeze-hook) if it exists.

I've made the following changes in OPNsense:

In /etc/rc.conf.d/qemu_guest_agent:

- qemu_guest_agent_flags="-d -l /var/log/qemu-ga.log"
+ qemu_guest_agent_flags="-d -l /var/log/qemu-ga.log -F/usr/local/etc/qemu/fsfreeze-hook"

Then I created the following script in /usr/local/etc/qemu/fsfreeze-hook and made it executable:

#!/bin/sh

LOG_FILE=/var/log/qemu-ga.log

# Static device name found in /usr/local/etc/rc.d/eastpect
ZENARMOR_RAMDISK="/dev/md43"
ZENARMOR_RAMDISK_MOUNTPOINT="/usr/local/zenarmor/output/active/temp"

log () {
        echo "$1" >> "${LOG_FILE}";
}

case "$1" in
        "freeze")
                log "Launching freeze operations"
                if [ -d "${ZENARMOR_RAMDISK_MOUNTPOINT}" ]; then
                        log "Zenarmor installed, stopping engine"
                        zenarmorctl engine stop >> "${LOG_FILE}" 2>&1
                        umount "${ZENARMOR_RAMDISK_MOUNTPOINT}" >> "${LOG_FILE}" 2>&1
                        sleep 1
                fi
                # Return 0 regardless of state, since stopping an already stopped engine might return a non-zero exit code
                log "Freeze operation done"
                exit 0
                ;;
        "thaw")
                log "Launching thaw operations"
                if [ -d "${ZENARMOR_RAMDISK_MOUNTPOINT}" ]; then
                        log "Zenarmor installed, starting engine"
                        mount "${ZENARMOR_RAMDISK}" "${ZENARMOR_RAMDISK_MOUNTPOINT}" >> "${LOG_FILE}" 2>&1
                        zenarmorctl engine start >> "${LOG_FILE}" 2>&1
                        sleep 1
                fi
                log "Thaw operation done"
                exit 0
                ;;
        *)
                log "Unknown or missing option. Nothing will happen. Options are 'freeze' or 'thaw'"
                exit 1
                ;;
esac
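
Before relying on a snapshot to trigger it, the hook can be exercised by hand inside the guest. The following is a minimal sketch, with `run_hook_cycle` being a hypothetical helper name: it runs a hook through one freeze/thaw cycle, always calls thaw even if freeze failed, and reports both exit codes. Note that with the hook above this really stops and restarts the Zenarmor engine.

```shell
#!/bin/sh
# Hypothetical helper: run a hook script through one freeze/thaw cycle.
# Prints "freeze=<rc> thaw=<rc>" and succeeds only if both steps returned 0.
# Returns 2 if the hook is missing or not executable.
run_hook_cycle() {
    hook="$1"
    if [ ! -x "$hook" ]; then
        echo "hook not executable: $hook"
        return 2
    fi
    freeze_rc=0
    thaw_rc=0
    "$hook" freeze || freeze_rc=$?
    # Always thaw, so a failed freeze cannot leave the engine stopped.
    "$hook" thaw || thaw_rc=$?
    echo "freeze=${freeze_rc} thaw=${thaw_rc}"
    [ "$freeze_rc" -eq 0 ] && [ "$thaw_rc" -eq 0 ]
}
```

Usage would be `run_hook_cycle /usr/local/etc/qemu/fsfreeze-hook`, followed by a look at /var/log/qemu-ga.log for the messages written by the hook.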

So far so good: I can now use --quiesce to make my snapshots application-aware. I am more than willing to make a PR for this issue if @fichtner or @AdSchellevis could have a quick look to make sure I didn't make any mistakes, especially since I don't know whether /etc/rc.conf.d/qemu_guest_agent is regenerated on boot.

Also, should I make this PR for the qemu_guest_agent plugin instead of core?

Thanks ;)

Relevant forum entry: https://forum.opnsense.org/index.php?topic=38943.0

Environment

I've tried this with all OPNsense versions from 22.7 up to the recent 24.7_5, on multiple hosts, all with KVM.

deajan commented 1 month ago

I discovered an issue with qemu-ga, see https://github.com/opnsense/plugins/issues/4148, so this is on hold for now.