psy0rz / zfs_autobackup

ZFS autobackup is used to periodically back up ZFS filesystems to other locations. Easy to use and very reliable.
https://github.com/psy0rz/zfs_autobackup
GNU General Public License v3.0

zfs_autobackup and TrueNAS compatibility issues with mounted datasets #254

Open M4stakilla opened 3 months ago

M4stakilla commented 3 months ago

Hi,

I'm using your script on TrueNAS Scale Dragonfish and found 2 issues when replicating to a mounted dataset.

After running the script with --debug and trying all commands one by one manually, I was able to figure out the missing commands to make things work properly. This is the zfs_autobackup command I used for my replication:

root@truenas-backup:~# autobackup-venv/bin/python -m zfs_autobackup.ZfsAutobackup --verbose --debug --ssh-config ../../../.ssh/config --ssh-target truenas-master --rollback --keep-source=0 --keep-target=0 --allow-empty --snapshot-format {}-%Y-%m-%d_%H-%M --zfs-compressed --decrypt --clear-refreservation --strip-path 2 --exclude-received --other-snapshots test master-pool/encrypted-ds

Below is a summary of the individual commands required to make it work on TrueNAS (I skipped all the snapshot, hold and release commands during my testing, as they are not relevant to these issues):

1. (zfs send -L -e -c -v -P backup-pool/encrypted-ds/test3-ds@manual-2024-05-21_10-22) | (ssh -F ../../../.ssh/config truenas-master 'zfs recv -u -x refreservation -v -s master-pool/encrypted-ds/test3-ds')
2. (ssh -F ../../../.ssh/config truenas-master 'zfs mount master-pool/encrypted-ds/test3-ds')
3. (ssh -F ../../../.ssh/config truenas-master 'zfs rollback master-pool/encrypted-ds/test3-ds@manual-2024-05-21_10-22')
4. (zfs send -L -e -c -v -P -i @manual-2024-05-21_10-22 backup-pool/encrypted-ds/test3-ds@manual-2024-05-21_10-35) | (ssh -F ../../../.ssh/config truenas-master 'zfs recv -u -x refreservation -v -s master-pool/encrypted-ds/test3-ds')
5. (ssh -F ../../../.ssh/config truenas-master 'zfs umount master-pool/encrypted-ds/test3-ds')
6. (ssh -F ../../../.ssh/config truenas-master 'zfs mount master-pool/encrypted-ds/test3-ds')
1. (Done correctly by zfs_autobackup) This is the initial full replication and it works fine.
2. (Done correctly by zfs_autobackup) This is the mount of the target dataset and it works well, since my source dataset has canmount=on and I want to actually mount it on the target as well (I'm only replicating datasets at the 3rd level, so if I understood your warnings about this correctly, I shouldn't run into issues).
3. (Missing in zfs_autobackup!) Apparently TrueNAS modifies something on the dataset after mounting it, so I need to roll back after the mount, otherwise I get the error "! [Target] STDERR > cannot receive incremental stream: destination master-pool/encrypted-ds/test3-ds has been modified".
4. (Done correctly by zfs_autobackup) This is the incremental replication of the 2nd snapshot and it works well.
5. and 6. (Missing in zfs_autobackup!) When replicating over a mounted dataset in TrueNAS, TrueNAS will randomly throw 3 different kinds of errors, which can be resolved by unmounting and remounting the dataset (see the sketch below).
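For reference, the missing steps boil down to something like the sketch below (reusing the same dataset and snapshot names as in the commands above; in an automated run the snapshot name would of course be the one that was just received):

# Step 3 (missing): roll the freshly mounted target dataset back to the
# received snapshot, so the next incremental receive is not rejected with
# "destination ... has been modified".
ssh -F ../../../.ssh/config truenas-master \
  'zfs rollback master-pool/encrypted-ds/test3-ds@manual-2024-05-21_10-22'

# Steps 5 and 6 (missing): cycle the mount after the incremental receive so
# the TrueNAS middleware stops throwing the errors listed below.
ssh -F ../../../.ssh/config truenas-master \
  'zfs umount master-pool/encrypted-ds/test3-ds && zfs mount master-pool/encrypted-ds/test3-ds'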

Below are the 3 errors that TrueNAS throws when the dataset is not remounted:

[EFAULT] Failed retreiving USER quotas for master-pool/encrypted-ds/test-ds

 Error: concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 76, in get_quota
    with libzfs.ZFS() as zfs:
  File "libzfs.pyx", line 529, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 78, in get_quota
    quotas = resource.userspace(quota_props)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libzfs.pyx", line 3651, in libzfs.ZFSResource.userspace
libzfs.ZFSException: cannot get used/quota for master-pool/encrypted-ds/test-ds: unsupported version or feature

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 112, in main_worker
    res = MIDDLEWARE._run(*call_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 34, in _call
    with Client(f'ws+unix://{MIDDLEWARE_RUN_DIR}/middlewared-internal.sock', py_exceptions=True) as c:
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 80, in get_quota
    raise CallError(f'Failed retreiving {quota_type} quotas for {ds}')
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for master-pool/encrypted-ds/test-ds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/dataset_quota_and_perms.py", line 225, in get_quota
    quota_list = await self.middleware.call(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1564, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1425, in _call
    return await self._call_worker(name, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1431, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1337, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
middlewared.service_exception.CallError: [EFAULT] Failed retreiving USER quotas for master-pool/encrypted-ds/test-ds

[EFAULT] Failed retreiving GROUP quotas for master-pool/encrypted-ds/test-ds

 Error: concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 76, in get_quota
    with libzfs.ZFS() as zfs:
  File "libzfs.pyx", line 529, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 78, in get_quota
    quotas = resource.userspace(quota_props)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libzfs.pyx", line 3651, in libzfs.ZFSResource.userspace
libzfs.ZFSException: cannot get used/quota for master-pool/encrypted-ds/test-ds: dataset is busy

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 112, in main_worker
    res = MIDDLEWARE._run(*call_args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 34, in _call
    with Client(f'ws+unix://{MIDDLEWARE_RUN_DIR}/middlewared-internal.sock', py_exceptions=True) as c:
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/dataset_quota.py", line 80, in get_quota
    raise CallError(f'Failed retreiving {quota_type} quotas for {ds}')
middlewared.service_exception.CallError: [EFAULT] Failed retreiving GROUP quotas for master-pool/encrypted-ds/test-ds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1417, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/dataset_quota_and_perms.py", line 225, in get_quota
    quota_list = await self.middleware.call(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1564, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1425, in _call
    return await self._call_worker(name, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1431, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1337, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
middlewared.service_exception.CallError: [EFAULT] Failed retreiving GROUP quotas for master-pool/encrypted-ds/test-ds

[ENOENT] Path /mnt/master-pool/encrypted-ds/test-ds not found

 Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 198, in call_method
    result = await self.middleware.call_with_audit(message['method'], serviceobj, methodobj, params, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1466, in call_with_audit
    result = await self._call(method, serviceobj, methodobj, params, app=app,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1428, in _call
    return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 191, in nf
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 53, in nf
    res = f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/filesystem.py", line 422, in stat
    raise CallError(f'Path {_path} not found', errno.ENOENT)
middlewared.service_exception.CallError: [ENOENT] Path /mnt/master-pool/encrypted-ds/test-ds not found
psy0rz commented 3 months ago

For point 3, try zfs-autobackup v3.3 beta. Maybe this also fixes 6, but maybe that's just a TrueNAS issue?

See https://github.com/psy0rz/zfs_autobackup/wiki/Mounting
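(If you installed it into a venv with pip as shown above, something along these lines should pull in the beta, assuming it is published on PyPI as a pre-release:)

autobackup-venv/bin/pip install --upgrade --pre zfs-autobackup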

M4stakilla commented 3 months ago

Hi psy0rz,

Thanks a lot for your response. I agree it is a bit tricky to say whose issue this is (TrueNAS or zfs-autobackup). I will try to raise a TrueNAS bug report for this as well later (but I fear they'll just dismiss it because I'm not supposed to run external scripts to do this on "an appliance").

I actually did all my latest testing with v3.3 beta 3. Attached is the output of my 2 --debug runs (zfs-autobackup_debug.log), from which I've extracted all the manual commands to perform my testing.

'3.' wasn't fixed in this version, I'm afraid (see the first run: it only complains that the dataset was modified, but doesn't roll it back). '6.' can only be done if '5.' is done first, so I'm afraid this isn't resolved either.

Thanks a lot for your hard work btw. Much appreciated. I first started scripting this myself in bash, but I could never have done it as well as you have over the years ;)

I already created a bash wrapper script that works around issues 5. and 6. Issue 3. I cannot work around, as it happens in the middle of your script (but it also only occurs on the initial replication).

Example below:

#!/usr/bin/bash
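
# Wrapper around zfs-autobackup that unmounts and remounts the impacted
# target datasets after the replication run, as a workaround for issues
# 5. and 6. described above.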

DIRECTION="master_to_backup"
SCOPE="all_snapshots"

TASK="${DIRECTION}_${SCOPE}"
SSH_CONFIG_FILE="../../../.ssh/config"
SSH_TARGET="truenas-master"
REMOTE_CMD="ssh -F ${SSH_CONFIG_FILE} ${SSH_TARGET}"

ZFS_AUTOBACKUP_FOLDER="zfs_autobackup-latest"
LOG_FILE="../../../logs/backup/zfs_autobackup-${TASK}.$(date +"%Y-%m-%d_%H-%M").log"
ZFS_AUTOBACKUP_COMMAND="autobackup-venv/bin/python -m zfs_autobackup.ZfsAutobackup"

SNAPSHOT_OPTARGS="--rollback --keep-source=0 --keep-target=0 --allow-empty --snapshot-format {}-%Y-%m-%d_%H-%M"
ZFS_OPTARGS="--zfs-compressed --decrypt --clear-refreservation"
ZFS_AUTOBACKUP_OPTARGS="--strip-path 2 --exclude-received"

cd "$(dirname -- "${BASH_SOURCE[0]}")/${ZFS_AUTOBACKUP_FOLDER}"

if [[ "${DIRECTION}" == "backup_to_master" ]]; then
  SSH_OPTARGS="--ssh-config ${SSH_CONFIG_FILE} --ssh-target ${SSH_TARGET}"

  TARGET_PARENT_DATASET="master-pool/encrypted-ds"
  IMPACTED_DATASETS="$(zfs list -H | awk '{print $1}' | xargs zfs get all | grep " autobackup:${TASK} " | awk '{print $1}' | xargs basename -a 2>/dev/null)"
  UMOUNT_CMD="$(for IMPACTED_DATASET in ${IMPACTED_DATASETS}; do echo -n "${REMOTE_CMD} 'zfs umount ${TARGET_PARENT_DATASET}/${IMPACTED_DATASET}' ; "; done)"
  MOUNT_CMD="$(for IMPACTED_DATASET in ${IMPACTED_DATASETS}; do echo -n "${REMOTE_CMD} 'zfs mount ${TARGET_PARENT_DATASET}/${IMPACTED_DATASET}' ; "; done)"
elif [[ "${DIRECTION}" == "master_to_backup" ]]; then
  SSH_OPTARGS="--ssh-config ${SSH_CONFIG_FILE} --ssh-source ${SSH_TARGET}"

  TARGET_PARENT_DATASET="backup-pool/encrypted-ds"
  IMPACTED_DATASETS="$(${REMOTE_CMD} "zfs list -H | awk '{print \$1}' | xargs zfs get all | grep \" autobackup:${TASK} \" | awk '{print \$1}' | xargs basename -a 2>/dev/null")"
  UMOUNT_CMD="$(for IMPACTED_DATASET in ${IMPACTED_DATASETS}; do echo -n "zfs umount ${TARGET_PARENT_DATASET}/${IMPACTED_DATASET} ; "; done)"
  MOUNT_CMD="$(for IMPACTED_DATASET in ${IMPACTED_DATASETS}; do echo -n "zfs mount ${TARGET_PARENT_DATASET}/${IMPACTED_DATASET} ; "; done)"
else
  echo "'${DIRECTION}' is not a valid \$DIRECTION!"
  exit 1
fi

if [[ "${SCOPE}" == "all_snapshots" ]]; then
  ZFS_AUTOBACKUP_TASK_OPTARGS="--other-snapshots ${TASK} ${TARGET_PARENT_DATASET}"
elif [[ "${SCOPE}" == "latest_snapshot_only" ]]; then
  ZFS_AUTOBACKUP_TASK_OPTARGS=" ${TASK} ${TARGET_PARENT_DATASET}"
else
  echo "'${SCOPE}' is not a valid \$SCOPE!"
  exit 1
fi

mkdir -p "$(dirname "${LOG_FILE}")"

echo "zfs-autobackup command = ${ZFS_AUTOBACKUP_COMMAND} --verbose ${SSH_OPTARGS} ${SNAPSHOT_OPTARGS} ${ZFS_OPTARGS} ${ZFS_AUTOBACKUP_OPTARGS} ${ZFS_AUTOBACKUP_TASK_OPTARGS}" >"${LOG_FILE}"
echo "umount command = ${UMOUNT_CMD}" >>"${LOG_FILE}"
echo "mount command = ${MOUNT_CMD}" >>"${LOG_FILE}"
echo >>"${LOG_FILE}"

nohup /usr/bin/bash -c \
  "{ ${ZFS_AUTOBACKUP_COMMAND} --verbose ${SSH_OPTARGS} ${SNAPSHOT_OPTARGS} ${ZFS_OPTARGS} ${ZFS_AUTOBACKUP_OPTARGS} ${ZFS_AUTOBACKUP_TASK_OPTARGS} ; ${UMOUNT_CMD} ${MOUNT_CMD} }" \
  >>"${LOG_FILE}" 2>&1 \
  & tail -f "${LOG_FILE}"
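
For a single impacted dataset named test3-ds, the generated mount-cycle strings would expand to roughly the following (illustrative; with the master_to_backup direction configured above, the target is the local backup pool, so no ssh prefix is needed):

zfs umount backup-pool/encrypted-ds/test3-ds ;
zfs mount backup-pool/encrypted-ds/test3-ds ;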
psy0rz commented 1 month ago

Try setting canmount=noauto on the target datasets to see if this solves the issue. ZFS probably tries to mount the datasets and creates a mountpoint directory in a higher dataset. (e.g. https://github.com/psy0rz/zfs_autobackup/wiki/Mounting#target-side)

Also use the --clear-mountpoint option for zfs-autobackup to automatically set canmount=noauto.
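
For example (a minimal sketch, reusing the dataset and ssh names from this thread):

ssh -F ../../../.ssh/config truenas-master \
  'zfs set canmount=noauto master-pool/encrypted-ds/test3-ds'

# or add --clear-mountpoint to the existing zfs-autobackup invocation so newly
# received datasets get canmount=noauto automatically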