Closed by EagTG 4 months ago
Also, confirming that for this particular workload, dropping '--allow-empty' seems to have worked around the issue: not all of our thousands of datasets contain updated data, so the resulting 'zfs snapshot' command is shorter (at least I assume so; I missed that in the log, so I'm not sure whether it still sends the 'zfs snapshot' all at once or breaks it up).
To make a consistent snapshot, we need to call 'zfs snapshot' with all the datasets at once.
One option would be to create the snapshots yourself, with 'zfs snapshot -r' for example, and then run zfs-autobackup with '--no-snapshot'.
(And perhaps 'zfs destroy' the snapshots you don't want before calling it.)
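That suggestion can be sketched roughly like this (the pool name 'tank', the dataset 'tank/scratch', the backup group 'mygroup', and the target 'targetpool' are all placeholders, not names from this thread):

```shell
#!/bin/bash
# Sketch of the suggested workaround, under assumed pool/group names.

# Snapshot name must match the --snapshot-format passed to zfs-autobackup.
SNAP=$(date +%y.%m.%d-%a-%H.00)

# 1. Take one atomic, recursive snapshot of everything.
zfs snapshot -r "tank@${SNAP}"

# 2. Optionally destroy snapshots you don't want replicated, e.g.:
# zfs destroy "tank/scratch@${SNAP}"

# 3. Let zfs-autobackup replicate the pre-made snapshots
#    instead of creating its own.
zfs-autobackup --no-snapshot --snapshot-format=%y.%m.%d-%a-%H.00 mygroup targetpool
```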
Great!
P.S. Same person, different account?
Hahah, yes, my mistake.
Posting again from the proper account.
Thanks psy0rz, that seems to work.
I created a Bash script that generates thousands of datasets in a test ZFS environment and was able to replicate the issue. I then modified my process to take the snapshot directly via ZFS first:
#!/bin/bash
# Snapshot name must match zfs-autobackup's --snapshot-format
DATEFMT=$(date +%y.%m.%d-%a-%H.00)
echo "===> ${DATEFMT}"
/usr/bin/ssh username@[server name redacted] "/usr/sbin/zfs snapshot -r poolt0@${DATEFMT}"
And then run the modified zfs-autobackup command (including the --no-snapshot and --allow-empty parameters):
/usr/local/bin/zfs-autobackup -v --debug --no-thinning --clear-mountpoint --no-snapshot \
--strip-path=1 --snapshot-format=%y.%m.%d-%a-%H.00 --compress --allow-empty \
--keep-source=15,1d1w,1w1m,1m1y --keep-target=30,1d2w,1w1m,1m2y \
--buffer=128M --ssh-source=[server name redacted] t0_to_t1 poolt1
(Naturally, removing the -v and --debug from the cron version).
This seems to do exactly what I want, thanks for the suggestion!
I eventually want to enable the Thinner as well, and will run a few additional tests to confirm that snapshot deletion via the Thinner does what I want. I expect it will, since the snapshot naming from the Bash script is consistent with --snapshot-format.
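One quick sanity check before enabling the Thinner is that existing snapshot names actually match the pattern that '--snapshot-format=%y.%m.%d-%a-%H.00' produces. A minimal sketch (the sample dataset name below is made up; in practice you would feed it names from 'zfs list -H -t snapshot -o name'):

```shell
# Regex for names produced by --snapshot-format=%y.%m.%d-%a-%H.00
pattern='@[0-9]{2}\.[0-9]{2}\.[0-9]{2}-[A-Za-z]{3}-[0-9]{2}\.00$'

# Hypothetical snapshot name; real input would come from
# `zfs list -H -t snapshot -o name`.
if echo "poolt0/data@24.05.01-Wed-14.00" | grep -Eq "$pattern"; then
    echo "name matches snapshot format"
fi
```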
Thanks again!
Hi there,
Love this script so far. I started implementing it on some of our larger ZFS deployments and ran into an issue. I'm certain it's due to the thousands of datasets in this particular ZFS environment. I've also found that it seems to relate to the maximum command-line argument length (not a direct issue with zfs-autobackup), so this might be an enhancement request: could zfs-autobackup split the command line when it gets too long?
I would also welcome any other workarounds. I've tried things like increasing the ulimit setting (found via similar issues with other tools), but that hasn't fixed this issue. I'm considering some kludgy alphabetical splits of the dataset list just to get around the issue temporarily.
I also feel I could work around the issue by not using the --allow-empty parameter, as that would dramatically reduce the number of datasets being snapshotted, but I would like to keep all of my snapshot names consistent across all of the datasets.
The environment is Proxmox 7.4.
Command line I used:
Error Received:
The output from zfs-autobackup right before the failure:
The command it's trying to run (at 'long output redacted') is 165,983 bytes. Unfortunately, there is some proprietary information contained in the ZFS dataset names that I'd rather not share here.
Happy to provide additional information on-request. Thanks in advance!
Edit: Forgot to mention, this appears to relate to MAX_ARG_STRLEN (a Linux kernel limit on the length of a single argument, rather than a Bash setting). Looks like it's 128 KiB by default.
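For reference, the per-argument limit comes from the kernel (MAX_ARG_STRLEN is defined as 32 * PAGE_SIZE, i.e. 128 KiB with the usual 4 KiB pages), while ARG_MAX bounds the total size of argv plus the environment passed to execve(). Both can be inspected from the shell:

```shell
# MAX_ARG_STRLEN is a Linux kernel constant: 32 * PAGE_SIZE
# (131072 bytes = 128 KiB with 4 KiB pages).
page_size=$(getconf PAGE_SIZE)
echo "max single argument: $((32 * page_size)) bytes"

# ARG_MAX is the total limit for argv + environment on execve().
echo "total argv+env limit: $(getconf ARG_MAX) bytes"
```

This is why ulimit tweaks didn't help: MAX_ARG_STRLEN is compiled into the kernel, not adjustable at runtime.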