tasket / wyng-backup

Fast backups for logical volumes & disk images
GNU General Public License v3.0

Requirements for OpenWRT self-hosting of wyng backup archives #195

Open tlaurion opened 5 months ago

tlaurion commented 5 months ago

I spotted four differences in ash/bash, tar and rm between busybox and standard Unix systems that make Wyng fail on busybox as of https://github.com/tasket/wyng-backup/commit/e7bc559647cf6c9e7e4880895346826b7bda9427

  1. dedup on send/arch-deduplicate fails on an ash-based shell with a string concat error (not tested otherwise)
  2. tar --no-same-owner doesn't exist -> use -o
  3. tar -m doesn't exist (don't restore modification timestamps), which prevents pruning ops from succeeding with busybox's tar
  4. rm -d (remove empty dir) doesn't exist under busybox

1 and 2 are easy to fix and changing them has no known side effects, whereas I guess 3 and 4 were optimizations?

Code change looks like this against https://github.com/tasket/wyng-backup/commit/e7bc559647cf6c9e7e4880895346826b7bda9427

diff --git a/src/wyng b/src/wyng
index a67eb21..a87f4f4 100755
--- a/src/wyng
+++ b/src/wyng
@@ -3384,7 +3384,7 @@ def send_volume(storage, vol, curtime, ses_tags, send_all=False, benchmark=False

         # Finalize on VM/remote
         catch_signals()
-        dest.run(["rm -df "+sdir+" && mv -T "+sdir+"-tmp "+sdir
+        dest.run(["rm -rf "+sdir+" && mv -T "+sdir+"-tmp "+sdir
                  +" && mv "+vol.vid+"/volinfo.tmp "+vol.vid+"/volinfo"
                  +" && mv archive.ini.tmp archive.ini"
                  +(" && ( nohup sync -f . 2&>/dev/null & )" if options.maxsync else "")
@@ -3503,7 +3503,7 @@ def dedup_existing(aset):

     print(" linking...", end="", flush=True)
     do_exec( [dest.run_args([
-               +"/bin/cat >"+dest.dtmp+"/dest.lst.gz"
+               "/bin/cat >"+dest.dtmp+"/dest.lst.gz"
                +" && /usr/bin/python3 "+dest.dtmp+"/dest_helper.py dedup"
                ], destcd=dest.path),
               [CP.cat,"-v"],  [CP.tail,"--bytes=2000"]
@@ -3960,7 +3960,7 @@ def merge_sessions(volume, sources, target, clear_sources=False):
         volume.sessions[target].save_info(".tmp")
         cmds += [CP.tar, "-cf", "-", "../archive.ini", "../archive.ini.tmp", "merge.lst.gz",
                             target+"/manifest.z.tmp", target+"/info.tmp", "volinfo.tmp"]
-        dest_cmds += " && tar --no-same-owner -xmf -"
+        dest_cmds += " && tar -o -xf -"

     # Start merge operation on dest
     catch_signals()

Is 3 really wanted/needed? Tested working on openwrt with qubes-ssh destination.

Can do PR but wanted to clarify 3, since awk output could be used to replicate tar -m behavior if really needed.
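
For reference on item 1: as the diff shows it, the removed `+` at the start of the first list element appears to be a Python unary plus applied to a string, which raises a TypeError on the Python side before any command reaches the remote shell. A minimal sketch, where `dtmp` is a hypothetical stand-in for `dest.dtmp`:

```python
# Hypothetical stand-in for dest.dtmp; the real value comes from Wyng at runtime.
dtmp = "/tmp/wyng-demo"


def build_broken():
    # The leading "+" is a unary plus on a str, as in the removed diff line.
    return [+"/bin/cat >" + dtmp + "/dest.lst.gz"]


def build_fixed():
    # Same expression without the stray leading "+".
    return ["/bin/cat >" + dtmp + "/dest.lst.gz"]


try:
    build_broken()
    broken_error = None
except TypeError as exc:
    broken_error = str(exc)

fixed_cmd = build_fixed()
print(broken_error)
print(fixed_cmd)
```

If this reading is right, the fix is needed regardless of whether the destination runs ash or bash.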

tasket commented 5 months ago

  1. I'm not sure how to go about testing with ash. Perhaps during testing I could prepend remote commands with busybox ash?
  2. Sorry about that. I didn't realize there was more than one occurrence of --no-same-owner in there. Easy fix.
  3. IIRC -m was added to the commands almost as a cosmetic fix. The idea was to only have the dest system's timestamps on archive files. It certainly can work without -m (it's not critical) but I'd still prefer to use the remote system's time if possible.
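
The idea in point 1 of prepending remote commands with busybox ash could be sketched as below. The helper name `wrap_for_ash` is hypothetical and not part of Wyng; it only assumes that a `busybox` binary providing the `ash` applet is on the remote PATH.

```python
import shlex


def wrap_for_ash(cmd: str) -> str:
    """Return a command line that runs cmd under BusyBox ash."""
    # shlex.quote keeps the whole command as a single argument to "ash -c".
    return "busybox ash -c " + shlex.quote(cmd)


print(wrap_for_ash("rm -rf sdir && mv -T sdir-tmp sdir"))
```

This only exercises the shell itself. Commands such as tar and rm inside the wrapped string may still resolve to the full system binaries rather than to busybox applets, so it would not catch missing options like tar -m.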

tlaurion commented 5 months ago

  1. I'm not sure how to go about testing with ash. Perhaps during testing I could prepend remote commands with busybox ash?

I don't think it's your problem. It's not just about ash; it depends on what is packed into busybox. I can do regression testing on that ecosystem as we move forward. With this patch, everything seems to work as of now; more testing is needed, of course.

  2. Sorry about that. I didn't realize there was more than one occurrence of --no-same-owner in there. Easy fix.

Didn't see it either.

  3. IIRC -m was added to the commands almost as a cosmetic fix. The idea was to only have the dest system's timestamps on archive files. It certainly can work without -m (it's not critical) but I'd still prefer to use the remote system's time if possible.

Any alternative, @tasket? I have none other than piping tar to awk, which would lower performance.
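
Since python3 is already required on the destination, one possible alternative to tar -m is to extract without it and then reset the timestamps locally. The sketch below is an assumption about how that could look, not Wyng's implementation; `touch_tree` is a hypothetical helper name.

```python
import os
import tempfile
import time


def touch_tree(root: str) -> int:
    """Set atime/mtime of every regular file under root to the current local time."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                os.utime(path, None)  # None means "now" on this system
                count += 1
    return count


# Demo: a file carrying an old (sender-side) mtime gets the local time instead.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "info.tmp")
with open(demo_file, "w") as f:
    f.write("demo")
os.utime(demo_file, (1_000_000_000, 1_000_000_000))
touched = touch_tree(demo_dir)
age = time.time() - os.path.getmtime(demo_file)
```

In practice the walk would have to be limited to the files named in the tar stream, since touching a whole archive directory would also change files that were not part of the extraction.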

tlaurion commented 5 months ago

Added 4 in OP: rm -df (-d: delete empty dir) doesn't exist under busybox

tasket commented 5 months ago

Added 4 in OP: rm -df (-d: delete empty dir) doesn't exist under busybox

I'll review the need for it. When it was proposed as a fix, there was a mkdir creating a spurious dir and that no longer seems to be the case. Ref issue #175
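
The behavioral difference between the original rm -d and the rm -rf used in the patch can be illustrated with Python's standard library equivalents. This is only a sketch of the semantics: `os.rmdir` behaves like rmdir or rm -d (empty directories only), while `shutil.rmtree` behaves like rm -rf.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
empty_dir = os.path.join(base, "empty")
full_dir = os.path.join(base, "full")
os.mkdir(empty_dir)
os.mkdir(full_dir)
with open(os.path.join(full_dir, "chunk"), "w") as f:
    f.write("data")

# Like "rmdir" or "rm -d": succeeds only on an empty directory.
os.rmdir(empty_dir)
try:
    os.rmdir(full_dir)
    refused = False
except OSError:
    refused = True

# Like "rm -rf": removes the directory and everything in it.
shutil.rmtree(full_dir)
remaining = os.listdir(base)
os.rmdir(base)
```

If the spurious directory is always empty when it exists, plain rmdir may be a closer portable match for rm -d than rm -rf, assuming the target busybox build includes the rmdir applet.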

tasket commented 5 months ago

Incidentally, the tar in Debian's busybox includes -m:


$ tar
BusyBox v1.35.0 (Debian 1:1.35.0-4+b3) multi-call binary.

Usage: tar c|x|t [-ZzJjahmvokO] [-f TARFILE] [-C DIR] [FILE]...

Create, extract, or list files from a tar file

        c       Create
        x       Extract
        t       List
        -f FILE Name of TARFILE ('-' for stdin/out)
        -C DIR  Change to DIR before operation
        -v      Verbose
        -O      Extract to stdout
        -m      Don't restore mtime
        -o      Don't restore user:group
        -k      Don't replace existing files
        -Z      (De)compress using compress
        -z      (De)compress using gzip
        -J      (De)compress using xz
        -j      (De)compress using bzip2
        --lzma  (De)compress using lzma
        -a      (De)compress based on extension
        -h      Follow symlinks
        --overwrite             Replace existing files
        --strip-components NUM  NUM of leading components to strip
        --no-recursion          Don't descend in directories
        --numeric-owner         Use numeric user:group
        --no-same-permissions   Don't restore access permissions
        --to-command COMMAND    Pipe files to COMMAND

tlaurion commented 5 months ago

@tasket OpenWrt would unfortunately be more representative of how the embedded world configures/builds busybox:


BusyBox v1.36.1 (2024-05-17 09:51:14 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 23.05.3, r23809-234f1a2efa
 -----------------------------------------------------
root@Insurgo-LabRouter:~# tar --help
BusyBox v1.36.1 (2024-05-17 09:51:14 UTC) multi-call binary.

Usage: tar c|x|t [-zahvokO] [-f TARFILE] [-C DIR] [-T FILE] [-X FILE] [FILE]...

Create, extract, or list files from a tar file

    c       Create
    x       Extract
    t       List
    -f FILE Name of TARFILE ('-' for stdin/out)
    -C DIR  Change to DIR before operation
    -v      Verbose
    -O      Extract to stdout
    -o      Don't restore user:group
    -k      Don't replace existing files
    -z      (De)compress using gzip
    -a      (De)compress based on extension
    -h      Follow symlinks
    -T FILE File with names to include
    -X FILE File with glob patterns to exclude

Note that it's possible to install coreutils on OpenWrt, but it would be better for compatibility to avoid that and rely on busybox instead.

Otherwise I can also ask OpenWrt downstream to enable that option in their build recipe. I guess it costs at most a KB of additional binary size in busybox, but I'm not sure all busybox distributions (these options are chosen at compile time) would follow.
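
Given the two usage lines shown above, one way to cope with differing busybox builds would be to probe the remote tar once and only add -m when it is listed. The following is a sketch under that assumption; the function names are hypothetical, and the regular expression only targets the BusyBox usage format quoted in this thread.

```python
import re

# Usage lines copied from the Debian and OpenWrt outputs quoted in this thread.
DEBIAN_USAGE = "Usage: tar c|x|t [-ZzJjahmvokO] [-f TARFILE] [-C DIR] [FILE]..."
OPENWRT_USAGE = (
    "Usage: tar c|x|t [-zahvokO] [-f TARFILE] [-C DIR] [-T FILE] [-X FILE] [FILE]..."
)


def busybox_tar_flags(help_text: str) -> set:
    """Return the single-letter options listed in a BusyBox tar usage line."""
    match = re.search(r"Usage: tar\s+\S+\s+\[-([A-Za-z]+)\]", help_text)
    return set(match.group(1)) if match else set()


def extract_args(help_text: str) -> str:
    """Pick extraction options, adding -m only when the usage line lists it."""
    flags = busybox_tar_flags(help_text)
    return "tar -o -xmf -" if "m" in flags else "tar -o -xf -"


print(extract_args(DEBIAN_USAGE))
print(extract_args(OPENWRT_USAGE))
```

Help output in any other format yields an empty flag set, so the sketch falls back to the command without -m, which is the variant reported as working on OpenWrt.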

tasket commented 5 months ago

Yeah, this shows how using tar as a generic transport can be a problem: There is big pressure to cut down a 225K executable to 223K by excising options with very simple conditional logic. But then they include the full-fat gnu awk, django, numpy, etc. in their repo. Wild and crazy.

Taking the issue title into account "work on busybox + ash based systems", we see that even defining a target system is problematic because in this case it didn't provide a functional standard.

Please let me know if you've reached a functional state on your OpenWRT router with the changes I posted today. @tlaurion

tlaurion commented 5 months ago

Did send with dedup, arch-deduplicate and prune without issues, yes @tasket!

tlaurion commented 5 months ago

Yeah, this shows how using tar as a generic transport can be a problem: There is big pressure to cut down a 225K executable to 223K by excising options with very simple conditional logic. But then they include the full-fat gnu awk, django, numpy, etc. in their repo. Wild and crazy.

Taking the issue title into account "work on busybox + ash based systems", we see that even defining a target system is problematic because in this case it didn't provide a functional standard.

Please let me know if you've reached a functional state on your OpenWRT router with the changes I posted today. @tlaurion

"Their repo" being OpenWrt's? Well, the logic is simply that OpenWrt by default targets low-end embedded devices like routers. Nowadays it also targets newer Raspberry Pi boards and x86 machines, turning almost anything into a high-end routing and switching device. Low-end devices, with everything in flash and little memory, default to busybox and its ash shell.

I still think OpenWrt's busybox (and its ash shell) is representative of what low-end devices in the embedded world deploy by default. A quick check of the coreutils packages and their descriptions reminds the end user that most of those tools are already provided by busybox. Each of the coreutils binaries can be selected individually as needed, although a big fat warning advises against doing that.

Happily for the wyng-backup use case, there is a python3-light package which fits the need without having to deploy the full-fledged python3 suite. busybox is deliberately limited in features, and extra packages extend it where compatibility is needed. This is also why awk is available for those relying on it extensively in their scripts if need be. It's the embedded world dictating this, from devices with really limited storage to recent ones with practically unlimited storage through NVMe support directly on the motherboard.

Tl;dr: OpenWrt devices that were originally just routers can now be NAS + VPN + Suricata + router if one wants. But busybox still targets the lowest common denominator: low-end, router-only platforms where people can choose with granularity which additional roles they want. The WRT3200ACM router is still one of the most used for such use cases, judging by the attended-sysupgrade statistics on upgrade images built for the deployed OpenWrt-supported platforms.

Edit: stats https://sysupgrade.openwrt.org/stats/public-dashboards/5f0750ebb59c4666a957dc4261f7b90e?orgId=1&refresh=1m

tlaurion commented 5 months ago

Consider changing default sparse-write restore op (made for local archive receive/restore) to sparse and use-snapshot and compare backup restoration perf for networked archives

Ref https://github.com/tasket/wyng-util-qubes/issues/30#issuecomment-2142924849

tasket commented 5 months ago

This is a tough call because as you pointed out some time ago, sparse mode doesn't perform well with some types of connections. As it is now, users can easily change the behavior with the -w options.

In the future, probably in Wyng v0.9, sparse-write will probably become the default any time the local volume already exists. That helps users or wrappers to clone related qube/volumes before a restore to attain a dedup effect. Maybe Wyng will also let you tell it what those related volumes are and do the cloning automatically (there's an issue for this).

tlaurion commented 5 months ago

Well: 300 MB of available memory won't be enough to self-host softraid5 + tor + dropbear + python3 in memory during send/backup operations.

The OOM killer kills wifi at some point, as well as mdadm. 512 MB+ will be a hard requirement, together with a quad core, making the WRT3200ACM an unfit candidate with its 2 cores and 500 MB.
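
A pre-flight memory check along these lines could be sketched as follows. It parses the `MemAvailable` field of Linux's /proc/meminfo; the 512 MB default is taken from the figure in this comment, and the sample text is illustrative, not measured on the router.

```python
def mem_available_mb(meminfo_text: str) -> float:
    """Parse the MemAvailable line of /proc/meminfo (value in kB) and return MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024
    raise ValueError("MemAvailable not found")


def enough_memory(meminfo_text: str, required_mb: float = 512) -> bool:
    """Return True if available memory meets the (assumed) requirement."""
    return mem_available_mb(meminfo_text) >= required_mb


# Illustrative sample only; on a device, read the text from /proc/meminfo.
SAMPLE = "MemTotal:  509000 kB\nMemFree:  120000 kB\nMemAvailable:  307200 kB\n"
print(mem_available_mb(SAMPLE), enough_memory(SAMPLE))
```

On the device itself the input would come from `open("/proc/meminfo").read()`.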


tlaurion commented 5 months ago

This is a tough call because as you pointed out some time ago, sparse mode doesn't perform well with some types of connections. As it is now, users can easily change the behavior with the -w options.

In the future, probably in Wyng v0.9, sparse-write will probably become the default any time the local volume already exists. That helps users or wrappers to clone related qube/volumes before a restore to attain a dedup effect. Maybe Wyng will also let you tell it what those related volumes are and do the cloning automatically (there's an issue for this).

@tasket a keyword was missing in my past comment:

Consider changing default sparse-write restore op (made for local archive receive/restore) to sparse and use-snapshot and compare backup restoration perf for networked archives

Ref tasket/wyng-util-qubes#30 (comment)

What I meant is sparse instead of sparse-write solely in the qubes-ssh scenario, where in my opinion it doesn't really make sense to have sparse-write as the default.

tlaurion commented 5 months ago

This is a tough call because as you pointed out some time ago, sparse mode doesn't perform well with some types of connections. As it is now, users can easily change the behavior with the -w options.

In the future, probably in Wyng v0.9, sparse-write will probably become the default any time the local volume already exists. That helps users or wrappers to clone related qube/volumes before a restore to attain a dedup effect. Maybe Wyng will also let you tell it what those related volumes are and do the cloning automatically (there's an issue for this).

@tasket what do you mean by certain connection type? You mean link bandwidth? Once again, from memory, I understood that

Otherwise, could you please refresh my memory/source the conclusions held here for normal/sparse/sparse-write and connection types?

I think the last time we talked about it was after a failed attempt on my side to find proper hosting, which unfortunately led to a PoC bash script that ran over an sshfs-mounted ssh endpoint. This was suboptimal for many reasons. My PoC also went in many directions, because IO was the bottleneck, along with link speed and hardware IO limits on older hardware. With local IO that slow (SATA2 SSD + sshfs-mounted loop device, because rsync.net was a Unix server not offering python3), all the "testing" was useless.

Wyng is not in the same place as before, and other hosters could probably be tested now. But this is not the thread for that.

Here I'm simply trying to clear up any misunderstanding I may still have about the current state of normal/sparse/sparse-write when archives are definitely on a slower "link" in qubes-ssh mode, as opposed to being on a disk local to dom0 (sparse-write there is blazing fast, no doubt) or in "qubes" mode. But in the qubes-ssh case, I feel I'm missing something from your response: I don't get why qubes-ssh would benefit in any case from sparse-write, unless someone is lucky/rich enough to have a truly local hoster whose link speed would not be the bottleneck and where writing to the ssh server would be on par with writing into a qube or dom0, which I just think is physically impossible. No?