sbates130272 opened this issue 1 year ago
I am not aware of any method for having Samba backed by anything S3 related, although I haven't looked. Even if a backend existed that worked natively with S3 and Samba, I suspect it would be extremely poor from a speed perspective, but I could be wrong. Otherwise, it would rely on other solutions to sync data to S3, but I also don't know the structure behind the sparsebundles well enough to know whether a sync to S3 would end up copying over excessive amounts of data.
Thanks @mbentley for the quick response. Let me take a look at this and also do some performance testing to see just how feasible this is. I see two options:
So I am testing doing some Time Machine backups to a volume mount in the container, where that volume mount is also an s3fs FUSE mountpoint on the host, backed by an AWS S3 bucket. I will let you know how it goes.
Good deal, thanks! That'll be interesting to see as Samba + TM seems to be a bit picky about the underlying filesystem at times related to extended attributes.
@mbentley, yes I am seeing some issues with xattr. Digging into that now to see if it is a showstopper or if we can do something to address it. Cheers.
@mbentley so the lack of appropriate xattr support in s3fs does seem to be a showstopper for now. There might be some clever way around it, but I am not sure what that would be. So another option is to add a cron job to your Dockerfile that uploads the sparsebundles to AWS periodically.
I am curious about your thoughts on how you might include support for that, as I wouldn't necessarily be opposed to it. Seems like something that would be easy enough to have disabled by default and use an env var to enable it with the appropriate keys and whatnot. Something like `crond` being added to the image and starting through s6 like the other services wouldn't be difficult, and if someone doesn't enable it via the env var, the s6 run script would just skip starting `crond`.
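A minimal sketch of what that opt-in s6 run script could look like (the variable names `S3_SYNC` and `S3_SYNC_SCHEDULE` and the `s3-sync.sh` helper are made up here, not existing options of this image):

```shell
#!/bin/bash
# hypothetical s6 run script: only start crond when the user opts in via env var

if [ "${S3_SYNC:-false}" != "true" ]; then
  echo "S3_SYNC not enabled; skipping crond"
  exit 0  # a real s6 service would also mark itself down so s6 doesn't restart it
fi

# write the user-supplied (or default) schedule into a crontab, then run crond
# in the foreground so s6 can supervise it
echo "${S3_SYNC_SCHEDULE:-0 3 * * *} /usr/local/bin/s3-sync.sh" | crontab -
exec crond -f
```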
That's exactly what I was thinking @mbentley! Off by default, enabled via env variables, using a similar mechanism to provide the AWS credentials, plus another variable to set the desired backup schedule and some cron variant added to the image. Let me see if I can code up a prototype this week for you to take a look at.
I'm currently looking into this image, so I can't report anything specific yet. What I do for all my docker-compose environments so far is the following: take a read-only btrfs snapshot daily, and sync it to S3 with `rclone` only every four days. The main reason for only syncing every four days is cost.

```shell
#!/bin/bash
day=$(date +%a)
TARGET_DIR=/home/pi/backup/snapshot_$day

# delete this weekday's snapshot if it already exists, then take a fresh read-only one
btrfs subvolume show "$TARGET_DIR" &>/dev/null && btrfs subvolume delete "$TARGET_DIR"
btrfs subvolume snapshot -r /home "$TARGET_DIR"

# only sync to S3 every fourth day (day-of-year modulo 4); skip the rest
[[ $(($(date +%j) % 4)) != 0 ]] && exit 0
rclone sync "$TARGET_DIR/pi" aws:bucket-name -v \
  --exclude="**/node_modules/**" --exclude="**/__pycache__/**" \
  --update --use-server-modtime --links
```
I'd go with a similar approach here: use an external battle-tested tool to sync to S3. If the volume is used by multiple Macs or the bandwidth is low, I'd also suggest implementing snapshotting at the volume level to ensure consistency.
EDIT: btrfs not ZFS... too tired I guess.
@sbates130272 How was xattr a problem with s3fs? Currently the s3fs README.md advertises extended attribute support. Maybe extended attribute support was added after you did research into this? Or what exactly was the problem?
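For anyone wanting to re-test this, here's a quick probe for user xattr support on a mountpoint (the default path is a placeholder; point it at your s3fs mount, and note it needs the `attr` package for `setfattr`/`getfattr`):

```shell
#!/bin/bash
# probe whether the filesystem under $1 accepts user extended attributes,
# which Samba relies on for macOS metadata on Time Machine shares
dir="${1:-/mnt/timemachine}"
f="$dir/.xattr-probe"

touch "$f"
if setfattr -n user.probe -v ok "$f" 2>/dev/null && \
   [ "$(getfattr --only-values -n user.probe "$f" 2>/dev/null)" = "ok" ]; then
  echo "user xattrs supported on $dir"
else
  echo "user xattrs NOT supported on $dir"
fi
rm -f "$f"
```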
What problem are you looking to solve?
Off site backup of sparsebundles via AWS S3 or similar.
Describe the solution that you have in mind
Can we use something inside the image, or a filesystem that leverages an S3 back-end, to ensure sparsebundles are copied to the cloud for off-prem security/safety? For example, it might be as simple as ensuring the container volume mount for the backup data resides on something like s3fs-fuse. What might be interesting is that this particular example removes the need for a local copy of the data, and thus allows very small devices with no external storage to act as a Time Machine. Though I am not sure what this means for performance, and some of the noted POSIX limitations could be "interesting".
I will do some research on this and see what I can get working. We might have to be careful about not accidentally leaking AWS credentials, and I would prefer something that is not tied to AWS and allows other solutions (like MinIO).
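Not being tied to AWS should be doable: rclone, for instance, talks to any S3-compatible endpoint. A sketch of what a MinIO-backed sync could look like (the remote name, endpoint, bucket, paths, and credential variables are all placeholders, and this assumes a reasonably recent rclone that accepts `key=value` pairs in `config create`):

```shell
# one-time: create an rclone remote pointing at a MinIO endpoint
rclone config create minio s3 \
  provider=Minio \
  endpoint=http://minio.example.local:9000 \
  env_auth=false \
  access_key_id="$MINIO_ACCESS_KEY" \
  secret_access_key="$MINIO_SECRET_KEY"

# then the same sync job works against MinIO instead of AWS
rclone sync /opt/timemachine minio:tm-backups -v --update --use-server-modtime
```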
Additional Context
No response