mbentley / docker-timemachine

Docker image to run Samba (compatible Time Machine for macOS)
Apache License 2.0

[Feature]: Add S3 backend capability for off-site copying of sparsebundle images. #134

Open sbates130272 opened 1 year ago

sbates130272 commented 1 year ago

What problem are you looking to solve?

Off site backup of sparsebundles via AWS S3 or similar.

Describe the solution that you have in mind

Can we use something inside the image, or a filesystem that leverages an S3 back-end, to ensure sparsebundles are copied to the cloud for off-prem security/safety? For example, it might be as simple as ensuring the container volume mount for the backup data resides on something like s3fs-fuse. What might be interesting is that this particular approach removes the need for a local copy of the data and thus allows very small devices with no external storage to act as a Time Machine target. Though I am not sure what this means for performance, and some of the notes on POSIX limitations could be "interesting".

I will do some research on this and see what I can get working. We might have to be careful about not accidentally leaking AWS credentials, and I would prefer something that is not tied to AWS and allows other solutions (like MinIO).
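
For illustration, something like this on the host is roughly what I have in mind (bucket name, credential file, and mountpoint are placeholders, and the s3fs options would still need to be validated):

# store the S3 credentials where only root can read them
echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > /etc/passwd-s3fs
chmod 600 /etc/passwd-s3fs

# mount the bucket; the container's backup volume would then point at this mountpoint
mkdir -p /mnt/timemachine-s3
s3fs my-backup-bucket /mnt/timemachine-s3 -o passwd_file=/etc/passwd-s3fs -o allow_other

# for a non-AWS endpoint such as MinIO, add something like:
#   -o url=https://minio.example.com -o use_path_request_style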

Additional Context

No response

mbentley commented 1 year ago

I am not aware of any method of having Samba backed by anything S3 related, although I haven't looked. I doubt it, though: even if there were a backend that natively worked with S3 and Samba, it would likely be extremely poor from a speed perspective, but I could be wrong. Otherwise, it would rely on other solutions to sync data to S3, but I also don't know the structure behind the sparsebundles well enough to know whether a sync to S3 would end up copying over excessive amounts of data.

sbates130272 commented 1 year ago

Thanks @mbentley for the quick response. Let me take a look at this and also do some performance testing to see just how feasible this is. I see two options:

  1. A FUSE-based filesystem like s3fs-fuse that removes the need for local storage and uses S3 objects as the filesystem's backing store. This could be slow and needs to be tested.
  2. A process inside the container that syncs sparsebundle files to an S3 bucket at user-defined intervals.

sbates130272 commented 1 year ago

So I am testing some Time Machine backups to a volume mount in the container, where that volume mount is also an s3fs FUSE mountpoint on the host, backed by an AWS S3 bucket. I will let you know how it goes.
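
Roughly, the setup looks like this (bucket, mountpoint, and any extra container options are just my local values; the volume path follows the image's documented default as I understand it):

# host: mount the bucket with s3fs (credentials file set up as above)
s3fs my-tm-bucket /mnt/tm-s3 -o passwd_file=/etc/passwd-s3fs -o allow_other

# container: hand the s3fs mountpoint to the image as the backup volume
# (other flags omitted; otherwise the same run command as in the README)
docker run -d --name timemachine \
  -v /mnt/tm-s3:/opt/timemachine \
  mbentley/timemachine:smb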

mbentley commented 1 year ago

Good deal, thanks! That'll be interesting to see, as Samba + TM seems to be a bit picky at times about the underlying filesystem, particularly when it comes to extended attributes.
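
If it helps, a quick sanity check of extended attribute support on the mountpoint (assuming the attr tools are installed on the host, and using the mountpoint from your example) would be something like:

# write a user xattr to a file on the s3fs mount and read it back
touch /mnt/tm-s3/xattr-test
setfattr -n user.test -v hello /mnt/tm-s3/xattr-test
getfattr -n user.test /mnt/tm-s3/xattr-test
rm /mnt/tm-s3/xattr-test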

sbates130272 commented 1 year ago

@mbentley, yes I am seeing some issues with xattr. Digging into that now to see if it is a showstopper or if we can do something to address it. Cheers.

sbates130272 commented 1 year ago

@mbentley so the lack of appropriate xattr support in s3fs does seem to be a showstopper for now. There might be some clever way around it, but I am not sure what that would be. So another option is to add a cron job to your image that uploads the sparsebundles to AWS at a configurable interval. But I can also do that outside your Docker image if I want. So the question is: would you consider such a feature an acceptable enhancement to your image, or not?

mbentley commented 1 year ago

I am curious about your thoughts on how you might include support for that, as I wouldn't necessarily be opposed to it. It seems like something that would be easy enough to have disabled by default and enabled via an env var with the appropriate keys and whatnot. Something like crond being added to the image and started through s6 like the other services wouldn't be difficult, and if someone doesn't enable it via the env var, the s6 run script would just skip starting crond.
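
As a rough sketch of what that run script could look like (the env var name is made up, and the exact shape depends on the s6 version in the image):

#!/usr/bin/with-contenv sh
# hypothetical /etc/services.d/crond/run
if [ -z "${S3_SYNC_CRON}" ]; then
  # sync not enabled; idle instead of exiting so s6 doesn't restart-loop
  exec tail -f /dev/null
fi
exec crond -f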

sbates130272 commented 1 year ago

That's exactly what I was thinking @mbentley! Off by default, enabled via env variables, with a similar mechanism to provide the AWS credentials, plus another variable to set the desired backup schedule and some cron variant added to the image. Let me see if I can code up a prototype this week for you to take a look at.
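
Roughly what I have in mind for the prototype (all variable and script names here are placeholders, not final):

# env vars passed to the container (names TBD):
#   S3_SYNC_CRON="0 3 * * *"    - cron schedule
#   S3_SYNC_BUCKET=s3://my-tm-bucket
#   AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY - picked up by the sync tool

# container startup would generate the crontab entry
echo "${S3_SYNC_CRON} /usr/local/bin/s3-sync.sh" > /etc/crontabs/root

# /usr/local/bin/s3-sync.sh - push the sparsebundles to the bucket
# (aws s3 sync here, but rclone would keep it from being AWS-only)
exec aws s3 sync /opt/timemachine "${S3_SYNC_BUCKET}" --delete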

bitte-ein-bit commented 9 months ago

I'm currently looking into this image, so I can't add anything specific yet. What I do for all my docker-compose environments so far is the following:

#!/bin/bash
day=$(date +%a)
TARGET_DIR=/home/pi/backup/snapshot_$day

# replace today's snapshot if one already exists, then take a fresh read-only one
btrfs subvolume show $TARGET_DIR &>/dev/null && btrfs subvolume delete $TARGET_DIR
btrfs subvolume snapshot -r /home $TARGET_DIR

# skip the remote sync when the day of year is divisible by 4
[[ $(($(date +%j)%4)) == 0 ]] && exit 0

# push the consistent snapshot (not the live data) to S3 with rclone
rclone sync $TARGET_DIR/pi aws:bucket-name -v --exclude="**/node_modules/**" --exclude="**/__pycache__/**" --update --use-server-modtime --links

I'd go with a similar approach here: use an external, battle-tested tool to sync to S3. If the volume is used by multiple Macs or the bandwidth is low, I'd also suggest implementing snapshotting at the volume level to ensure consistency.

EDIT: btrfs not ZFS... too tired I guess.

Alex1s commented 7 months ago

@sbates130272 How was xattr a problem with s3fs? Currently the s3fs README.md advertises extended attribute support. Maybe extended attribute support was added after you did research into this? Or what exactly was the problem?