itzg / docker-mc-backup

Provides a side-car container to backup itzg/minecraft-server world data
https://hub.docker.com/r/itzg/mc-backup
MIT License

Best practice for storing data locally and offsite. #108

Open chriscn opened 2 years ago

chriscn commented 2 years ago

Currently my data is backed up on the same server that the Minecraft server is running on; however, I would like to store data both locally and off-site. What would be the best practice for doing this?

As it stands I think there are two options, but I would welcome your input on deciding which is preferred, or whether they amount to the same thing.

  1. Running two instances of the docker-mc-backup container, both using restic, with one pointed at a remote repository (see the sketch after this list). This also gives the option of customising the backup schedule: I might only back up remotely every 2 hours but keep a local backup every fifteen minutes. The remote repository can then be set in my docker-compose.yml.
  2. Writing my own script to back up the restic repo to the cloud.
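
For concreteness, a minimal sketch of what option one might look like using plain docker run commands (a docker-compose.yml version would be equivalent). The environment variable names follow the itzg/mc-backup README as far as I know; the intervals, volume names, and repository locations are purely illustrative:

```bash
# Local restic backups every 15 minutes into a local volume (illustrative values only)
docker run -d --name mc-backup-local \
  -e BACKUP_METHOD=restic \
  -e BACKUP_INTERVAL=15m \
  -e INITIAL_DELAY=3m \
  -e RCON_HOST=mc \
  -e RCON_PASSWORD=minecraft \
  -e RESTIC_REPOSITORY=/backups/restic \
  -e RESTIC_PASSWORD=change-me \
  -v mc-data:/data:ro \
  -v mc-backups:/backups \
  itzg/mc-backup

# Remote restic backups every 2 hours to an off-site repository (placeholder URL)
docker run -d --name mc-backup-remote \
  -e BACKUP_METHOD=restic \
  -e BACKUP_INTERVAL=2h \
  -e INITIAL_DELAY=7m \
  -e RCON_HOST=mc \
  -e RCON_PASSWORD=minecraft \
  -e RESTIC_REPOSITORY=s3:https://example.com/bucket/minecraft \
  -e RESTIC_PASSWORD=change-me \
  -v mc-data:/data:ro \
  itzg/mc-backup
```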

I am leaning towards option one, but can you see any issue with running two instances of the container? If they both tried to back up at the same time, could that lead to some weird behavior? I suppose I could stagger them with the initial delay, having the local one start three minutes after the server starts and the remote one after seven, with different intervals thereafter?

Could the project implement some kind of lock file? Or, since it only has the files mounted as read-only, it might be okay. My only worry would be sending the rcon command twice.

itzg commented 2 years ago

Yeah I would generally lean towards option 1 but am also worried about possible overlap of backup runs. Specifically I would worry when one instance told the server to resume file saves while the second was still archiving content.

I agree with the idea of a lock, and with the challenge it poses.

Ultimately I'm thinking this has to be a fairly common need and might be worth an enhancement to the backup script. I could see the user-facing options being:

If you're interested in PR'ing that, then that would be great. Otherwise I can queue it up, but might be a little while before I can get to it.

chriscn commented 2 years ago

I'll take a look at doing a PR with these features. I would lean towards a lockfile; it could have the container's ID in it, which could be compared.

We can easily get the ID with `cat /etc/hostname` and store that in the lockfile. It does, annoyingly, open an attack surface if the backup container is compromised, but I can't see how it would be. Or we could provide the option to have the lockfile elsewhere?
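
Roughly, assuming a shared volume mounted at a hypothetical /locks path in every backup container, the ID-stamped lock could look like this:

```bash
#!/bin/bash
# Hypothetical lock path on a volume shared by all backup containers
LOCK_FILE=/locks/backup.lock

# Inside a container, the hostname is the (short) container ID
SELF_ID=$(cat /etc/hostname)

if [[ -f "$LOCK_FILE" ]]; then
  OWNER=$(cat "$LOCK_FILE")
  echo "Backup already in progress, lock held by container ${OWNER}; skipping this run"
  exit 0
fi

echo "$SELF_ID" > "$LOCK_FILE"
trap 'rm -f "$LOCK_FILE"' EXIT   # release the lock even if the backup fails

# ... run the backup here ...
```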

Would love to hear your thoughts.

itzg commented 2 years ago

If lockfile usage is optional (off by default), then I am fine with that approach. My general concern was forcing users to declare a whole volume (which might be a cloud block storage volume) for one lockfile. In fact, the option could be a declaration of the path to a lockfile to create/coordinate the backups and then users can choose any volume path they want.
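
For example, with a hypothetical LOCK_FILE option (the name is just for illustration), each backup container could point at a file on a volume it already mounts, such as the backups destination, so no dedicated volume is needed:

```bash
# Both containers coordinate through the same file on an already-mounted volume
docker run -d --name mc-backup-local \
  -e LOCK_FILE=/backups/.mc-backup.lock \
  -v mc-backups:/backups \
  -v mc-data:/data:ro \
  itzg/mc-backup

docker run -d --name mc-backup-remote \
  -e LOCK_FILE=/backups/.mc-backup.lock \
  -v mc-backups:/backups \
  -v mc-data:/data:ro \
  itzg/mc-backup
```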

chriscn commented 2 years ago

What do you think of the following:

If the user wanted to support multiple backup containers, they would define the following variable:

As well as defining a lockfile location. It could be a centralised place rather than the backup directory; it doesn't matter hugely where it is stored, as long as all the containers have access to it. You could store it where the server data is, but then you'd have to give write permissions to that directory.

Which, if enabled, would trigger the following steps:

```mermaid
flowchart TD

begin([Start if ENABLE_LOCKFILE variable is set])
fileExists{Does the Lockfile exist?}
fileDoesExist(Wait a small amount of time)
fileDoesntExist(Create Lockfile)

startBackup([Begin the backup process])
deleteLockfile(Delete the Lockfile)

begin-->fileExists
fileExists--Yes-->fileDoesExist-->fileExists
fileExists--No-->fileDoesntExist-->startBackup-->deleteLockfile
```

I'm not sure if you would even need to write the container id into the lockfile. Try to keep it as simple as possible.
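
To make the flow concrete, a rough shell sketch of that loop (ENABLE_LOCKFILE and the lock path are placeholders, not final names):

```bash
#!/bin/bash
# Placeholder names; the real variable names would be decided in the PR
LOCK_FILE=${LOCK_FILE:-/backups/.backup.lock}

if [[ -n "$ENABLE_LOCKFILE" ]]; then
  # Atomically create the lockfile; wait and retry while another container holds it
  while ! ( set -o noclobber; cat /etc/hostname > "$LOCK_FILE" ) 2>/dev/null; do
    echo "Lockfile $LOCK_FILE exists (held by $(cat "$LOCK_FILE" 2>/dev/null)); waiting..."
    sleep 10
  done
  trap 'rm -f "$LOCK_FILE"' EXIT   # delete the lockfile when the backup finishes
fi

# ... begin the backup process ...
```

Using noclobber (or mkdir) for the create step avoids the race where two containers both see "no lockfile" and then create it at the same moment.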

itzg commented 2 years ago

Overall that looks great. I'd suggest collapsing the enable/location variables into one.

Agreed, the content of the lock file becomes unimportant, though the container ID might still be good for debugging.