omega8cc / boa

Barracuda Octopus Aegir 5.4.0
https://omega8.cc/compare

Enable a backup destination other than Amazon S3 #849

Open yaazkal opened 8 years ago

yaazkal commented 8 years ago

Feature request: it would be good to have a backup script different from the current backboa, which only supports the Amazon S3 service. It would be good if you could simply back up to another server.

This article could be a starting point: http://larsolesen.dk/node/359

omega8cc commented 8 years ago

Yes, good idea.

mabo1972 commented 8 years ago

I've tried to set up a backup to a different host as described above, but I always receive this message:

BackendException: ssh connection to [mylocation]:22 failed: Unknown server [mylocation]

A manual ssh connection with myusername@[mylocation] works perfectly. How can this be solved?

I'm trying to use this with BOA-3.0.1 on Debian Wheezy.
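
One likely cause (an assumption on my part, not a confirmed diagnosis): duplicity's paramiko-based SSH backend reads root's known_hosts, so a host that works for a manual ssh as another user can still be "unknown" to duplicity. Adding the host key for root might fix it:

# Add the remote host key to root's known_hosts ([mylocation] as above):
ssh-keyscan -H [mylocation] >> /root/.ssh/known_hosts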

serrato-dan commented 5 years ago

I was wondering if anyone has gone any further with this request. I would love to use AWS, but since we are a public school entity, our lawyers have had difficulty getting AWS to adjust their legal contract wording so that it is acceptable to us. Azure, however, has already made those changes, so I'm looking into setting up basic Duplicity backups. Given that Backboa is such a great built-in use of Duplicity, I would love to use Backboa straight to Azure rather than AWS. Just wondering if anyone else has continued exploring this area.

yaazkal commented 5 years ago

I want to suggest another provider for the list (Backblaze B2), simply because of their impressive price: $0.005/GB/month. More info here: https://www.backblaze.com/b2/cloud-storage-pricing.html

yaazkal commented 5 years ago

I don't have time to test it right now, but it seems that B2 is actually supported by duplicity too:

https://help.backblaze.com/hc/en-us/articles/115001518354-How-to-configure-Backblaze-B2-with-Duplicity-on-Linux

PS: Azure is also supported by duplicity.
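
For reference, duplicity's B2 target URL looks like this (untested by me; the source path, credentials, and bucket below are placeholders):

# Back up a directory straight to a B2 bucket via duplicity's b2 backend:
duplicity /var/aegir b2://ACCOUNT_ID:APPLICATION_KEY@BUCKET_NAME/daily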

serrato-dan commented 5 years ago

Thanks for the feedback. I've been looking into Backblaze for personal backup as well.

I'm starting to test Duplicity on a server, but the convenience and preconfigured setup of Backboa is just so great. I'm hoping I can learn to replicate that kind of service on my own using Duplicity, but I have a big learning curve ahead of me.

omega8cc commented 5 years ago

We have requests for Azure compatibility from paying clients, so it will be implemented soon. We can extend it further to support Backblaze, but as always it's a matter of available dev bandwidth, because if we don't use an implemented feature ourselves, it's harder to keep it supported long term.


serrato-dan commented 5 years ago

@omega8cc That's amazing. Thank you for the information and your continued hard work on this project. Looking forward to seeing the progress. Have a great day.

yaazkal commented 5 years ago

@omega8cc Looking at the code, I can try to add support for B2 in the next few weeks. At the moment the script looks at the _AWS keys and then just does the thing. Would you prefer to simply add other keys like AZR (for Azure) or B2 (for Backblaze B2) in order to support multiple backup destinations, or to use something like a new key where the preferred backup destination is chosen? (Which could lead to new script logic.)
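
For illustration, the two options could look like this in /root/.barracuda.cnf (every key below is hypothetical; none of them exist yet):

# Option 1: provider-specific keys next to the existing _AWS ones
_B2_ACCOUNT_ID='your-b2-account-id'
_B2_ACCOUNT_KEY='your-b2-application-key'
_B2_BUCKET='daily-boa-example-com'

# Option 2: a single switch that selects the backup destination logic
_BACKUP_DESTINATION='b2'   # hypothetical values: aws, azr, b2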

tourtools commented 5 years ago

Plugging in decentralized blockchain file storage, like Sia or Storj, would also be a very impressive feature. It's only an idea, because the blockchain route usually offers reliable storage at very low prices (compared to corporate storage).

But it is also more complex, because it involves a connected wallet in addition to the API connector, and not all of them support duplicity.

By the way, it was only an idea :).

omega8cc commented 5 years ago

@yaazkal It should certainly use different variables/namespaces, but perhaps it will also need some storage-specific switches for currently used options and basically for anything AWS-centric, so it should probably end with some rewrite to avoid code duplication.

omega8cc commented 5 years ago

@tourtools We should focus on solutions and storage which offer an API without extra (manual) steps. Backups should just work, and sometimes the complexity is too problematic to support and manage reliably.

serrato-dan commented 4 years ago

Good day. I was just wondering if anyone has seen any update on the progress of this feature. Just hopeful and thankful that it is a possibility in the future, but also hoping to update our organization's timeline. Thank you for the great work and the support to the community.

omega8cc commented 4 years ago

We don't have any ETA to share yet -- still extremely busy with other things, but we will get to this as soon as we can.

yaazkal commented 4 years ago

I suggest using rclone as a duplicity backend (like this).

Yes, duplicity supports B2, Azure, and others, but using rclone will not only open more possibilities but can also help simplify the backboa script: the server administrator only needs to configure as many rclone "remotes" as needed, so the backboa script ends up just calling rclone, and the remotes configuration is no longer managed by barracuda. This would also allow backing up to multiple destinations if needed.

Let me illustrate with a hypothetical and simplistic example.

Say you have configured an S3 remote for rclone called s3bckp:daily-boa-example-com and also a B2 remote called b2bckp:daily-boa-example-com.

You can add those remotes to /root/.barracuda.cnf like:

_RCLONE_REMOTES='s3bckp:daily-boa-example-com, b2bckp:daily-boa-example-com'

In the end, the backboa script only needs to iterate over _RCLONE_REMOTES, calling duplicity [FILES_TO_BACKUP] rclone://[REMOTE_NAME]:/[BUCKET], assuming the backend mentioned at the beginning is used. Edit: Maybe a better approach would be to back up to one remote and then copy/clone that backup to the other destinations if multiple remotes are needed/wanted (rclone can copy/sync from one cloud to another).
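
A rough sketch of that iteration, using the hypothetical remotes above (this is not current backboa code, and [FILES_TO_BACKUP] stays a placeholder):

# Iterate over the configured rclone remotes and push the same backup to each:
for _REMOTE in s3bckp b2bckp; do
  duplicity [FILES_TO_BACKUP] "rclone://${_REMOTE}:/daily-boa-example-com"
done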

Options like backup rotation policies could live in another configuration option, like _BACKBOA_ROTATION, and be global.

To summarize the benefits:

- any cloud provider or protocol that rclone supports becomes available;
- the backboa script gets simpler, since remote configuration is managed by rclone instead of barracuda;
- multiple backup destinations become possible;
- it also opens the door to a future backup strategy based on restic, which supports rclone as a backend too.

I know this change will need some refactoring of the backboa script, but it needs one anyway if more cloud providers or protocols are to be supported, so why not one implementation to solve them all?

Regards!

yaazkal commented 4 years ago

Hi again.

In my last post I proposed using rclone as a duplicity backend, because it integrates easily with the current duplicity way of doing backups and because rclone opens the door to using whatever cloud provider or protocol you want as the backup destination.

This way, refactoring the current backboa script needs fewer edits than adding support for each cloud provider as a duplicity backend, and it can also be backwards compatible, so you can still use the _AWS configuration or just use the proposed _RCLONE configuration.

I also mentioned that rclone as a duplicity backend is suggested because it opens the door to a future implementation of a new backup strategy using restic (maybe as a new feature, or as an option so the user can choose to back up with duplicity or restic). Let me share what I've found over the last three weeks using both backup strategies on a production server.

Even though restic doesn't support compression at the moment, restic backups are much smaller than duplicity ones. This is likely because restic uses deduplication by default. As we are mainly hosting Drupal sites and, generally speaking, the code is pretty similar from platform to platform, it detects a lot of duplicates and backs up only one copy of each duplicated chunk (not file) (see CDC).

The size difference is also due to the different strategy each tool uses: duplicity is configured to make a full backup each week plus incremental backups in between, while restic has the concept of snapshots, so there are no weekly full backups, just the initial one, and after that everything is differential.

Backup test for the same server using daily backups:

Initial duplicity backup: 55.8 GB
Initial restic backup: 27 GB

After 3 weeks:

Bucket size hosting duplicity backups: 142 GB
Bucket size hosting restic backups: 28.3 GB

I know that my duplicity backups will hit about 245 GB before rotation, while restic may grow to about 30 GB, I guess. For me this is a huge difference that also translates into savings with your current bucket provider (not to mention if you also change providers, as I did, from AWS to Backblaze).

As for server resource consumption during backups, I can see that restic uses considerably more I/O and a bit more CPU, but takes less time in general (I ran restic without ionice, so end results will differ).
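
For reference, a throttled run could look like this (standard ionice/nice flags, not something backboa currently does):

# Run restic at low CPU and I/O priority to soften the impact on the server:
ionice -c2 -n7 nice -n19 /root/restic backup --files-from /root/.bck.include --exclude-file=/root/.bck.exclude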

Conclusion and suggestion

Adding rclone as a duplicity backend opens up the possibility of storing backups at other destinations, but the backup strategy stays the same. Using a modern backup strategy like restic not only opens up other cloud providers, but also saves resources like storage space, which in the end impacts the money needed to run the system. And that's not the only benefit: restic uses encryption by default and supports common cloud providers and protocols without external backends, while rclone is still supported if more cloud providers or different protocols are needed.

I think backboa can live as it is now while a new backup script using restic is developed. Both systems can coexist for a transition period, and backboa can then be deprecated in later versions.

APPENDIX

How I ran restic to test it

I ran restic not exactly in a BOA way, because I was just testing it to see the differences. I mean, this code won't necessarily fit a PR, but it can work as a base. I used restic 0.9.6 with Backblaze as the object storage provider.

I have these files:

/root/restic (the restic binary)
/root/.bash_aliases (using the aliases file to store variables)
/root/.bck.include (config file listing the include paths)
/root/.bck.exclude (config file listing the exclude paths)
/root/rbackup.sh (script file for the backup command)
/root/rforget.sh (script file for the backup rotation command)

Of course, rbackup.sh and rforget.sh could be just one file that takes arguments, and .bash_aliases is not needed here, but for testing purposes I just made it that way.

Note that you need to initialize the restic repository first; please check the restic docs.
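
With the variables from .bash_aliases below in place, initialization is a one-liner:

# Create and encrypt the repository defined by $RESTIC_REPOSITORY:
source /root/.bash_aliases
/root/restic init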

I've edited /etc/crontab to execute the scripts:

0 5     * * *   root    bash /root/rbackup.sh >/dev/null 2>&1
30 23   * * 0   root    bash /root/rforget.sh >/dev/null 2>&1

/root/.bash_aliases content:

# restic repository settings and Backblaze B2 credentials
export RESTIC_REPOSITORY=b2:daily-boa-MY-SERVER
export RESTIC_PASSWORD=MY_BACKUP_PASSWORD
export B2_ACCOUNT_ID=BACKBLAZE_ACCOUNT_ID
export B2_ACCOUNT_KEY=BACKBLAZE_ACCOUNT_KEY

/root/.bck.include content:

# BOA specific
/etc
/var/aegir
/var/www
/home
/data

/root/.bck.exclude content:

# BOA specific
/data/conf/arch

# Uncomment next line to exclude drush archives
# /data/disk/*/backups/*.tar.gz

/root/rbackup.sh content:

#!/bin/bash
# Load the restic repository and credential variables.
source /root/.bash_aliases
# Back up everything listed in .bck.include, minus the paths in .bck.exclude.
/root/restic backup --files-from /root/.bck.include --exclude-file=/root/.bck.exclude

/root/rforget.sh content:

#!/bin/bash
# Load the restic repository and credential variables.
source /root/.bash_aliases
# Keep the last 31 snapshots (about a month of dailies) and prune unreferenced data.
/root/restic forget --keep-last 31 --prune

If @omega8cc is OK with this suggestion, I'm open to discussing this topic a little more and creating a PR by the end of March.

omega8cc commented 4 years ago

This looks very interesting, thank you! We appreciate your work and the offer to submit patches to add support for this new feature. We will look into this ourselves too.

yaazkal commented 4 years ago

@omega8cc great!

Please let me know, after your evaluation and testing, if you make a decision and want me to do a PR later on, or if you will work on this yourselves.

Regards!

omega8cc commented 4 years ago

Bumping, so it gets proper attention soon.

ar-jan commented 4 years ago

Regarding deduplication, maybe this could also be made to work for database backups? Gzip (in Debian) has an --rsyncable option which yields slightly larger archives but works better for deduplication with e.g. restic, because it compresses blocks independently. BOA used to use gzip for compressing database backups, though now it uses bzip2 (not sure what the reason for switching was?).

Would it make sense to have (an option for) rsyncable/deduplicable database dumps? Instead of Debian/Ubuntu's patched gzip with --rsyncable, there's also pigz (a parallel implementation of gzip that fully replaces it and exploits multiple processors and cores when compressing data), which also has the --rsyncable option.
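
A minimal sketch of what that could look like (database name and output path are placeholders, assuming Debian's patched gzip or pigz is installed):

# Rsyncable dump with Debian's patched gzip:
mysqldump example_db | gzip --rsyncable > /data/dump/example_db.sql.gz
# The same with pigz, using all available cores:
mysqldump example_db | pigz --rsyncable > /data/dump/example_db.sql.gz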

serrato-dan commented 3 years ago

It's been more than a year since @yaazkal's recommendation. Just wondering if there has been any progress or decisions made on this topic. Thank you.