mailcow / mailcow-dockerized

Backups - Incremental & Encrypted using Duplicity #1575

Closed jameswyld closed 5 years ago

jameswyld commented 6 years ago

I'd like to share a proof-of-concept for containerized, incremental, encrypted backups that are configured as a mailcow subservice. This is something I feel is missing from the base mailcow project, and wanted to see if I could solve for it. I'd love to know if something like this would be useful for the overall project, or ways to improve it.

Cheers!

--> See branch here: https://github.com/jameswyld/mailcow-dockerized/tree/mailcow-duplicity

UPDATED 2018-07-22: Configuration and administration have moved over to the admin UI. Read here --> https://github.com/mailcow/mailcow-dockerized/issues/1575#issuecomment-406844318

Notes

This allows mailcow to handle backing itself up, taking care of scheduling and notification. Configuration is done using mailcow's existing methods (generate_config.sh & docker-compose.yml). Using duplicity makes it simple to back up to a location outside of the mailcow server (an S3-compatible bucket by default), so files do not need to be "double handled" on the server during regular operation, and no cron job or scripting is needed on the host itself. The backups can be encrypted to protect the mailserver data at rest, as well as compressed and incremental to reduce storage and transfer size.

Features

Default tasks

Restore

Stand-alone helper script (duplicity-restore.sh) to handle restoration on a fresh system. This works, but could most likely be improved.

kilo42L commented 6 years ago

I would prefer a borg backup solution over duplicity.

ntimo commented 6 years ago

Hey, this looks really really awesome. I hope it will be implemented into Mailcow :) I would love to see a setting that would create backups of the vmail volume on an hourly basis. Oh, and is it also compatible with sftp or ftps servers? :)

jameswyld commented 6 years ago

Duplicity itself is compatible with a ton of backends (see here: http://duplicity.nongnu.org/), and the container this is based on - Tecnativa/docker-duplicity - should in theory allow most to be used. The example I've used is for S3, but to change it, all that would be required is a bit of tweaking in mailcow.conf / docker-compose.yml.

edit: and the container supports down to 15-min increments for tasks, so you could tweak it for super-regular incremental backups if you needed. You'd probably want more regular full-backups too, so the backup indexes don't get too out-of-hand.

ntimo commented 6 years ago

Hey, this sounds really really nice. What do you think about making this configurable, with a config value that enables hourly backups and one that enables or disables storage backends, so you can configure multiple backends like S3, SFTP, FTPS, B2 and so on?

jameswyld commented 6 years ago

That shouldn't be difficult at all. As it is, you should be able to just edit docker-compose.yml and do all of this - pretty much everything (including what jobs/scripts to run) is controlled by setting environment variables at the moment.

FTP Backend

As to setting an FTP backend, I think you'd just need to put the FTP password environment variable into the duplicity image. I've not tried this though.

docker-compose.yml:

    duplicity-backup:
      environment:
        DST: ftps://user@server.com/path/to/backup
        FTP_PASSWORD: ${DUPLICITY_FTP_PASS}

Job Frequency

The frequency for each job is set in docker-compose.yml now:

    duplicity-backup:
      environment:
        JOB_200_WHEN: daily
        JOB_200_WHAT: mysql-backup

But that "daily" setting could be abstracted to a variable set in mailcow.conf without much effort.

The question here is whether mailcow.conf is the right place to set this up. While it is not too complex, the flexibility of having multiple backup destination types may increase the amount of configuration needed in mailcow.conf. Different variables would need to be set, depending on the backup scheme and the type of credentials required. E.g. for AWS S3 you'd need path, API key and secret, but for FTP you'd just need the password, and would have to remember that it is set differently from an API key / secret.

I wonder if it would be better for this type of backup configuration to live in the web UI, with settings stored in mysql, and a modification to the duplicity image to find the settings in the db.

What do people think on that one?

ntimo commented 6 years ago

When I try to pull the duplicity container I get this error:

    ERROR: for duplicity-backup pull access denied for mailcow/docker-duplicity, repository does not exist or may require 'docker login'
    ERROR: pull access denied for mailcow/docker-duplicity, repository does not exist or may require 'docker login'

I think being able to configure the backups using the web ui would be super super awesome, maybe with a dropdown to select the storage backend? So that after selecting that you get the appropriate settings for S3 or FTP to configure :)

jameswyld commented 6 years ago

Agree on the UI idea. That would be something I'd have to learn as I've not touched the frontend side of mailcow yet. I'll have a crack at that when I get time ;)

Regarding the pull issue: it is possible I've messed something up by naming the image that way within the docker-compose file. The duplicity image should be coming from a local Dockerfile in that repo:

    build: ./data/Dockerfiles/duplicity

Do you have the duplicity folder under mailcow-dockerized/data/Dockerfiles?

jameswyld commented 6 years ago

Was able to replicate your issue by issuing docker-compose pull duplicity-backup. It should be docker-compose build duplicity-backup. If you just run docker-compose up -d it should figure it out.

ntimo commented 6 years ago

Okay, I got the container running, but it's always restarting and the log always prints this out:

    duplicity-backup_1 | /usr/local/bin/entrypoint: exec: line 13: /bootstrap.sh: Permission denied

Adding: RUN chmod a+x /bootstrap.sh to the Dockerfile solved this issue :)

But now when it runs the cron job it prints the following, which seems to be wrong:

    duplicity-backup_1 | INFO:jobrunner:Nothing to do

jameswyld commented 6 years ago

Yeah it looks like I had some errors in that Dockerfile with setting permissions on the files it copies in. I didn't catch it on my local environment. I had this too yesterday when I was testing something else.

Regarding cron, the jobrunner script looks at the path it was run from, to decide what to do (15 min / daily / weekly etc), so to run a test of the "daily" jobs, you need to execute it from: /etc/periodic/daily/jobrunner

The way I test the daily cron tasks is with this command: docker-compose exec duplicity-backup /etc/periodic/daily/jobrunner
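To make the path-based behaviour concrete, here is a minimal shell sketch of the idea. It is not the jobrunner shipped in the duplicity image; the function names `period_from_path` and `jobs_for_period` are made up for illustration, and only the `JOB_<n>_WHEN` / `JOB_<n>_WHAT` variable pattern comes from the examples earlier in this thread.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the real jobrunner from the duplicity image.

# The period is the name of the directory the script was started from,
# e.g. /etc/periodic/daily/jobrunner -> daily
period_from_path() {
    basename "$(dirname "$1")"
}

# Print the JOB_<n>_WHAT value of every job whose JOB_<n>_WHEN lists the period.
# Only exported variables are seen, which is how docker-compose passes them in.
jobs_for_period() {
    period="$1"
    env | sed -n 's/^\(JOB_[0-9][0-9]*\)_WHEN=\(.*\)$/\1 \2/p' | sort |
    while read -r job when; do
        case " $when " in
            *" $period "*) eval "printf '%s\n' \"\$${job}_WHAT\"" ;;
        esac
    done
}
```

With `JOB_200_WHEN=daily` and `JOB_200_WHAT=mysql-backup` exported, `jobs_for_period "$(period_from_path /etc/periodic/daily/jobrunner)"` selects `mysql-backup`, while a period with no matching job selects nothing, which is presumably the situation a "Nothing to do" log line describes.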

In other news, I'm currently playing with moving the settings into the redis instance and exposing them in the configuration web UI. I'll also need to refactor the duplicity image to reference those from the cron job, which might take a bit of time.

ntimo commented 6 years ago

Okay, when I run this command I get a lot of connection errors, and I checked the ftps path and the password I supplied and both are correct. Being able to configure it using a webui would indeed be nice. I now tried using webdav and that worked just about fine :)

Oh and maybe it would be a good idea to remove the old mailcow folder first and after removing it start the restore :) otherwise copying the old backup mailcow-dockerised folder will probably fail, as well as removing the restore folder from the /opt/ folder.

I also think the container name for duplicity should be duplicity-mailcow so it's in line with the naming of the other containers.

Another question: how can I get the backup to run hourly? Right now it only does a backup of mysql every hour, but I want to back up the vmail and other data every hour too.

jameswyld commented 6 years ago

I've pushed a new version up to my branch which has all config moved to the admin UI. Please accept my apologies in advance for any php mistakes, it has been a while since I've worked on frontend and I'm rusty.

Updated Features:

General Notes: I think it is still very much in "proof of concept" state, and would probably benefit from assessing a few more things:

Let me know your feedback folks!

Screenshots: mailcow-duplicity-backup-settings, mailcow-duplicity-backup-jobs

jameswyld commented 6 years ago

@ntimo Thanks for the fixes to the Dockerfile & restore script. Also, I agree on naming - I was looking at that today as I think it is a blocker from being able to use debug.php for the container. Regarding frequency - In the config-file version, there is an implicit daily job called JOB_300. If you override that in docker-compose.yml you'd be able to change the vmail backup frequency.

Shouldn't be an issue with the new admin-ui version though!

ntimo commented 6 years ago

I just tested it out and it works super super super super great. I will add German translations once the webui is out of beta.

- It would be nice if the email address that the reports are sent to were configurable too :)
- The restore script could use some kind of feature to select which backup version it should restore.
- Maybe the backup logs could be displayed in the debug section too.
- Having an option to set the maximum time the backups should be retained before they get deleted.

Knight1 commented 6 years ago

Hi :)

What do you think about https://rclone.org/ ?

I wonder what happens when the mysql backup fails? There is no exit code check. https://github.com/mailcow/mailcow-dockerized/compare/master...jameswyld:mailcow-duplicity#diff-f3662fbf36090e3dc6e6f555efa54d05R4

jameswyld commented 6 years ago

@Knight1 - rclone looks pretty nice - similar backend support to duplicity by the look of it, so may be an option. Am I right in reading that it is mainly used to sync/clone a path to a remote location?

Some food for thought: one important question to think about is what the value of incremental, point-in-time backup snapshots is over a directory sync. For many backup systems, an important feature is the ability to restore a full snapshot from any individual backup over the last xx months. Many third party storage systems (S3, Dropbox, Drive etc) support file versioning on their side, but that might be painful to admin and sift through.

Scenario: A mailbox is inadvertently accessed and its contents deleted. It takes a number of days for the owner to realize and notify the admin. By this point, the modified (deleted) mailbox has been backed up multiple times. To restore it, you need to extract data from a backup taken a week ago.
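For a scenario like this, duplicity can restore a single path as it existed at an earlier point in time. The helper below is a hedged sketch that only prints the command instead of running it; the function name, bucket URL and mailbox path are hypothetical placeholders, while `--time` and `--file-to-restore` are options of duplicity's `restore` action.

```shell
#!/bin/sh
# Sketch: build a duplicity command that restores one path from an older backup.
# All values passed in are placeholders, not taken from the branch.
restore_cmd() {
    age="$1"      # how far back to go, e.g. 7D = seven days ago
    path="$2"     # path relative to the backup root
    src="$3"      # backup location (duplicity URL)
    dest="$4"     # local directory to restore into
    printf 'duplicity restore --time %s --file-to-restore %s %s %s\n' \
        "$age" "$path" "$src" "$dest"
}

# Hypothetical example values:
restore_cmd 7D vmail/example.org/alice s3://s3.example.com/my-bucket/mailcow /tmp/alice-restore
```

A plain directory sync has no equivalent of the `--time` argument, which is the point of keeping incremental snapshots.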

Regarding mysql backup - yes, it's pretty rudimentary at this point. Right now I spot-check the backup notification mail to ensure that the file size changes to a sane value day-to-day, but I'm certainly open to suggestions on how to improve it.
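One possible way to address the missing exit-code check raised above is to write the dump to a temporary file and only keep it if the command succeeded and produced output. This is a sketch, not the script in the branch; `dump_checked` is a hypothetical helper and the `mysqldump` invocation in the comment is an assumed example.

```shell
#!/bin/sh
# Sketch: run a dump command and keep its output only on success.
# Usage: dump_checked OUTPUT_FILE COMMAND [ARGS...]
dump_checked() {
    out="$1"; shift
    tmp="${out}.partial"

    "$@" > "$tmp"
    status=$?

    if [ "$status" -ne 0 ]; then
        echo "backup command failed with exit status $status" >&2
    elif [ ! -s "$tmp" ]; then
        echo "backup command produced an empty file" >&2
        status=1
    else
        mv "$tmp" "$out"
        return 0
    fi

    # Never leave a partial dump where the upload step could pick it up.
    rm -f "$tmp"
    return "$status"
}

# Assumed example (database name and credential handling are placeholders):
# dump_checked /backup/mailcow.sql mysqldump --single-transaction mailcow || exit 1
```

A non-zero return status gives the calling job something to act on, instead of relying on spot-checking file sizes in the notification mail.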

jameswyld commented 6 years ago

@kotekle - I haven't yet worked on what an upgrade process for existing servers might look like for the adminUI version. I wouldn't recommend doing that on a live server, since it is still very much experimental. I certainly haven't put it on my own yet.

If you want to test on a real server, and are not afraid of modifying mailcow.conf and docker-compose.yml, you could copy the older version (from before I made wholesale changes to the webui). This would be appropriate if you want to play around with the duplicity container in an existing server, and get a feel for how it does backups - size, frequency, destinations etc. I've made the final version of that one available here: mailcow-duplicity-original. Installing the non-UI version into an existing server would look like this:

This would get you what I've had running for a number of months on my own mailcow server, which I've been able to use successfully to migrate a full server between instances.

ntimo commented 6 years ago

Do you have any updates? :)

jameswyld commented 6 years ago

Hi - haven't had time to iterate further yet. I was hoping that @andryyy or someone close to the mailcow project would give some feedback & what would be the best next steps (if any).

I'd love to see some fully integrated backups as a part of mailcow, but failing that, there are a couple of other options to look at.

theamazingaustin commented 6 years ago

This would be an absolutely fantastic feature! I was planning on a basic setup mapping an S3 container to my Digital Ocean droplet, and just running regular full backups, but I love the incremental backups that duplicity offers, as well as the FTP support, all via the GUI. Thanks for the hard work, can't wait to see where this goes!

ntimo commented 6 years ago

@andryyy what do you think about this backup feature? :)

deanpcmad commented 6 years ago

This looks good. How's it going?

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jameswyld commented 5 years ago

Just a general note on this one, as it has been a while, and there was no conclusive close to the issue.

I am very happy to help integrate this into the mailcow UI, but not really sure on next steps or if the core mailcow team want to integrate a backup system into the project. Thus, I haven't been maintaining the backup/UI branch, and have let this auto-close.

Personally, I'm still running off the non-UI version referenced here, which is running pretty smoothly and sits beside mailcow (rather than fully integrated). It doesn't need much maintenance to keep up with new mailcow versions.

andryyy commented 5 years ago

A new container for backups will not happen. I don't want to force a duplicity container for backups. There are other ways to implement backups outside mailcow. People use different approaches for backups, I don't want to force-add a system inside mailcow. :/ I would be fine with something like a config exporter/importer, that can also be queried using the API. You can use all tools outside mailcow to backup and restore it. Deduplication is also not a problem.

kilo42L commented 5 years ago

I fully agree andryyy. If anything it should be a modular thing that can be added on top of an existing setup.

I run mailcow on a VM that does snapshots and backups, and the actual mail is archived on a mail archive system (mailstore).

jameswyld commented 5 years ago

Thanks for the response @andryyy. This is a good direction to know, so any future iterations I do with duplicity will head in the direction of being a module / side-by-side thing rather than fully integrated.

Thanks!

michacassola commented 5 years ago

@jameswyld Are you continuing somewhere? Can you please give us the link of the repo once you decide to move ahead? Thank you very much in advance!

amrnassar93 commented 5 years ago

I installed it but I can't find the backups section on the web UI.

Could anyone advise what the problem could be?

Thanks.

jameswyld commented 5 years ago

@amrnassar93 @michacassola Sorry for the slow response here.

Based on Andryyy's comments above, I've moved away from the idea of integrating this into mailcow, since the project's philosophy is for backup solutions to remain outside of mailcow. For that reason, what I use today is effectively still the same as the original version of this idea, since it doesn't mess with mailcow.

A duplicity container is used for backups and its settings live within mailcow.conf / docker-compose.yaml, but that is all. It isn't fully separate, but it does not require any modification to the core mailcow components (e.g. the UI). Because of this, the mailcow update script has kept working without problems for me (~12 months now).

That version is here --> https://github.com/jameswyld/mailcow-dockerized/tree/mailcow-duplicity-original

Cheers! James.

hachre commented 5 years ago

@jameswyld Fyi: You could use docker-compose.override.yml to add your changes. This way even the original docker-compose.yml would stay untouched.
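A sketch of what that could look like, written as a small shell helper so the file content is explicit. The service name, build path and environment variables are taken from earlier comments in this thread; `version: '2.1'` is an assumption and has to match whatever docker-compose.yml declares, `DUPLICITY_WHEN` is a hypothetical variable name illustrating the idea of setting the frequency elsewhere, and the FTPS destination was reported as untested above.

```shell
#!/bin/sh
# Sketch: write a docker-compose.override.yml next to docker-compose.yml.
write_override() {
    dir="${1:-.}"
    # Quoted 'EOF' keeps ${...} literal so docker-compose resolves it, not the shell.
    cat > "$dir/docker-compose.override.yml" <<'EOF'
version: '2.1'   # assumption: must match the version in docker-compose.yml
services:
  duplicity-backup:
    build: ./data/Dockerfiles/duplicity
    environment:
      DST: ftps://user@server.com/path/to/backup
      FTP_PASSWORD: ${DUPLICITY_FTP_PASS}
      JOB_200_WHEN: ${DUPLICITY_WHEN:-daily}   # hypothetical variable, defaults to daily
      JOB_200_WHAT: mysql-backup
EOF
}
```

docker-compose reads docker-compose.override.yml automatically alongside docker-compose.yml, so running `write_override .` in the mailcow directory followed by `docker-compose up -d` would add the extra service without touching the original file.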

jameswyld commented 5 years ago

Great idea! TIL. :)

michacassola commented 5 years ago

Dear @jameswyld, thanks for all your efforts. Please provide instructions on how to install and possibly uninstall your solution.

ciroiriarte commented 5 years ago

Too bad. Given that it is a self-contained solution, integrated backup sounds natural.