offen / docker-volume-backup

Backup Docker volumes locally or to any S3, WebDAV, Azure Blob Storage, Dropbox or SSH compatible storage
https://offen.github.io/docker-volume-backup/
Mozilla Public License 2.0

Using Docker labels to have per-service configuration #329

Open muratcorlu opened 10 months ago

muratcorlu commented 10 months ago

Is your feature request related to a problem? Please describe.

I have a big Docker Swarm machine on which I host all of my projects as Docker containers. I want a central backup solution for all of the Swarm stacks/services that can be configured inside each project's own docker-compose file.

Describe the solution you'd like

I use Traefik for routing incoming traffic to the containers. In Traefik, you don't write a central configuration; instead, all of the service configuration is set via Docker labels inside the projects' docker-compose files. This simplifies things a lot.

So I would like to be able to enable backup for a docker service like below:

myservice:
  image: ....
  volumes:
    - myvolume:/data
  deploy:
    labels:
      - "backup.enabled=true"
      - "backup.source=myvolume"
      # and other configurations as well, like a custom schedule, retention etc.

Describe alternatives you've considered

  1. Running separate backup instances per stack doesn't seem efficient, since I'll duplicate a lot of configuration.
  2. I considered backing up the parent folder of the path where Docker keeps all of the volumes. But that doesn't sound like a good idea. Or maybe it is? 🤷🏻
  3. Configuring everything in a central place was another consideration, but every project has its own Git repository, docker-compose file and CI/CD pipeline, and keeping the backup logic somewhere else is always a recipe for hassle.

m90 commented 10 months ago

Thanks for this suggestion. I'm not a traefik user myself, but I have seen its configuration approach become very popular in the Swarm/compose world. The trouble is that such an approach is very different from the way the tool currently sources its configuration, so adding support for this would require some major refactoring (which is probably a good thing). I'll have to think about it a bit. If you have a full-blown "dream API example" (i.e. a compose service definition) of how you think this could work, that'd also be helpful.

For the time being, your options are:

pixxon commented 10 months ago

I also considered this approach, but one major obstacle would be figuring out how to mount the volumes nicely for backup. I would not want every single volume to be mounted in the manager, so some sort of containerized, periodic-task-based approach would be best.

However, I am not sure how to configure the volume that needs to be backed up. The full name of that volume might not be available when the service is being labeled. (The stack name is added as a prefix.) The best way would be to just specify a path in the application and mount the directory from there, but I have no idea if that is possible. (Something like COPY --from in builds.)

pixxon commented 10 months ago

Maybe a wild take, but instead of adding the labels to services, one could add them to the volumes themselves? Docker allows objects to be labeled, including volumes; both docker volume create and the compose file allow specifying them.

To me this is very appealing, as it would bind the backup configuration to the volumes themselves. The labels to stop containers/services would still be applied to those, but when and what to back up could be defined on the volume itself.

Lastly, where to back up could be defined on the manager itself, somewhat like how traefik defines entrypoints. These are then just referenced in the volume labels to avoid redundancy.

muratcorlu commented 10 months ago

Labelling volumes instead of containers for configuring backups makes a lot of sense. It would also avoid the question of what happens if the same volume is mounted into multiple containers.

volumes:
  my_data:
    labels:
      backup.enabled: true
      backup.retention: 7

Looks awesome! 😊

m90 commented 10 months ago

I also like the idea of labeling volumes a lot, but I still have a slightly hazy vision of who'd be controlling whom when you can label volumes, services and containers alike. I.e. in a setup that runs multiple schedules, who'd tell containers/services when they need to be stopped? Would each labeled volume create a cron schedule? What happens if labels change: who's notifying the backup container that it needs to create a new cron? What about bind mounts?

I could continue the list forever, but I guess that doesn't lead anywhere. Maybe a good next step would be translating one of the test cases (https://github.com/offen/docker-volume-backup/tree/main/test) into the desired new configuration style, so we can get an idea of what this would really look like, and whether such a configuration approach could be compatible with the existing one or would require a hard cut.

pixxon commented 10 months ago

I have tried to make a quick mock example of what I was trying to explain above. This would certainly change the whole approach of the backup software.

The need to back up a volume would be labelled on the volume itself. This defines when to back up, along with options such as how long backups should be kept and what name they should be stored under.

The connection to where the backups are stored is only defined on the backup container itself, to avoid redundant definitions. (There could be some issues, for example when trying to store in different buckets; then those labels would need to move onto the volumes.)

Thirdly, the containers to be interacted with are still labeled themselves, so the command executions and the decision of which ones need to be stopped remain there.

version: '3.9'

volumes:
  postgresql_db:
    labels:
      docker-volume-backup.stop-during-backup: postgres
      docker-volume-backup.filename: postgres-%Y-%m-%dT%H-%M-%S.tar.gz
      docker-volume-backup.pruning-prefix: postgres
      docker-volume-backup.retention-days: 7
      docker-volume-backup.cron-expression: 0 2 * * *

  redis_db:
    labels:
      docker-volume-backup.stop-during-backup: redis
      docker-volume-backup.filename: redis-%Y-%m-%dT%H-%M-%S.tar.gz
      docker-volume-backup.pruning-prefix: redis
      docker-volume-backup.retention-days: 7
      docker-volume-backup.cron-expression: 0 2 * * *

services:
  postgres:
    image: postgres
    volumes:
      - type: volume
        source: postgresql_db
        target: /var/lib/postgresql/data
    labels:
      docker-volume-backup.archive-pre: pg_dumpall -U postgres > /var/lib/postgresql/data/backup.sql
      docker-volume-backup.exec-label: postgres

  redis:
    image: redis
    volumes:
      - type: volume
        source: redis_db
        target: /data

  backup:
    image: offen/docker-volume-backup
    environment:
      AWS_ENDPOINT: minio
      AWS_S3_BUCKET_NAME: backup
      AWS_ACCESS_KEY_ID: test
      AWS_SECRET_ACCESS_KEY: test
    volumes:
      - type: bind
        source: /var/run/docker.sock
        target: /var/run/docker.sock
        read_only: true

  postgres_user:
    image: testimage:latest
    labels:
      docker-volume-backup.stop-during-backup: postgres

  redis_user:
    image: testimage:latest
    labels:
      docker-volume-backup.stop-during-backup: redis

pixxon commented 10 months ago

how (and if even) such a configuration approach could be compatible with the existing one, or if this would require a hard cut?

It would certainly be possible to keep the two compatible. Not sure if you are familiar with Prometheus; it uses something called static configs alongside service discovery. The first is what the tool currently offers, the second is what this feature would become.

pixxon commented 10 months ago

What happens if labels change, who's notifying the backup container that it needs to create a new cron? What about bind mounts?

The backup container would need to poll the Docker socket for changes, and when there is a new or changed label, update its configuration. One small help for this task is that labels cannot be added or changed on existing Docker volumes; they have to be defined when the volume is created. (At least that is the case via the CLI; I am not sure if it is possible through the API or from Docker plugins.)

who'd tell containers/services when they need to be stopped?

In the above example, the volumes define the labels postgres and redis, which can be referenced by containers that use the volume (or have an indirect dependency on it) via their own labels.

Would each labeled volume create a cron schedule?

I think yes, though I am not sure whether one volume would ever need multiple schedules. (For something like keeping daily backups for a week and weekly backups for a year.) In that case the approach could be similar to how Traefik groups routers/endpoints using a name inside the labels. So in the above example they would become:

  redis_db:
    labels:
      docker-volume-backup.stop-during-backup: redis
      docker-volume-backup.daily.filename: redis-daily-%Y-%m-%dT%H-%M-%S.tar.gz
      docker-volume-backup.daily.pruning-prefix: redis-daily
      docker-volume-backup.daily.retention-days: 7
      docker-volume-backup.daily.cron-expression: 0 2 * * *
      docker-volume-backup.weekly.filename: redis-weekly-%Y-%m-%dT%H-%M-%S.tar.gz
      docker-volume-backup.weekly.pruning-prefix: redis-weekly
      docker-volume-backup.weekly.retention-days: 365
      docker-volume-backup.weekly.cron-expression: 0 0 * * 0

m90 commented 10 months ago

Just to manage expectations: I appreciate all of your feedback, and I like the direction this is going, but it also means the entire tool would need to be rearchitected (i.e. supporting both static config and service discovery), so this is not something I can implement easily in my free time (given the way this project is currently run).

I'll move this around in the back of my head for a while, maybe I can come up with a way this could be sliced into several "sub-features" that could be worked on one after the other.

If you have further ideas, please leave them here, I'm happy to learn about them.

pixxon commented 10 months ago

this is not something I can implement easily in my free time (the way this project is currently run).

Since this is something I would like to use, I could try to help out with the implementation. The disclaimer is that I am not a Go developer. (I mainly use C++.)

I can come up with a way this could be sliced into several "sub-features" that could be worked on one after the other.

Tasks like #268 would lead up to this. I could also help by making a dummy implementation of the above idea that could be adapted into the tool down the line.

m90 commented 10 months ago

Thanks for offering your help. I'd be happy if you wanted to work on this.

Still, I won't be able to merge a single PR that basically rewrites the entire tool, so we'd need to plan this out a bit better. I'll still need to understand what's going on as I'll also keep maintaining this.

Right now, my idea would be to proceed roughly like this:

  1. Remove the crond usage and instead allow the tool to run as a long-running process that schedules its work itself. It should still be possible to invoke a backup manually. This will also need to support reading configuration from conf.d. Kind of like #268, as you already mentioned.
  2. Come up with a mechanism that can pull configuration from basically anything, i.e. env vars, conf.d or Docker labels. This is probably something abstract, with adapters for each means of configuration. Maybe this already exists; I'd think it makes sense to look at how Traefik does this.
  3. Connect the new configuration method with the Docker daemon, polling repeatedly for changes.

I'm not sure if this should be worked on as 1, 2, 3 or 2, 1, 3.

Let me know what you think.

pixxon commented 10 months ago

I understand the concerns, I was not thinking of a single large PR either.

The 1,2,3 order seems to be easier, especially since 1 is already in the form of an issue. If it's alright, I will start looking into that one.

m90 commented 10 months ago

#99 is probably also related to what I wrote, albeit I'm not sure this tool should have any sort of persistence, so I'd maybe not offer this feature even if it were possible.

Also, I wanted to mention that a change this big would probably warrant a v3, so some minor breaking changes would probably be OK, see #80.

pixxon commented 10 months ago

I am not a fan of introducing persistence to a tool that is supposed to back up persistent data. (Would it need to create a backup of itself?)

Regarding a REST API, the most I could imagine is a read-only visualization of the setup. (What the storage options are, which volumes are configured, which containers/services are labelled.)

m90 commented 10 months ago

Yeah, let's not bother with this for now. If we want a read-only visualization, we could also introduce a backup --debug command or something that dumps everything the tool currently thinks it should be doing.

m90 commented 9 months ago

One situation we should think about (and where I don't have a solution at hand right now) popped into my head just this morning:

Assuming users can create backup schedules by labeling their volumes, the tool will repeatedly poll the Docker daemon for volumes, check their labels and then create schedules. My concern is: how does this work in case I deploy multiple stacks that each run an offen/docker-volume-backup container (which I've seen people do, judging from what is posted in issues and discussions)? From what I understand, the daemon will always return all volumes when running docker volume ls. How would a backup container know which volumes are within its own stack and which aren't (and should therefore be skipped)?

pixxon commented 9 months ago

how does this work in case I deploy multiple stacks that each run a offen/docker-volume-backup container (which I've seen people do from what is being posted in issues and discussions)?

In the new setup this should not happen. One solution would be to simply exit when a second instance starts up. If this option really needs to be supported, a flag could disable the Docker service discovery feature so that multiple instances can be deployed in the same swarm.

m90 commented 9 months ago

If this option is really something that needs to be supported

Yes, this definitely needs to stay supported. It's still the easiest way to get multiple schedules up and running in case you just want to get the job done, and also what I would pick in such cases. It was also the only way of having multiple schedules before v2.14.0, so users from before that version might still be using such setups.

If there is no way around this problem, we'll need to make service discovery disabled by default. When enabled, the container can somehow check whether a sibling is already running on the same host and refuse to start in that case (how exactly this would be implemented, I don't know yet).

MyWay commented 9 months ago

This is an interesting approach I was looking for too, though, as already said, it probably needs some time to find the best solution. Furthermore, the old approach is more convenient in some use cases.

m90 commented 9 months ago

@pixxon I did some refactoring of the configuration handling in order to prepare for this and #364 in #360.

I think everything should be ready now to start working on this, so feel free to go ahead. No need to hurry though, I just wanted to let you know about the changes.

pixxon commented 7 months ago

Hey @m90, sorry for the delay; I did not have time to look into this issue earlier.

I plan to use traefik's paerser to process the labels, but to make it nice, it requires modifications to the config structs themselves. I think it would also make sense to use paerser for both env vars and files, since it has that capability and it would remove some tech debt. (Maintaining both the current env var loading and paerser could result in duplicated configs.)

A brief demonstration of how paerser works:

Example 1:

config

type Config struct {
    BackupCronExpression string
    BackupStopDuringBackupLabel string
}

flag/label

--backupcronexpression=@daily
--backupstopduringbackuplabel=test

env var

BACKUPCRONEXPRESSION=@daily
BACKUPSTOPDURINGBACKUPLABEL=test

Example 2

config

type Config struct {
    Backup BackupConfig
}
type BackupConfig struct {
    CronExpression string
    StopDuringBackupLabel string
}

flag/label

--backup.cronexpression=@daily
--backup.stopduringbackuplabel=test

env var

BACKUP_CRONEXPRESSION=@daily
BACKUP_STOPDURINGBACKUPLABEL=test

Example 3

config

type Config struct {
    Backup BackupConfig
}
type BackupConfig struct {
    Cron struct {
        Expression string
    }
    Stop struct {
        During struct {
            Backup struct {
                Label string
            }
        }
    }
}

flag/label

--backup.cron.expression=@daily
--backup.stop.during.backup.label=test

env var

BACKUP_CRON_EXPRESSION=@daily
BACKUP_STOP_DURING_BACKUP_LABEL=test

The first example uses the current config struct, but I personally find it pretty unreadable; using more structs would help. The third example results in env vars that look like the current ones, but makes the config way too verbose with anonymous types.

I would personally go with the second option, where related settings are stored together in a struct. However, this would most certainly create a breaking change: many underscores would be gone from the environment variables. Another point, not impossible to overcome but worth mentioning, is that traefik likes to start env vars with a shared prefix. (They use TRAEFIK_, which matches the traefik at the start of the labels.)

Something that could be used to avoid breaking changes is a mapping between old and new configs: before handing the map to paerser, manually rename the variables to their new names. This would result in a lot of deprecated variables though, so I am not sure which way you would prefer to go.

pixxon commented 7 months ago

Had some time on my hands to look more into this and made a PoC showing how the changes could be made. I moved the config into a new package.

Other than normal changes, there are two significant workarounds:

First, the mapping of the old variables: https://github.com/pixxon/docker-volume-backup/blob/refactor-configuration/internal/config/util.go#L23

Second, I had to process the "FILE" variables: https://github.com/pixxon/docker-volume-backup/blob/refactor-configuration/internal/config/util.go#L23 I went with something I saw in linuxserver projects, where the FILE__ prefix is used and, instead of being processed while reading the variable, there is a preprocessing step. (It could be moved even further out, into an init script like they have, but that might cause other problems.)

m90 commented 7 months ago

Thanks for starting to work on this, and no rush from my end. I'm a bit busier than usual, so I didn't really dig into the code yet; however, I wanted to check what you think of the following approach to building this:

I'm mostly just worried we spend a lot of time on reworking the existing approach when we (or at least I) don't really know how service discovery works in detail yet.

Let me know what you think.

pixxon commented 7 months ago

I do plan to add a new strategy there for volume labels. However, to achieve that, the current configuration struct has to be changed, or the label names might be really weird. (I could search for an alternative, but I found traefik's paerser to be pretty handy for creating configuration out of labels.)

If I just use the current Config, which has everything in a flat order, the expected label names will be unreadable in my opinion:

--backupcronexpression=@daily
--backupstopduringbackuplabel=test

To give it more structure, I have to change the Config struct to contain more types. I might be wrong, but envconfig would not be able to load the nested members properly, since it concatenates the names. (So OUTER_INNER would happen.)

If you want, as a prototype I can implement gathering config from labels even if the names are a bit unreadable, and if that seems fine, I can then do something about the structs to make the names nicer.

pixxon commented 7 months ago

Wrote some code to handle the volume discovery. It is very primitive and will probably need some fine-tuning in the future. Key points to look at:

Some issues that I ran into while making it:

m90 commented 7 months ago

Quick question after reading the issues you are describing: why is spawning a new container necessary in the first place? Up until now, this hasn't been done either. Is it about multi-node swarm setups? If yes, maybe there is a simpler, even if less smart, solution to that problem (which affects a tiny fraction of users anyway).

pixxon commented 7 months ago

My assumption is that volumes will be created after the backup container, so they will not be mounted, and I am not sure it is possible to attach volumes to a running container. If we expect the user to mount the volumes into the container and redeploy it, then a new backup schedule could be added via a new conf.d file.

m90 commented 7 months ago

Would it be possible to commit the current container and then run the scheduled backup off that committed image, without having to copy over the entire configuration?

Alternatively, is there a way to create a container spec off the output of docker inspect, with some changes, instead?

pixxon commented 6 months ago

I had a look, and I think using CopyFromContainer could allow the minimal copying required. We would spawn a new container that collects the backup from the volume, then copy the tar back to the manager. It would have more overhead, but would let us not bother with networks and other resources.