offen / docker-volume-backup

Backup Docker volumes locally or to any S3, WebDAV, Azure Blob Storage, Dropbox or SSH compatible storage
https://offen.github.io/docker-volume-backup/
Mozilla Public License 2.0

Containers that provide network service to other containers should be backed up first #160

Open littlegraycells opened 2 years ago

littlegraycells commented 2 years ago

Describe the bug: Currently, the order in which containers are stopped and started during a backup seems to be somewhat arbitrary. This causes a problem when one or more containers use another container as their network provider and a dependent container is started before the network provider container. In that case, the dependent containers fail to start.

To Reproduce: Steps to reproduce the behavior:

  1. Create three containers (A, B and C) in a docker-compose file
  2. Set network_mode to service:A for containers B and C
  3. Run the backup

Expected behavior: Container A should be stopped, backed up and restarted first. Then containers B and C should be stopped and started again.

Additional Context: The steps to reproduce above may or may not trigger the failure, depending on whether container A is down at the moment B or C is restarted. I'm not quite sure how to make it fail reliably.

m90 commented 2 years ago

I didn't even know you can use a container as a "network provider", and I also cannot find any info about this here: https://docs.docker.com/compose/networking/

How does this compare to using an explicitly declared network that you put your containers in? With such a setup, stopping and restarting should work as you expect. Would this be a viable workaround for your situation?


As for the root cause of what you describe, it seems that Docker gives no guarantees about the ordering of the containers returned here: https://github.com/offen/docker-volume-backup/blob/00c83dfac79af6f03c677e187b5bce6817b2c2a7/cmd/backup/script.go#L268-L274

which causes the behavior you describe. I have a hard time imagining a mechanism that sorts these correctly based on their dependencies (this is pretty complex and might involve more than just networks) without adding a lot of complexity. If this were to be supported, I guess the way to go is having users label their services with some sort of priority value that is then used to sort the containers before stopping and starting.

littlegraycells commented 2 years ago

Unfortunately for my use case a separate network won't work.

You can see the reference for this network mode here and here. One example of using this is when you have a container that connects to a VPN, and other containers attached to it have all their traffic routed through that VPN connection. An example Wireguard image can be used for something like this.

You're right though. I had a quick look at the docs and there doesn't seem to be an easy way to get the dependencies with the Docker CLI, mostly because depends_on etc. is compose syntax. docker-compose does resolve them internally (reference).

The best way I can think of is similar to what you suggested. Either have a label indicating the priority, or a label/environment variable along the lines of offen.depends_on=container_name, which could then be used to order the backup sequence internally. The second option would also avoid the need for a priority label on every container.

PS. There are other reasons to sequence the backups besides the container mode networking. For example, if you have a database container and application containers that use it, you would want to make sure the database container is started up before the application containers come online. The offen.depends_on label would be useful in this scenario as well.
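To make the offen.depends_on idea concrete, here is a minimal sketch in Go of how such a label could be turned into a start order. It is not code from docker-volume-backup: the `service` type, the `startOrder` function and the comma-separated syntax for multiple dependencies are assumptions made for illustration, and the label itself is only the proposal from this comment.

```go
package main

import (
	"fmt"
	"strings"
)

// service is a hypothetical stand-in for a container and its labels.
type service struct {
	Name   string
	Labels map[string]string
}

// dependsOnLabel is the label proposed in this thread; it does not exist in the tool.
const dependsOnLabel = "offen.depends_on"

// startOrder returns container names so that every container appears after
// the containers it depends on. Reversing the result gives a stop order.
// Assumption: several dependencies may be given as a comma-separated list.
func startOrder(services []service) ([]string, error) {
	deps := make(map[string][]string, len(services))
	for _, s := range services {
		deps[s.Name] = nil
		if v := s.Labels[dependsOnLabel]; v != "" {
			for _, d := range strings.Split(v, ",") {
				deps[s.Name] = append(deps[s.Name], strings.TrimSpace(d))
			}
		}
	}

	const (
		visiting = 1
		done     = 2
	)
	state := make(map[string]int, len(services))
	var order []string

	// visit is a depth-first traversal that emits dependencies first.
	var visit func(name string) error
	visit = func(name string) error {
		if _, known := deps[name]; !known {
			return fmt.Errorf("unknown dependency %q", name)
		}
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle involving %q", name)
		}
		state[name] = visiting
		for _, d := range deps[name] {
			if err := visit(d); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}

	for _, s := range services {
		if err := visit(s.Name); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	services := []service{
		{Name: "B", Labels: map[string]string{dependsOnLabel: "A"}},
		{Name: "C", Labels: map[string]string{dependsOnLabel: "A"}},
		{Name: "A"},
	}
	order, err := startOrder(services)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order) // prints [A B C]
}
```

The sketch also shows where the ambiguities come from: it has to decide what to do about cycles and about dependencies on containers that are not part of the backup, and here it simply returns an error in both cases.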

m90 commented 2 years ago

Implementing a depends_on sounds nice, but I think it also brings a lot of semantic ambiguities, so if this was to be implemented I would prefer a plain numeric value (alphanumeric maybe?) in docker-volume-backup.start_priority and docker-volume-backup.stop_priority which lets people declare the same behavior in a pretty predictable way.

That way, all that would need to be added is sorting the containers returned by ContainerList by the values of these labels before stopping and starting. If no such labels are present, the order is not changed and the current behavior is kept.
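As a rough illustration of the priority idea, the following Go sketch sorts containers by a numeric label using a stable sort. It is not the tool's implementation: the `container` type is a hypothetical stand-in for the entries returned by ContainerList, and treating lower values as "earlier" and missing or non-numeric labels as 0 are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// container is a hypothetical stand-in for an entry returned by ContainerList.
type container struct {
	Name   string
	Labels map[string]string
}

// priority reads a numeric label value.
// Assumption: a missing or non-numeric label counts as priority 0.
func priority(c container, label string) int {
	n, err := strconv.Atoi(c.Labels[label])
	if err != nil {
		return 0
	}
	return n
}

// sortByPriority returns a copy of cs ordered by ascending priority.
// The sort is stable, so containers with equal priority keep the order
// Docker returned them in, and an unlabeled list is left unchanged.
func sortByPriority(cs []container, label string) []container {
	out := make([]container, len(cs))
	copy(out, cs)
	sort.SliceStable(out, func(i, j int) bool {
		return priority(out[i], label) < priority(out[j], label)
	})
	return out
}

func main() {
	const startLabel = "docker-volume-backup.start_priority"
	cs := []container{
		{Name: "B", Labels: map[string]string{startLabel: "2"}},
		{Name: "A", Labels: map[string]string{startLabel: "1"}},
		{Name: "C"},
	}
	for _, c := range sortByPriority(cs, startLabel) {
		fmt.Println(c.Name) // prints C, A, B on separate lines
	}
}
```

Calling the same function with docker-volume-backup.stop_priority would give the stop order. Whether unlabeled containers should sort before or after labeled ones is a design choice this sketch does not settle.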

If anyone wants to work on this I am happy to provide feedback and merge PRs.

HomelabHaven commented 5 months ago

I recently ran into this issue as I have a container that depends on the network of another container for a VPN connection. The entire backup process was failing every time for two weeks because of this... glad I caught it in the logs. :O

Lyxon1337 commented 1 month ago

Same for me... I use Tailscale (VPN), and all containers with "network_mode: YX" have this problem. All of them fail with: "Error response from daemon: cannot join network of a non running container"

Does anyone know a temporary workaround, like autoheal for stopped containers? Dunno.

m90 commented 1 month ago

If someone could dig up the part of the source code for compose that defines the start/stop order for services/containers, that would be interesting now that compose is also written in Go. Maybe it's possible to do what is already done with some parts of the docker CLI and directly reuse code from there in this tool.