zyclonite / zerotier-docker

ZeroTier One as Docker Image
MIT License

[howto] How to swap a running zerotier container with a new version of the image? #25

Open softlion opened 9 months ago

softlion commented 9 months ago

Hello!! Thank you for this project, it's so useful.

I want to update the zerotier container running on a remote device.
The issue is, I'm using the zerotier network to ssh into that device.

That means I can't stop the container and restart it, as I will lose connectivity. What would be your best suggestion?
I could create a new zerotier container, but I don't know if it will work while the other one is still running.

Side note: I will also add a watchtower label to it, so it's updated automatically in the future.
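For reference, a minimal sketch of what that label could look like in docker-compose.yml (assuming watchtower is run with its --label-enable option, so only opted-in containers get updated; the service name and image tag here are just placeholders):

    services:
      zerotier:
        image: "zyclonite/zerotier:latest"
        labels:
          # opt this container in to automatic watchtower updates
          - "com.centurylinklabs.watchtower.enable=true"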

Paraphraser commented 9 months ago

Well, I can't speak for anyone else's experience but in my case, one of my zerotier-router instances is 1200km away supporting a non-tech-savvy relative. It's a Pi 4B running from SD and I definitely "measure twice, cut once" every time I need to do any kind of maintenance work.

Thus far, however, my concerns have proven unfounded. I can apt update ; apt upgrade and even when the list of changes is going to affect running containers (e.g. containerd.io), the most that happens is a small stutter (during which I definitely hold my breath) and then it comes good.

In terms of updating the zerotier-router container, a docker-compose pull followed by a docker-compose up -d gets the job done. That causes a much shorter stutter (probably as the iptables rules are withdrawn, the old container is torn down, the new container starts, and the iptables rules are recreated) but has otherwise never missed a beat.

Worst case would probably be if the updated container simply didn't work, but I always update the local instance first and make sure it can still communicate with the remote before I upgrade the remote.
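For what it's worth, that sanity check can be as simple as the following (assuming the local container is named zerotier; the remote address is whatever ZeroTier IP the far end has):

$ docker exec zerotier zerotier-cli info
$ docker exec zerotier zerotier-cli listnetworks
$ ping -c 3 <ZeroTier IP of the remote device>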

Even occasional reboots or power failures at the two ends have never caused a problem. Rock solid.


Never been a fan of watchtower - and precisely because of that "worst case" scenario I mentioned above. To be perfectly honest, imagining my remote comms going haywire one day, me chasing my tail for a few hours, only to then realise that a watchtower-like service had installed a broken update - the stuff of nightmares. Talk about self-inflicted wounds. I'd kick myself seven ways into next week.

I'd far rather do a manual "pull" so I always know what's about to change, then "up" containers one at a time so if anything breaks, I have a handle on the likely culprit.

In case you're interested, my actual approach is:

$ docker-compose pull
$ docker images

Then, if a new image for (say) zerotier-router has come down, the pattern is going to be:

REPOSITORY          TAG     IMAGE ID       CREATED        SIZE
zyclonite/zerotier  router  22a52abc6835   10 days ago    23.2MB
zyclonite/zerotier  <none>  64f3163f7248   4 weeks ago    23MB

where the image with the router tag is new but isn't yet running, while the image with the <none> tag is old but is what is currently running.

Then I tag the old image by hand to keep it hanging around (so it won't be removed by a prune):

$ docker tag 64f3163f7248 zyclonite/zerotier:router-prior

Then I do the "up", after which the new image is running. If anything goes hinky, I can change the service definition in docker-compose.yml to be:

    image: "zyclonite/zerotier:router-prior"

and "up" again. I don't need to go to Dockerhub to figure out explicit version tags or pull older images. Assigning my own tags is the quickest and most reliable way to revert a Docker container that I know of.
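In other words, assuming the service is called zerotier in docker-compose.yml, the relevant fragment ends up something like:

    services:
      zerotier:
        image: "zyclonite/zerotier:router-prior"
        # ...the rest of the service definition is unchanged

followed by another docker-compose up -d.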

Once I'm satisfied the new container is working properly, I can get rid of the old image:

$ docker rmi zyclonite/zerotier:router-prior

Don't know if any of that helps...

zyclonite commented 9 months ago

to not lose connectivity you could look into something like a multipath setup

hoppke commented 9 months ago

Stopping/recreating the container should work fine - I do it like that and have had no issues (yet) :) The container gets destroyed and connectivity is lost, but then the new container comes up and things reconnect; the effect is that the ssh session pauses for a bit and then resumes.

When you're not familiar with the tooling involved (or have limited trust), it's common practice to run the update process in a "screen" session; that way, even if the ssh session terminates, the commands will still have a terminal to run in, with uninterrupted input/output streams. For example (the session name is arbitrary):
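$ screen -S zt-update      # open a named screen session
$ docker-compose pull && docker-compose up -d
# detach with Ctrl-A d if you like; reattach later with:
$ screen -r zt-update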

You can also prepare a post-mortem autorecovery; schedule something to reinstate the old working config (via cron or a one-off "sleep 20m; do-the-magic-to-restore-old-container"). If everything works out you just cancel it, but if you lock yourself out you can hope it'll trigger after a while and open the door for you.
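A rough sketch of that kind of dead-man switch, borrowing the router-prior tag idea from above (file names and the 20-minute delay are just examples):

# before the update: preserve the current image and a compose file that uses it
$ docker tag zyclonite/zerotier:router zyclonite/zerotier:router-prior
$ sed 's|zerotier:router|zerotier:router-prior|' docker-compose.yml > docker-compose.rollback.yml
# arm the rollback; it recreates the service from the old image in 20 minutes unless cancelled
$ nohup sh -c 'sleep 20m; docker-compose -f docker-compose.rollback.yml up -d' >/tmp/zt-rollback.log 2>&1 &
# ...do the update...
# if everything still works, disarm it (use whatever job number the shell reported):
$ kill %1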

With zerotier containers you could go for a blue/green sort of setup, I suppose: have two (or more) containers and only update one at a time. I had a setup where zerotier ran on more than one node in a remote network. One of those zerotier nodes would act as a gateway for meshing that network with the rest of my stuff; which node was currently the gateway was decided by reassigning a special IP in ZeroTier Central.

Oh, and you could also consider running a second ingress channel through a separate product, e.g. Tailscale. It opens up new surface area for malicious actors, but also offers some protection should the zerotier network go out for external reasons.