tailscale / caddy-tailscale

A highly experimental exploration of integrating Tailscale and Caddy.
Apache License 2.0

Tailscale host is not cleaned up #15

Closed. clly closed this issue 5 months ago

clly commented 1 year ago

Caddy version: v2.6.4
caddy-tailscale version: latest, I think?

I'm running into the same issue that's described here, but I can confirm I'm using the same authkey. Based on some experimentation, if the tsnet listener gets closed, then an ephemeral node should be cleaned up from the Tailscale control plane.

The caddy-tailscale plugin has a Destruct method but I don't think it's getting called because it's not implemented as a caddy.CleanerUpper?
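To illustrate the suspicion above, here is a minimal sketch of why a method named Destruct would never be reached by a host that only looks for a cleanup interface. It does not import Caddy: `cleanerUpper` is a local stand-in with the same single method as caddy.CleanerUpper (`Cleanup() error`), and `withDestruct`, `withCleanup`, and `runCleanup` are hypothetical names made up for this example, not code from the plugin.

```go
package main

import "fmt"

// cleanerUpper is a local stand-in that mirrors the method set of
// caddy.CleanerUpper, so this sketch compiles without importing Caddy.
type cleanerUpper interface {
	Cleanup() error
}

// withDestruct is a hypothetical module that only has a Destruct method.
type withDestruct struct{ closed bool }

func (m *withDestruct) Destruct() error { m.closed = true; return nil }

// withCleanup is a hypothetical module that implements Cleanup.
type withCleanup struct{ closed bool }

func (m *withCleanup) Cleanup() error { m.closed = true; return nil }

// runCleanup imitates a host that calls Cleanup only on values that
// satisfy the interface. It reports whether Cleanup was called.
func runCleanup(mod any) (bool, error) {
	cu, ok := mod.(cleanerUpper)
	if !ok {
		return false, nil // the method name does not match, so nothing runs
	}
	return true, cu.Cleanup()
}

func main() {
	d := &withDestruct{}
	called, _ := runCleanup(d)
	fmt.Println("Destruct-only module cleaned up:", called, d.closed)

	c := &withCleanup{}
	called, _ = runCleanup(c)
	fmt.Println("Cleanup module cleaned up:", called, c.closed)
}
```

In a real module, a compile-time assertion such as `var _ caddy.CleanerUpper = (*MyModule)(nil)` (with `MyModule` being whatever the module type is called) would catch a mismatched method name at build time.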


Correction: it's exactly what's happening in #7. I mistook the node key for the authkey. I'm running this in Nomad with an ephemeral key, so new deployments get a new node key. My experimentation still says that closing the Listener would log the node out of Tailscale and allow the next deployment to use the same name.

So far I haven't been able to ensure that the CleanerUpper executes. If y'all know more about Caddy or can correct my idea please let me know!

willnorris commented 1 year ago

I've started using this with my custom caddy build for my personal website and am seeing the same thing. The caddy tailscale nodes are marked ephemeral, but are not getting cleaned up. Even without implementing caddy.CleanerUpper, the Tailscale control server is supposed to remove ephemeral nodes shortly after they disconnect, so I'll need to see what's going on. But I just wanted to confirm that it's not just you, and I'm looking into it.

clly commented 1 year ago

Thank you! The documentation that I saw said 30 minutes to 48 hours unless it's explicitly logged out. I was thinking the CleanerUpper could execute that logout.

https://tailscale.com/kb/1111/ephemeral-nodes/#how-long-before-ephemeral-devices-are-auto-removed

clly commented 1 year ago

I've done some investigation myself, and what I've found is that Caddy doesn't currently provide the same lifecycle hooks for Listeners that exist for actual modules (at least none that I can find). This means that my original idea to use a CleanerUpper won't work, at least not unless you also use the TailscaleAuth plugin and have that execute the Cleanup. That wasn't really my use case, so I kept digging and came up with a potential solution.

The change wraps the existing net.Listener and uses a WaitGroup to track the open listeners. Each wrapped listener's Close marks that listener as done, and the tsnet.Server is closed once all of the listeners have been closed. I've been able to confirm that the tsnet.Server shuts down correctly, ~but the Ephemeral node doesn't get removed from the tailscale control plane like the tshello demo seems to show that it should.~
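For anyone following along, here is a rough sketch of the wrapping idea, not the actual change. It swaps the WaitGroup for a mutex-guarded counter so that the server is closed synchronously inside the last listener's Close. Every name here (`sharedServer`, `countedListener`, `fakeServer`, `stubListener`) is hypothetical, and an `io.Closer` stands in for the tsnet.Server so the example runs without Tailscale.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
)

// sharedServer tracks how many wrapped listeners are still open.
// server is an io.Closer standing in for a *tsnet.Server.
type sharedServer struct {
	mu     sync.Mutex
	open   int
	server io.Closer
}

// wrap registers ln and returns a listener whose Close releases it.
func (s *sharedServer) wrap(ln net.Listener) net.Listener {
	s.mu.Lock()
	s.open++
	s.mu.Unlock()
	return &countedListener{Listener: ln, owner: s}
}

// release drops one listener and closes the server after the last one.
func (s *sharedServer) release() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.open--
	if s.open > 0 {
		return nil
	}
	return s.server.Close()
}

// countedListener delegates to the embedded listener and makes Close
// idempotent so a double Close cannot release twice.
type countedListener struct {
	net.Listener
	owner *sharedServer
	once  sync.Once
	err   error
}

func (l *countedListener) Close() error {
	l.once.Do(func() {
		l.err = l.Listener.Close()
		if err := l.owner.release(); l.err == nil {
			l.err = err
		}
	})
	return l.err
}

// fakeServer and stubListener exist only to exercise the sketch.
type fakeServer struct{ closed bool }

func (f *fakeServer) Close() error { f.closed = true; return nil }

type stubListener struct{}

func (stubListener) Accept() (net.Conn, error) { return nil, net.ErrClosed }
func (stubListener) Close() error              { return nil }
func (stubListener) Addr() net.Addr            { return nil }

func main() {
	srv := &fakeServer{}
	shared := &sharedServer{server: srv}
	a, b := shared.wrap(stubListener{}), shared.wrap(stubListener{})

	a.Close()
	fmt.Println("server closed after first listener:", srv.closed)
	b.Close()
	fmt.Println("server closed after last listener:", srv.closed)
}
```

The counter keeps the control flow in one place, but it still ties server shutdown to listener shutdown, which is the part that doesn't feel quite right.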

I've figured out the node removal issue. Even though the node is marked as "Ephemeral" in the control plane, the tsnet.Server itself needs to know that it's Ephemeral for removal to happen immediately on shutdown. I'm not sure what the best way to plumb that configuration might be.

I will put up a PR if you're open to it. I'm not super happy with the implementation because the control flow doesn't feel right, but I'm not sure of a better way to do it. Very happy to hear your thoughts.

clly commented 5 months ago

Closing this because the Caddy config now allows ephemeral Tailscale nodes.