lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License

[feature]: graceful rotation / hot reload of the TLS certificate #8340

Open openoms opened 8 months ago

openoms commented 8 months ago

Problem Description: The current behavior of LND is to delete and recreate the TLS certificate on the next restart after it expires. This disrupts connections because the new TLS secret isn't immediately synced with all connected applications. It also requires restarting LND, which is operationally inconvenient.

Desired Solution: To minimize downtime and maintain connections without interruption, LND should be able to dynamically load a new TLS certificate without needing a full restart.
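For illustration, one common way to get this behavior in a Go TLS server is the `GetCertificate` callback on `tls.Config`, which is consulted on every new handshake. The sketch below is a minimal example under that assumption and is not taken from LND's code: the `certReloader` type and the `tls.cert` / `tls.key` paths are hypothetical. Note that connections established before a swap keep the certificate they negotiated; only new handshakes see the new one.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"sync/atomic"
)

// certReloader holds the currently active certificate and allows it to be
// swapped at runtime. Hypothetical type for illustration, not part of LND.
type certReloader struct {
	current atomic.Pointer[tls.Certificate]
}

// Reload reads the key pair from disk and atomically makes it the active
// certificate. On failure the previously loaded certificate stays in use.
func (r *certReloader) Reload(certPath, keyPath string) error {
	cert, err := tls.LoadX509KeyPair(certPath, keyPath)
	if err != nil {
		return err
	}
	r.current.Store(&cert)
	return nil
}

// GetCertificate matches the tls.Config.GetCertificate callback signature
// and returns whichever certificate is active at handshake time.
func (r *certReloader) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	cert := r.current.Load()
	if cert == nil {
		return nil, errors.New("no certificate loaded")
	}
	return cert, nil
}

func main() {
	r := &certReloader{}

	// Assumed file names; adjust to wherever the certificate pair lives.
	if err := r.Reload("tls.cert", "tls.key"); err != nil {
		fmt.Println("initial load failed:", err)
	}

	// A server built on this config picks up later Reload calls on the
	// next handshake, without restarting the listener.
	cfg := &tls.Config{GetCertificate: r.GetCertificate}
	_ = cfg
}
```

Calling `Reload` again after the files on disk change is all that is needed to serve the new certificate to new clients.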

Alternatives considered: In Kubernetes environments, managing TLS certificates externally via Terraform is feasible but still necessitates an LND restart. If hot reload were possible, LND could be notified by a script running in CI or in a sidecar container.

Additional context: Our environment is GCP, configured with Terraform from Helm charts in Concourse CI.

Roasbeef commented 8 months ago

> In Kubernetes environments, managing TLS certificates externally via Terraform is feasible but still necessitates an LND restart.

For this we use config maps, then notifications to recycle other pods if the relevant config map changes.

I think hot swapping could be more trouble than it's worth, as now you potentially have a consistency issue: some connections are using the old cert, while some are using the new cert. If these applications can somehow detect that a new cert is being used, can't they also switch to the hotswap using the exact same mechanism?

pseudozach commented 2 months ago

I came here to open this issue as well.

The current behavior basically ensures service disruption. Is it not possible to have an optional feature flag that rotates these 2 files some configurable amount of time before they expire? This would solve the issue for most casual/single-node users like myself.

Happy to send a PR but I'm surprised this isn't a big problem for all the teams that are maintaining many nodes.

Roasbeef commented 2 months ago

> Happy to send a PR but I'm surprised this isn't a big problem for all the teams that are maintaining many nodes.

PR SGTM 👍. How do you plan on handling the invalidation issue I pointed out above?