sasa1977 / site_encrypt

Integrated certification via Let's Encrypt for Elixir-powered sites
MIT License

Certificate with clusters #61

Open hunterboerner opened 1 month ago

hunterboerner commented 1 month ago

How does this library behave when deploying an Elixir cluster behind a load balancer?

sasa1977 commented 1 month ago

This currently won't work. The renewal process state is kept in memory, so the Let's Encrypt requests belonging to one renewal could end up on different instances, and the renewal would fail. Even if it happens to succeed (e.g. if all requests are routed to a single instance), the certificate would only be available on that one node, since the cert is stored on disk.

To make this work, we need to introduce a contract for state storage (basically a behaviour), and then provide an implementation backed by shared durable storage. Perhaps it would be enough to provide a disk-based backend; if all instances then used a shared volume to access the state (and the certs), this would work.

This requires some more thought, discussion, and implementation work. Contributions in these areas are welcome :-)
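
As a rough illustration of the storage-contract idea, here is a minimal Python sketch (Python only for brevity; in Elixir this would be a behaviour). The names `StateStore`, `SharedDiskStore`, `put`, and `get` are hypothetical and not part of site_encrypt. It assumes every instance mounts the same directory, and that `os.replace` behaves atomically on that volume, which holds for local POSIX filesystems but should be verified for a network share.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Optional


class StateStore(ABC):
    """Hypothetical storage contract (not an existing site_encrypt API)."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Durably store value under key."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the stored value, or None if the key is absent."""


class SharedDiskStore(StateStore):
    """Disk-based backend; assumes `root` is a volume shared by all instances."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Accept only plain file names so a key cannot escape the root directory.
        if key in ("", ".", "..") or os.path.basename(key) != key:
            raise ValueError(f"invalid key: {key!r}")
        return os.path.join(self.root, key)

    def put(self, key: str, value: bytes) -> None:
        target = self._path(key)
        # Write to a temporary file first, then rename it into place, so that
        # readers on other instances never observe a partially written value.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(value)
            os.replace(tmp, target)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise

    def get(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None
```

Two store objects pointed at the same directory stand in for two instances: whatever one writes with `put`, the other can read with `get`. This only covers storage; it does not coordinate which instance runs the renewal.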

feld commented 3 weeks ago

If you're only doing HTTP validation you can cheat the response and allow it to be served from any node. I've done this with Caddy, Nginx, Varnish, etc. And then it helps if you have a way to share the final certificate across all nodes (S3, some other shared filesystem, a cache accessible across all Elixir nodes...)

You need to respond to requests at /.well-known/acme-challenge/TOKEN with a simple text response of TOKEN.THUMBPRINT. All you need to do is share the THUMBPRINT value across all nodes. The THUMBPRINT is computed from your private key used for ACME (not a TLS private key) and is not unique per request! Only the submitted TOKEN is unique.
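
The stateless response described above can be sketched as a small function. This is a minimal illustration rather than anything site_encrypt provides: `challenge_response` is a hypothetical name, and the thumbprint is assumed to have been distributed to every node already. The token check follows the ACME rule that tokens contain only base64url characters.

```python
import re
from typing import Optional

PREFIX = "/.well-known/acme-challenge/"

# ACME tokens consist solely of characters from the base64url alphabet.
_TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")


def challenge_response(path: str, thumbprint: str) -> Optional[str]:
    """Return the HTTP-01 response body for `path`, or None if it is not a challenge URL."""
    if not path.startswith(PREFIX):
        return None
    token = path[len(PREFIX):]
    if not _TOKEN_RE.fullmatch(token):
        return None
    # The body is TOKEN.THUMBPRINT; only the token varies between requests.
    return f"{token}.{thumbprint}"
```

Because the function depends only on the request path and the shared thumbprint, any node can answer the validation request. The trade-off is that it answers for any well-formed token, not just tokens this account requested.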

I don't have a simple way to compute that THUMBPRINT value in Elixir as I currently use a Bash script with a bunch of functions stolen from another project, but it's certainly possible to do.
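
For reference, ACME defines the thumbprint as the RFC 7638 JWK thumbprint of the account key, using SHA-256. A minimal Python sketch follows (the same steps carry over to Elixir). It assumes the account key's public part is already available as a JWK dictionary; `jwk_thumbprint` is a hypothetical helper name.

```python
import base64
import hashlib
import json

# Members that RFC 7638 (and RFC 8037 for OKP keys) include in the thumbprint input.
_REQUIRED_MEMBERS = {
    "RSA": ("e", "kty", "n"),
    "EC": ("crv", "kty", "x", "y"),
    "OKP": ("crv", "kty", "x"),
}


def jwk_thumbprint(jwk: dict) -> str:
    """Compute the SHA-256 JWK thumbprint of a public key given as a JWK dict."""
    members = _REQUIRED_MEMBERS[jwk["kty"]]
    # Canonical form: required members only, sorted by name, no whitespace.
    canonical = json.dumps(
        {name: jwk[name] for name in members},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # base64url encoding without the trailing "=" padding.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The result depends only on the account key, so it can be computed once and shared with every node.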

sasa1977 commented 3 weeks ago

> All you need to do is share the THUMBPRINT value across all nodes.

Right, this is where the problem is. Currently this is not supported. Moreover, it wouldn't work in a split brain scenario, where a node is not visible in the cluster, but it can still serve requests.