sasa1977 / site_encrypt

Integrated certification via Let's encrypt for Elixir-powered sites
MIT License

Strange certificate generated in production #29

Closed noozo closed 3 years ago

noozo commented 3 years ago

Hi Sasa, sorry to bother you again (is there a forum where I can post questions, instead of here?).

I'm getting this certificate generated in production. Is this normal? Browsers don't like it, and the date in the future sounds like a development certificate. But the site was fine this morning. Has anyone run into this yet?

image

Cheers

axelson commented 3 years ago

When site_encrypt starts up (including in production), it generates a self-signed certificate. Then it attempts to fetch a certificate from Let's Encrypt. If that fetching fails then the self-signed certificate will continue to be used. Do you have any errors in the logs?

noozo commented 3 years ago

I didn't know about that behavior. Will check my logs.

noozo commented 3 years ago

There doesn't seem to be anything other than "Fatal - Certificate Unknown" messages. Will attempt to restart.

noozo commented 3 years ago

Yeah, so it's saying that the certificate that it found is valid until 3020-06-01 and so it's not renewing.

noozo commented 3 years ago

I think it's running the local ACME server in prod. I need to check my config to see where I messed up. Thanks for the help.

noozo commented 3 years ago

Apparently I hit the max certificates limit on letsencrypt. Bummer.

sasa1977 commented 3 years ago

If the local ACME server is running in prod, then the client uses that, not letsencrypt. You can check whether the local ACME server is running by searching the log for "Running local ACME server".
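As a rough sketch of that log search, the hypothetical helper below (LogCheck is an illustrative name, not part of site_encrypt) scans a log file for the message. It assumes your logs are written to a file; the path is whatever your deployment uses.

```elixir
defmodule LogCheck do
  # Hypothetical helper, not part of site_encrypt.
  # The marker is the log message mentioned in this thread.
  @marker "Running local ACME server"

  # Returns true if any line of the log file contains the marker.
  def local_acme_running?(log_path) do
    log_path
    |> File.stream!()
    |> Enum.any?(&String.contains?(&1, @marker))
  end
end
```

If this returns true for the production log, the client is talking to the local ACME server rather than letsencrypt.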

If you've hit the limit on letsencrypt, it means that you're actually not running a local ACME server. The certificate you currently see is the self-signed certificate which is created during the first boot if there's no certificate. Based on your description, my first guess would be that letsencrypt can't reach the site on the given domain(s).

You could try testing manually using staging letsencrypt, as explained here. In step 2 you should check that the server responds on all the domains you're registering. So e.g. if you listed foo.bar and www.foo.bar in the :domains field, then your Phoenix endpoint must be accessible via http://foo.bar and http://www.foo.bar.
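The per-domain check can be roughly scripted. The sketch below is a hypothetical helper (DomainCheck is not part of site_encrypt) that uses Erlang's built-in :httpc client to request each domain over plain HTTP. It only shows that the domains respond from the machine where you run it, which is not the same as being reachable from letsencrypt's servers, so treat it as a first sanity check.

```elixir
defmodule DomainCheck do
  # Hypothetical helper, not part of site_encrypt.

  # Builds the plain-HTTP URL for a domain.
  def url(domain), do: "http://" <> domain <> "/"

  # Requests the domain once and returns {:ok, status} or {:error, reason}.
  def check(domain) do
    {:ok, _} = Application.ensure_all_started(:inets)
    request = {String.to_charlist(url(domain)), []}

    # Redirects are not followed, so a redirect to https still counts as a response.
    case :httpc.request(:get, request, [timeout: 5_000, autoredirect: false], []) do
      {:ok, {{_version, status, _reason}, _headers, _body}} -> {:ok, status}
      {:error, reason} -> {:error, reason}
    end
  end

  # Checks every domain and returns a map of domain => result.
  def check_all(domains), do: Map.new(domains, &{&1, check(&1)})
end
```

For the example above you would call DomainCheck.check_all(["foo.bar", "www.foo.bar"]) and look for any {:error, reason} entries.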

Until you figure this out I advise not using the production letsencrypt server, because you'll hit the limit there again. Staging also has limits, but IIRC they are a bit more relaxed.

noozo commented 3 years ago

I think I found the problem. I started the site manually without passing the required env vars I have set on GitHub Actions, and it was using the development certificate. I might have triggered the limits when moving the certificates to an external folder (moving away from priv). Thanks for the help diagnosing it :)
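One way to guard against this kind of mistake is to fail at boot when the variable is missing, instead of silently falling back to the local setup. The sketch below is only an illustration: the CERT_MODE variable name, the CertMode module, and the :local return value are assumptions, not something taken from this thread, so adapt them to however your config selects the ACME server. The two URLs are the public Let's Encrypt ACME v2 directory endpoints.

```elixir
defmodule CertMode do
  # Hypothetical helper; CERT_MODE is an assumed variable name.
  @staging "https://acme-staging-v02.api.letsencrypt.org/directory"
  @production "https://acme-v02.api.letsencrypt.org/directory"

  # Maps a mode string to a directory setting; unknown values raise.
  def directory(mode) do
    case mode do
      "local" -> :local
      "staging" -> @staging
      "production" -> @production
      other -> raise ArgumentError, "unknown CERT_MODE: #{inspect(other)}"
    end
  end

  # Raises if CERT_MODE is not set, rather than defaulting to the local server.
  def from_env!, do: "CERT_MODE" |> System.fetch_env!() |> directory()
end
```

With this shape, starting the site by hand without the variable stops at startup with an error instead of serving a development certificate.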

noozo commented 3 years ago

Btw, is there a way to temporarily use different certificates (other than letsencrypt) while the rate limits go away (I think it's one week)? If I replace the files with a valid certificate found somewhere else, will that work?

sasa1977 commented 3 years ago

> I might have triggered the limits when moving the certificates to an external folder (moving away from priv).

I'm not exactly sure why that would happen, unless you already made multiple registrations in previous attempts, and during the move you triggered yet another registration which tripped the limit.

> Btw, is there a way to temporarily use different certificates (other than letsencrypt) while the rate limits go away (I think it's one week)? If I replace the files with a valid certificate found somewhere else, will that work?

Officially no :-) However, I think it should work. The cert + key reside in "#{db_folder}/certs/#{hd(domains)}". In there you need to have privkey.pem, cert.pem, and chain.pem. This can be seen from the impl of https_keys.
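To double-check the folder before relying on it, something like the following sketch could help. CertFiles is a hypothetical helper, not part of site_encrypt; it only rebuilds the path from the expression above and reports which of the three expected files are absent.

```elixir
defmodule CertFiles do
  # Hypothetical helper, not part of site_encrypt.
  # File names are the ones listed in this thread.
  @required ~w(privkey.pem cert.pem chain.pem)

  # Same path as "#{db_folder}/certs/#{hd(domains)}".
  def cert_dir(db_folder, domains), do: Path.join([db_folder, "certs", hd(domains)])

  # Returns the required file names that do not exist in the cert folder.
  def missing(db_folder, domains) do
    dir = cert_dir(db_folder, domains)
    Enum.reject(@required, &File.exists?(Path.join(dir, &1)))
  end
end
```

An empty list from CertFiles.missing/2 means all three files are present; it says nothing about whether their contents are valid.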

If you replace these files with the valid ones, https should work. Note that once the certification succeeds, these files will be replaced. I'd also suggest setting the mode to manual until the limit clears, to avoid making needless requests to letsencrypt (and possibly extending the limit).

Note that if some previous certification succeeded, you might already have valid letsencrypt cert files somewhere. If you didn't delete them, you can search for them on the disk, and use those instead of obtaining the cert elsewhere.
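A config fragment for the manual mode might look roughly like this. It assumes the endpoint configures site_encrypt through a certification/0 callback that calls SiteEncrypt.configure, and that the option is spelled mode: :manual; both are from memory of the library's docs rather than from this thread, so verify them against the version you run.

```elixir
# Fragment only: goes inside the endpoint module that uses site_encrypt.
# Assumption: `mode: :manual` is the option name in your site_encrypt version.
@impl SiteEncrypt
def certification do
  SiteEncrypt.configure(
    # ... keep your existing options here (domains, emails, db_folder, ...) ...
    mode: :manual
  )
end
```

Remember to switch the mode back once the limit clears, otherwise the certificate will not be obtained or renewed automatically.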

noozo commented 3 years ago

Yes, that's my plan. Thanks :)