Will no longer issue certificates.

dejecj commented 4 years ago

I have a large list of domains that need SSL certificates, so I started adding them to the proxy, and everything was fine until I reached higher numbers. At around 550 hostnames, SSL verifications started failing with a 404. See below:

[4/10/2020] [6:29:50 AM] [Nginx ] › ℹ info Reloading Nginx
[4/10/2020] [6:29:52 AM] [SSL ] › ℹ info Requesting Let'sEncrypt certificates for Cert #986: dashboard.mmstrategics.com
[4/10/2020] [6:29:59 AM] [Nginx ] › ℹ info Reloading Nginx
[4/10/2020] [6:30:00 AM] [Express ] › ⚠ warning Command failed: /usr/bin/certbot certonly --non-interactive --config "/etc/letsencrypt.ini" --cert-name "npm-986" --agree-tos --email "undefined" --preferred-challenges "dns,http" --webroot --domains "dashboard.mmstrategics.com"
Saving debug log to /config/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for dashboard.mmstrategics.com
Using the webroot path /data/letsencrypt-acme-challenge for all unmatched domains.
Waiting for verification...
Cleaning up challenges
Failed authorization procedure. dashboard.mmstrategics.com (http-01): urn:ietf:params:acme:error:unauthorized :: The client lacks sufficient authorization :: Invalid response from http://dashboard.mmstrategics.com/.well-known/acme-challenge/Iitm7O8rw-xvC68WvSaWDakTJ9lK50rfywpRU9ZECVc [35.223.222.80]: "<!-- <!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n<title>404 Page Not Found</title>\n<style type=\"text/css\">\n\n:"

I don't understand exactly what's happening here, since I had already issued hundreds of certificates and everything was working perfectly. I did have to increase the hash_bucket and hash_size settings to go beyond 250 domains.

Please help.

mdisieno commented 4 years ago

Currently on V2.2.1. I'm getting the same issue as well with less than 10 certs so far. Might be a LE issue. Worth watching.

Cruv commented 4 years ago

Fresh install after I broke my container on an update and I am getting the same error.

Cruv commented 4 years ago

As an update to this, it seems it may not be an issue with this project at all. Going all the way back to v.1.6.0, the same error occurs. This is probably an issue on the Let's Encrypt side of things.

dejecj commented 4 years ago

I've been digging into this all day and I believe my particular issue is being caused by node running out of memory. It currently has 250 MB, and I can see that running out with a large hash table. Some domains work and others don't, so it looks very likely that it simply depends on the position of the host in the hash table. If it has to look too deep, it runs out of memory and node kills itself.

Does that sound feasible? I am trying to figure out where node is initialized so I can give it more memory and test it out.

dejecj commented 4 years ago

> I've been digging into this all day and I believe my particular issue is being caused by node running out of memory. It currently has 250 MB, and I can see that running out with a large hash table. Some domains work and others don't, so it looks very likely that it simply depends on the position of the host in the hash table. If it has to look too deep, it runs out of memory and node kills itself.
>
> Does that sound feasible? I am trying to figure out where node is initialized so I can give it more memory and test it out.

This was not the issue.

What I have found is that the proxy is letting requests to .well-known/ pass through to the destination server. So when Let's Encrypt makes that HTTP request, it doesn't see the key that certbot created on the proxy server. I don't know what caused that to start happening, since I had already done 500 domains and it worked fine.
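One way to confirm this kind of pass-through is to plant a throwaway file where certbot would write its challenge and then fetch it over plain HTTP. The following is a minimal sketch, not part of the project: it assumes it runs where the webroot is writable (for example inside the container), that the webroot is the /data/letsencrypt-acme-challenge path shown in the certbot output, and that certbot's webroot layout (.well-known/acme-challenge/ under that path) applies. The function names are hypothetical.

```python
import secrets
import urllib.request
from pathlib import Path
from typing import Tuple

# Assumption: the webroot path reported in the certbot output above.
WEBROOT = Path("/data/letsencrypt-acme-challenge")


def challenge_url(domain: str, token: str) -> str:
    """URL that an HTTP-01 validator would fetch for this token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def plant_probe(webroot: Path) -> Tuple[str, str]:
    """Write a throwaway file where certbot would place a challenge file."""
    token = "probe-" + secrets.token_hex(8)
    content = secrets.token_hex(16)
    target = webroot / ".well-known" / "acme-challenge"
    target.mkdir(parents=True, exist_ok=True)
    (target / token).write_text(content)
    return token, content


def served_by_proxy(domain: str, webroot: Path = WEBROOT) -> bool:
    """True if the probe comes back unchanged; False if the request is
    forwarded elsewhere, answered with an error page, or fails entirely."""
    token, content = plant_probe(webroot)
    try:
        with urllib.request.urlopen(challenge_url(domain, token), timeout=10) as resp:
            return resp.read().decode(errors="replace").strip() == content
    except OSError:  # covers URLError, HTTPError (e.g. 404) and timeouts
        return False
    finally:
        # Remove the probe file again so the webroot stays clean.
        (webroot / ".well-known" / "acme-challenge" / token).unlink(missing_ok=True)
```

If served_by_proxy("dashboard.mmstrategics.com") returns False while the file exists on disk, the request is probably being answered by something other than the challenge location on the proxy.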

jlesage commented 4 years ago

Are there more details in /config/log/letsencrypt/letsencrypt.log ?

dejecj commented 4 years ago

> Are there more details in /config/log/letsencrypt/letsencrypt.log ?

Here you go:

letsencrypt.log

jlesage commented 4 years ago

What I'm suspecting is that, because of the huge number of domains, reloading the nginx config takes longer. So when certbot is invoked, nginx is not yet serving the challenge directory required by Let's Encrypt...

dejecj commented 4 years ago

> What I'm suspecting is that, because of the huge number of domains, reloading the nginx config takes longer. So when certbot is invoked, nginx is not yet serving the challenge directory required by Let's Encrypt...

What are my options? Is there a setting to delay certbot until nginx is done reloading? I guess this is not a problem that's going away since the domain list will keep growing with time.

Any suggestions?
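No such setting is mentioned in this thread. If the reload race is the cause, one generic workaround is to poll until a known file under the challenge path is actually served, and only then invoke certbot. A minimal sketch under those assumptions follows; the helper names, the URL, and the expected content are hypothetical, and this is not how the project itself sequences the reload.

```python
import time
import urllib.request
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll check() until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def serves_expected(url: str, expected: str) -> bool:
    """True if fetching `url` returns exactly the expected body."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.read().decode(errors="replace").strip() == expected
    except OSError:  # connection errors, HTTP errors and timeouts
        return False


# Hypothetical usage: only start certbot once the new config is live.
# ready = wait_until(lambda: serves_expected(
#     "http://example.com/.well-known/acme-challenge/readiness-probe", "ok"))
```

Polling rather than sleeping for a fixed time means the delay grows with the reload time only as much as it has to, which matters if the domain list keeps growing.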

Cruv commented 4 years ago

@jlesage I'm getting the same problem on a fresh install, however. Even the very first new domain request yields the "Command failed" error. I've tried all tags between 1.6.0 and latest. Before, I had it working with 18 domains. The update I did today is what led me to try a fresh install, to see whether something in the upgrade caused the issue.

jlesage commented 4 years ago

Can you share a failed request from /config/log/letsencrypt/letsencrypt.log ?

dejecj commented 4 years ago

I also tried to switch the docker container to staging mode so I could spam a new server with domains without hitting rate limits, but there seems to be an issue with the certbot staging implementation. It has something to do with a POST-as-GET request while finalizing a certificate. See attached.

letsencrypt (2).log

Cruv commented 4 years ago

Here is a log from my fresh install. letsencrypt.log

dejecj commented 4 years ago

Just wanted to add another observation: I don't think the problem is the nginx reload. Looking closer at the log, the HTML being returned is that of the nginx default 404 page. If the config hadn't been reloaded, it would be the custom 404 HTML that I have for my web app. Am I correct in this thinking?
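A quick way to check which 404 page is being returned is to compare the body against the page nginx generates itself. That built-in page has the title "404 Not Found" and ends with a centered "nginx" line (optionally with a version). The sketch below is only a heuristic with a hypothetical function name. Note that the ACME error message quotes just the beginning of the response, so the full body should be fetched separately before drawing conclusions.

```python
import re

# Footer of nginx's built-in error pages, with or without a version number.
NGINX_FOOTER = re.compile(r"<hr><center>nginx(/[\d.]+)?</center>", re.IGNORECASE)


def looks_like_nginx_builtin_404(body: str) -> bool:
    """Heuristic: does this body match the 404 page nginx generates itself?"""
    return "<title>404 Not Found</title>" in body and bool(NGINX_FOOTER.search(body))


builtin = (
    "<html>\n<head><title>404 Not Found</title></head>\n<body>\n"
    "<center><h1>404 Not Found</h1></center>\n<hr><center>nginx</center>\n"
    "</body>\n</html>\n"
)
# Start of the body quoted in the certbot error at the top of this thread.
excerpt = '<!-- <!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="utf-8">\n<title>404 Page Not Found</title>'

print(looks_like_nginx_builtin_404(builtin))
print(looks_like_nginx_builtin_404(excerpt))
```

The excerpt from the log has the title "404 Page Not Found", so this check does not classify it as the built-in nginx page; it may be worth comparing it against the web app's own 404 page as well.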

Cruv commented 4 years ago

My issue with this was resolved. During an update of OpenWRT my DNS updates weren't happening, causing Let's Encrypt to not be able to issue a cert.

semistatic commented 4 years ago

I'm also having this issue on a fresh install. When I copy/paste the URL from the error log, I'm getting passed through to my proxied (destination) server and not NGINX, so a 404 is being returned.

daniel-jirca commented 4 years ago

I'm also having problems getting a certificate for one of my applications. The app works fine on port 80 though, so no firewall/port forwarding issues here .. letsencrypt.log

jlesage commented 4 years ago

Let's Encrypt seems to have an issue reaching the container:

certbot.errors.FailedChallenges: Failed authorization procedure. meet.REDACTED.design (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://meet.REDACTED.design/.well-known/acme-challenge/uZhQJU1hiCJYp8fTjhuR7U2TkMp9oVNsBVJqY11zEQM: Timeout during connect (likely firewall problem)

> The app works fine on port 80 though, so no firewall/port forwarding issues here ..

What do you mean exactly? Do you mean that accessing http://meet.REDACTED.design (the same dns name as in the log file) brings you to your app? Are you testing from the inside or the outside of your network?
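The two failures in this thread carry different ACME error types: "unauthorized" in the first report (a server answered, but with the wrong content) and "connection" here (no server could be reached). A small sketch that pulls the type out of a certbot message and maps it to a first thing to check is shown below. The hint texts and function names are illustrative assumptions, and only the two types seen in this thread are covered.

```python
import re
from typing import Optional

ACME_ERROR = re.compile(r"urn:ietf:params:acme:error:(\w+)")

# Hints for the two error types that appear in this thread (illustrative wording).
HINTS = {
    "unauthorized": "A server answered, but not with the challenge content; "
                    "check what serves /.well-known/acme-challenge/.",
    "connection": "The validator could not connect at all; "
                  "check DNS, port forwarding for 80/443, and the firewall.",
}


def acme_error_type(log_text: str) -> Optional[str]:
    """Return the first ACME error type found in the text, or None."""
    match = ACME_ERROR.search(log_text)
    return match.group(1) if match else None


def triage(log_text: str) -> str:
    """Map a certbot failure message to a first thing to check."""
    kind = acme_error_type(log_text)
    if kind is None:
        return "No ACME error type found in the text."
    return HINTS.get(kind, f"Unhandled ACME error type: {kind}")
```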

daniel-jirca commented 4 years ago

> What do you mean exactly? Do you mean that accessing http://meet.REDACTED.design (the same dns name as in the log file) brings you to your app? Are you testing from the inside or the outside of your network?

Now that you mention it, I tried to reach the app from an external network and it timed out, although from the internal network it loads instantly. The URL is http://meet.alecsa.design. The DNS record seems good: the name resolves to the same public IP whether I ping it from the private network or an external network. I also created some NAT rules and the app can be accessed directly on the port http://meet.alecsa.design:8000 but it doesn't work through NPM. I'm also noticing the same errors in the NPM logs when auto-renewing existing certificates. I tried two configurations in NPM:

1. proxy to the host internal IP;
2. proxy to the docker container through the docker network.

Both NPM and the target container are on the same docker network and can reach each other through the virtual network (I tried telnet on the open ports from inside the containers). Unfortunately, I cannot obtain a certificate with either of these configurations, and the app is not accessible from the internet through NPM.

jlesage commented 4 years ago

You also need NAT rules for ports 80/443 that reach NPM. Have you set those up?

daniel-jirca commented 4 years ago

Yes of course. I wrote that in the post above:

> I also created some NAT rules and the app can be accessed directly on the port http://meet.alecsa.design:8000 but it doesn't work through NPM.

It looks like these hosts cannot be accessed from the outside through npm. NAT works, access from the internal network through npm works. Just the traffic from the internet seems to be a problem and this is probably the reason why certificate generation is not possible, as LE tries to validate the domain with a request from the internet.

jlesage commented 4 years ago

You said that you created a NAT rule for port 8000. But I'm talking about ports 80 and 443.

You can use https://www.yougetsignal.com/tools/open-ports/ to check if ports 80/443 are accessible from the internet.
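The same check can be scripted. The sketch below simply attempts a TCP connection to each port; the function name is hypothetical. It only says something about internet reachability if it is run from a machine outside the network (for example a VPS or a phone hotspot), since a test from the inside can succeed even when the NAT rules for 80/443 are missing.

```python
import socket


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False


# Hypothetical usage, run from OUTSIDE the network:
# for port in (80, 443):
#     print(port, port_open("meet.alecsa.design", port))
```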

jlesage / docker-nginx-proxy-manager

Will no longer issue certificates. #68