littlebizzy / slickstack

Lightning-fast WordPress on Nginx
https://slickstack.io
GNU General Public License v3.0

Certbot has problems when ipv6only enabled in Nginx server blocks #78

Closed Bronislawsky closed 2 years ago

Bronislawsky commented 3 years ago

FROM Fresh Install

There is something wrong with the Certbot / Let's Encrypt symlink:


lrwxrwxrwx 1 root root    58 Dec  6 11:58 cert.pem -> /etc/letsencrypt/live/domain.xyz/cert.pem
lrwxrwxrwx 1 root root    59 Dec  6 11:58 chain.pem -> /etc/letsencrypt/live/domain.xyz/chain.pem
lrwxrwxrwx 1 root root    63 Dec  6 11:58 fullchain.pem -> /etc/letsencrypt/live/domain.xyz/fullchain.pem
lrwxrwxrwx 1 root root    61 Dec  6 11:58 privkey.pem -> /etc/letsencrypt/live/domain.xyz/privkey.pem

/etc/letsencrypt/live# ls -l

drwxr-xr-x 2 root root 4096 Dec  6 11:58 domain.xyz-0003

I manually set SSL_TYPE to certbot because the wizard doesn't set it, then re-ran ss-install.

The nginx.conf doesn't seem right either, as it doesn't point to any Let's Encrypt files.

Bronislawsky commented 3 years ago

From a fresh install again, here is what I notice.

I get this error:

2020-12-11 20:40:00 (170 MB/s) - ‘/tmp/default’ saved [22330/22330]

chown: cannot access '/etc/nginx/sites-available/default': No such file or directory
Restarting nginx (via systemctl): nginx.serviceJob for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.
failed!
Installing (or renewing) free SSL certs from OpenSSL and Certbot...
Generating a RSA private key
.............................+++++
...................+++++
writing new private key to '/etc/ssl/nginx.key'

The cert fails because Nginx isn't running.

After a reboot, when I ran ./ss-encrypt, I got this error again:

IMPORTANT NOTES:

I will run the script line by line and check where it fails... after I get some sleep.

Bronislawsky commented 3 years ago

I have found out why it failed.

Certbot prefers IPv6, so when AAAA records are set it uses them. I guess there is something wrong in the Nginx IPv6 redirect that creates an endless loop, which leads to a failure.

Anyway, deleting the AAAA records fixed it (I believe so; I haven't been able to try a real run because of too many requests, but --dry-run works).
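If validation really is attempted over IPv6 whenever AAAA records exist, as the comment above suggests, it can help to check both the DNS record and the IPv6 HTTP response by hand before requesting a certificate. The following is a minimal sketch, not part of SlickStack: the function names and the `test` challenge token are made up for illustration, and it assumes `dig` and `curl` are installed.

```shell
# Sketch only: helper names and the "test" token are hypothetical.

has_aaaa() {
    # Succeeds if the domain publishes at least one AAAA record.
    local records
    records="$(dig +short AAAA "$1")"
    [ -n "$records" ]
}

http_status_v6() {
    # Requests the ACME challenge path over IPv6 only (-6) and prints
    # just the HTTP status code. curl prints 000 if no response arrives.
    curl -6 -s -o /dev/null -w '%{http_code}' --max-time 10 \
        "http://$1/.well-known/acme-challenge/test"
}

classify_status() {
    # Turns a status code into a rough diagnosis.
    case "$1" in
        000) echo "unreachable" ;;
        3[0-9][0-9]) echo "redirect" ;;
        [1245][0-9][0-9]) echo "answered" ;;
        *) echo "unknown" ;;
    esac
}

# Example (run manually, replacing domain.xyz with the real domain):
#   if has_aaaa domain.xyz; then
#       classify_status "$(http_status_v6 domain.xyz)"
#   fi
```

A result of "unreachable" would point at the IPv6 listener rather than at Certbot itself. A 404 for the made-up token still counts as "answered", since it shows that Nginx responds over IPv6.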

Bronislawsky commented 3 years ago

In /etc/nginx/sites-available/DOMAIN.xyz, uncommenting this line:

listen [::]:443 ssl http2 ipv6only=on;

makes IPv6 respond without HTTP code 301.

I don't know why it was commented out, but uncommenting it seems to solve the cert issuing problem with IPv6.

Using the webroot path /var/www/html for all unmatched domains.
Waiting for verification...
Cleaning up challenges

IMPORTANT NOTES:

jessuppi commented 3 years ago

Thanks for your research and reporting @Bronislawsky

Several days ago we changed from using the default server block file to using explicit server block names, e.g. example.com, staging.example.com and dev.example.com. Immediately afterward, I saw fatal errors with IPv6 and commented out the same lines in our Nginx server block boilerplates that you noticed were problematic.

> Certbot prefers IPv6, so when AAAA records are set it uses them. I guess there is something wrong in the Nginx IPv6 redirect that creates an endless loop, which leads to a failure.
>
> Anyway, deleting the AAAA records fixed it (I believe so; I haven't been able to try a real run because of too many requests, but --dry-run works).

Interesting, I didn't know that Certbot prefers IPv6. That seems strange, but great job discovering this!

Anyway, I don't know why IPv6 was causing fatal errors on the server that I tested. It was an "old" SS installation that was recently updated via ss-update and then ss-install-nginx using the new boilerplates.

Maybe for fresh installs, that conflict can't be replicated, I'm not sure. It also could have been a fluke case...

Bronislawsky commented 3 years ago

Also, on most sites I turn off dev and staging:

STAGING_SITE="false"
DEV_SITE="false"

Would it be possible not to run Certbot on staging and dev when they are set to false? I believe that if DNS isn't pointing to the box, this will fail and count as 2 failures, which eat into the weekly limit for that particular domain (which I believe is 50 per week).
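One way the request above could be handled is to build the Certbot domain list from the two settings, so that disabled subdomains are never submitted for validation. This is only a sketch under assumptions: `build_certbot_domains` is a hypothetical helper, not an existing SlickStack function, and the subdomain names follow the staging.example.com / dev.example.com pattern mentioned earlier in this thread.

```shell
# Sketch only: build_certbot_domains is a hypothetical helper.

build_certbot_domains() {
    # $1 = production domain, $2 = STAGING_SITE, $3 = DEV_SITE
    local site_domain="$1" staging_site="$2" dev_site="$3"
    local domains="$site_domain"

    # Append each subdomain only when its setting is "true".
    if [ "$staging_site" = "true" ]; then
        domains="$domains,staging.$site_domain"
    fi
    if [ "$dev_site" = "true" ]; then
        domains="$domains,dev.$site_domain"
    fi

    printf '%s\n' "$domains"
}

# Example (run manually; --dry-run avoids touching the real rate limits):
#   certbot certonly --webroot -w /var/www/html --dry-run \
#       -d "$(build_certbot_domains example.com "$STAGING_SITE" "$DEV_SITE")"
```

Certbot's `-d` option accepts a comma-separated list, so a single string is enough here.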

jessuppi commented 3 years ago

I think there are 2 different issues here. The first is that Nginx had fatal errors with IPv6 enabled, possibly because the block was labeled a "default" server while the old "default" block hadn't yet been deleted from my test server.

The other issue seems to be Certbot verifying the server properly over IPv6. I'm not sure if this is actually broken or not yet as I haven't had time to test... perhaps it is working fine if the former Nginx issue is addressed...

jessuppi commented 3 years ago

Alright so after some quick tests it was ipv6only=on on the new staging and dev server blocks that was causing Nginx to fail, even though the production server block seemed to work fine with that setting.

I might be a few years behind on that feature as it seems from my few minutes of research that Nginx might have changed the functionality of that snippet in the past few years or something...

Ref: https://serverfault.com/questions/638367/do-you-need-separate-ipv4-and-ipv6-listen-directives-in-nginx

For now I've removed ipv6only=on from all SlickStack server blocks, so the conflict should be resolved. I'll research more when I have time and post back on this thread.
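A plausible explanation for why only the extra server blocks failed: the Nginx documentation states that socket-level listen parameters such as ipv6only may be specified only once for a given address:port pair. Repeating ipv6only=on in the staging and dev blocks would then conflict with the production block. The sketch below counts such repeats across server block files. `count_ipv6only` is a hypothetical helper, not something shipped with Nginx or SlickStack.

```shell
# Sketch only: count_ipv6only is a hypothetical helper.

count_ipv6only() {
    # $1 = address:port to look for, e.g. '[::]:443'
    # remaining arguments = Nginx config files to scan
    local addr_port="$1"
    shift
    # Commented-out lines are skipped because their first field is
    # "#" or "#listen" rather than "listen".
    awk -v target="$addr_port" '
        $1 == "listen" && $2 == target && /ipv6only=on/ { n++ }
        END { print n + 0 }
    ' "$@"
}

# Example (run manually):
#   count_ipv6only '[::]:443' /etc/nginx/sites-available/*
# A result above 1 means the parameter is repeated for that address:port.
```

Running `nginx -t` after any change remains the authoritative check, since it reports the exact file and line Nginx objects to.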

jessuppi commented 2 years ago

Future readers can pretty much ignore my earlier comments on this Issue, here is what matters:

On top of all this, I found that we should add default_server to the IPv6 listen directives in our "catch-all" server block, since previously we only defined that on the IPv4 listen directives, now it's like this:

server {
    listen 80 default_server;
    listen 443 ssl default_server;
    listen [::]:80 default_server;
    listen [::]:443 ssl default_server;
    server_name _;
    return 301 https://@SITE_DOMAIN$request_uri;
}

Ref: https://github.com/littlebizzy/slickstack/commit/ebcfe0f96c041e9b1f4dc169737ec374e0e238b7
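To confirm that a catch-all block like the one above redirects on both protocols, the response headers can be checked over IPv4 and IPv6 separately. This is a rough sketch: `redirect_target` is a hypothetical helper that reads headers on standard input, and it assumes `curl` is installed.

```shell
# Sketch only: redirect_target is a hypothetical helper.

redirect_target() {
    # Reads HTTP response headers on stdin. Prints the Location value
    # and succeeds only if the status is 301 and a Location is present.
    awk '
        { sub(/\r$/, "") }                        # strip trailing CR
        NR == 1 && $2 != "301" { bad = 1; exit }  # status line check
        tolower($1) == "location:" { print $2; found = 1 }
        END { exit (bad || !found) ? 1 : 0 }
    '
}

# Example (run manually, replacing example.com with the real domain):
#   curl -4 -sI http://example.com/ | redirect_target
#   curl -6 -sI http://example.com/ | redirect_target
```

If the `-4` request prints the HTTPS URL but the `-6` request fails, the IPv6 listen directives are the likely culprit.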

Even to this day I see tons of questions and blog posts about these settings, so apparently the entire world has been sufficiently confused by them, not just us! That makes me feel slightly better, but we were behind the times...

Ref: https://serverfault.com/questions/512054/globally-setting-ipv6only-off
Ref: https://serverfault.com/questions/578648/properly-setting-up-a-default-nginx-server-for-https

jessuppi commented 2 years ago

On a side note, SlickStack has still been having trouble with Certbot on fresh installations, requiring users to run ss-install twice instead of once... I'm not sure if this is related.

Certbot does prefer IPv6 if those DNS records exist, but if they don't exist, this problem still seems to happen. I'm wondering if their software scans the server for IPv6, which could have caused our Nginx "catch-all" to fail previously. Even so, the OpenSSL cert should already exist and be active before that... worth mentioning though.