[Bug] SSL LE400 gone after nginx service restart

Scan4u commented 2 years ago

Describe the bug

Hi! I'm running Hestia CP on Ubuntu 20.04 (DNS cluster of 4 NS, 125 zones, 2300 ns records). After the last update to v1.6.7 I started to receive the LE400 error on SSL requests for all newly added domains. The problem with LE SSL is solved every time by manually restarting the nginx service from Hestia CP, but only for the first few SSL requests. Then everything goes back to the LE400 error until the next manual restart.

In /var/log/hestia/nginx-error.log I found a lot of similar errors. A sample:

2022/08/18 11:05:27 [error] 1034#0: *218 FastCGI sent in stderr: "PHP message: PHP Warning: Undefined array key "v_nginx_cache_duration" in usr/local/hestia/web/edit/web/index.php on line 255PHP message: PHP Warning: Undefined variable $v_letencrypt in /usr/local/hestia/web/templates/pages/edit_web.html on line 340" while reading response header from upstream, client: 2xx.x4x.1x7.xx0, server: _, request: "POST /edit/web/ ?domain=domain.tld&token=e137e3bbe956312da37fd124c20120bc HTTP/1.1", upstream: "fastcgi://unix:/var/run/hestia-php.sock:", host: "hestia.server.tld:8083", referrer: "https://hestia.server.tld:8083/edit/web/?domain=domain.tld&token=e137e3bbe956312da37fd124c20120bc"

The /var/log/hestia/LE-user-domain.tld.log error:

`==[Step 5]==

status: 400
nonce: 0102xxvbA1QRsVNVDuyxxxxxxACAsS5lDgDxxdl_c2-4
validation:.
details: Unable to update challenge :: authorization must be pending
answer: HTTP/2 400
server: nginx
date: Thu, 18 Aug 2022 08:29:57 GMT
content-type: application/problem+json
content-length: 144
boulder-requester: 515111127
cache-control: public, max-age=0, no-cache
link: https://acme-v02.api.letsencrypt.org/directory;rel="index"
replay-nonce: 0102f2nvbA1xxxxxxxxxxxxxxS5lDgD4Midl_c2-4

{ "type": "urn:ietf:params:acme:error:malformed", "detail": "Unable to update challenge :: authorization must be pending", "status": 400 }

==[Abort Step 5]== => Wrong status`

Please advise..

Tell us how to replicate the bug

-

Which components are affected by this bug?

(Backend) Web Server (Nginx, Apache2), Let's Encrypt SSL

Hestia Control Panel Version

1.6.7

Operating system

Ubuntu 20.04

Log capture

jaapmarcus commented 2 years ago

Please check https://docs.hestiacp.com/admin_docs/web/ssl_certificates.html#error-let-s-encrypt-validation-status-400

I need the url listed in:

{ "type": "http-01", "status": "pending", "url": "https://acme-v02.api.letsencrypt.org/acme/chall-v3/12520447717/scDRXA", "token": "9yriok5bpLtV__m-rZ8f2tQmrfeQli0tCxSj4iNkv2Y" }

In working order, or at least the error itself.

The PHP errors have nothing to do with it.

Scan4u commented 2 years ago

Here it is:

[image]

Scan4u commented 2 years ago

After a manual restart of the nginx service, the error for the domain shown in the screenshot was gone and it successfully got the SSL certificate. I have no idea what is going wrong.

jaapmarcus commented 2 years ago

So it's a 404 response, which looks like nginx isn't loading properly.

Maybe the max open files limit?
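One way to test the open-files theory is to compare the limit of a running nginx process with the number of descriptors it currently holds. The following is only a minimal sketch, assuming a Linux host where `/proc/<pid>/limits` and `/proc/<pid>/fd` are readable; `parse_max_open_files` and `count_open_fds` are hypothetical helper names, not part of Hestia or nginx.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are kept as strings because a limit may read 'unlimited'.
    """
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return match.group(1), match.group(2)
    return None


def count_open_fds(pid: int) -> int:
    """Count the file descriptors currently open in the given process."""
    return sum(1 for _ in Path(f"/proc/{pid}/fd").iterdir())


# Example (assumption: 1034 is the PID of an nginx worker; run as root):
#   text = Path("/proc/1034/limits").read_text()
#   print(parse_max_open_files(text), count_open_fds(1034))
```

If the open descriptor count sits close to the soft limit while the LE400 errors occur, that would support the theory; if it stays far below, the cause is probably elsewhere.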

Scan4u commented 2 years ago

[image]

jaapmarcus commented 2 years ago

I have no idea. So far this issue only happens when there are a lot of domains present. We could of course check whether the file is properly loaded by sending a request to http://domain.com/.well-known/xxxxxxx and see if it works.
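Such a check could be sketched in Python as follows. This is an illustration only: it assumes the standard ACME http-01 location `/.well-known/acme-challenge/<token>`, and `challenge_url` and `probe` are hypothetical helper names rather than anything Hestia provides.

```python
import urllib.error
import urllib.request


def challenge_url(domain: str, token: str) -> str:
    """Build the URL an ACME server requests for an http-01 challenge."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def probe(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to url.

    4xx/5xx responses are returned as codes; connection failures
    (urllib.error.URLError) are left to propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# Example (placeholder domain and token, as in the thread):
#   print(probe(challenge_url("domain.tld", "xxxxxxx")))
```

A 404 right after the challenge file has been written, followed by a 200 after an nginx restart, would point at nginx not having picked up the new configuration.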

Scan4u commented 2 years ago

[image]

This is the result of http://domain.com/.well-known/xxxxxxx

Once again: after the manual restart everything works fine for one or two new SSL requests. Then it all gets stuck again until the next nginx service reload.

jaapmarcus commented 2 years ago

The token is provided with the request we receive from LE, so that is not the issue.

But I still wonder what is going wrong. As I said, I don't have any issues like this, but I also don't have that many domains active on a server.

Scan4u commented 2 years ago

Yes, the most interesting part for me is what changes in the nginx service after the first successful SSL request, so that all subsequent domains get the LE400 error again until the next nginx service restart. The nginx service is the main suspect :) And this issue is 101% reproducible. Everything else is working well on the server.

Scan4u commented 2 years ago

[image]

The Undefined array key "v_nginx_cache_duration" error still appears during the failed LE400 SSL requests.

jaapmarcus commented 2 years ago

That error has nothing to do with Let's Encrypt; the key is probably just not set with nginx + apache.

jaapmarcus commented 2 years ago

https://github.com/hestiacp/hestiacp/pull/2854

This will improve the error output of LE when it fails.

x-o-r-r-o commented 2 years ago

Well, on my end, even after an nginx restart I am not able to get SSL renewal. I have 7 domains running on Hestia, and until this issue everything was working fine: all my domains were getting SSL renewal via cron as required. Now all domains are stuck with error 400. Nothing changed, no template replacement; everything is default, as it has been since the fresh install. Does anyone have any idea how we can fix this?

ScIT-Raphael commented 2 years ago

I also had an LE400 last night; I manually triggered v-update-letsencrypt, which ran through. It could be a temporary problem with Let's Encrypt. For further debugging, have a look at our docs.
