kshcherban / acme-nginx

python acme client for nginx
GNU General Public License v3.0
314 stars 41 forks source link

Some times nginx -s reload detaches long before nginx is actually reloaded #63

Open Goury opened 2 years ago

Goury commented 2 years ago

Subj, very often resulting in a challenge validation failure.
Should check if nginx was actually reloaded and wait until it actually is ready before calling for challenge validation.

Bonus feature: process multiple certificates at once, request all the challenges for all the domains, set one big nginx config and reload it once, then call for challenge validation for each cert/domain.
This would save a lot of time for when I need to update lots of certificates on a slow nginx config.
Also accept something like JSON configs instead of command line.

Long story: I have a server configured to do some DNS requests on start/reload for some corporate/mysterious reasons.
It takes 10-20 seconds for nginx to reload, which is not a big deal normally.
But I also have a couple of dozens of different domains served by this server and running acme-nginx takes 15-20 minutes to update em all.
But it could be done in under a minute.

I sure can set up something to run an instance of nginx with simpler config so it doesn't take 20 seconds to reload.
But even then, updating each cert one-by-one takes a lot more time than it could.
And running multiple acme-nginx in parallel, sadly, is not an option.
So I'd rather fix this myself than do some nginx magic.

Extra bonus thought: what if we make it slightly more complicated to install acme-nginx by requiring an extra line in each nginx's donain-specific config, but then don't reload it at all?
That sounds like the perfect solution to me, TBH.

PS: I bet I'll fork acme-nginx around this weekend.

kshcherban commented 2 years ago

Hello @Goury, regarding your complain about slow validation of multiple domains. Do you know that you can specify multiple domains at once by repeating --domain parameter? Please check this code.

This will create only 1 nginx config with all needed domains and reload nginx only once.

If you have too many domains, you can split them in chunks, that will definitely speedup the process.

Goury commented 2 years ago

Yup, I know it and I use it a lot, such a lovely feature. But I often need many separate certificates and it doesn't seem to be working like that.

creolis commented 4 months ago

Hey :)

Same issue here, our nginx reloads take quite a long time, but it seems that acme-nginx does not really wait for the reload to be finished, which leads to failed validations. Didn't check the code yet, but it seems like "sleep as synchronize" was used there.

This issue is from 2022, is the project still maintained in any way?

kshcherban commented 4 months ago

@creolis this project is maintained at the best effort. It was never tested on big configurations as I wrote it for myself while using few domains only.

https://github.com/kshcherban/acme-nginx/blob/master/acme_nginx/Acme.py#L108-L119 code is not waiting for anything, it just execs nginx reloads and checks it's exit code.

Are you sure that nginx reloads asynchronously? Is there any way to check that reload finished?

Goury commented 4 months ago

@kshcherban I can confirm that this is what's happening. nginx -s seems to be signaling master nginx process to start doing something and exits, but the actual master process may take some time to load or even fail to do so. The saddest part is that it seems so there's no built-in way to check if nginx is actually working. You can check if the process is running, but that doesn't mean it has finished loading. You can check if the network port is open, but it's not a guarantee too. You can try to analyze systemd's status output, but I feel this is not a good way.

I feel the best way is to attempt to query something over http and retry if it fails. I'd do it like that:

  1. nginx -s reload
  2. wait 100ms
  3. query
  4. if query fails, wait another second and query again
  5. if it fails again, wait another three seconds and do it again
  6. if it fails again, wait another ten seconds and do it again
  7. if it fails again, declare it a failure and move on
creolis commented 3 months ago

hmmm .. let's see if we can come up with some solution for this, I'm going to check if there is some sort of signal we can work with