therootcompany / greenlock.js

🔐 Free SSL, Free Wildcard SSL, and Fully Automated HTTPS for node.js, issued by Let's Encrypt v2 via ACME
https://git.rootprojects.org/root/greenlock.js
Mozilla Public License 2.0

Let's Encrypt may be down for maintenance or `directoryUrl` may be wrong #1

Open alex996 opened 3 years ago

alex996 commented 3 years ago

Earlier this week (Monday, Apr 26 around 12:30 ET) Let's Encrypt was undergoing maintenance and its ACME v2 URL https://acme-v02.api.letsencrypt.org/directory was returning an error. I have greenlock-express set up with a valid cert (issued in March, expiring in June). I needed to restart Node but I got the following error:

Listening on 0.0.0.0:80 for ACME challenges, and redirecting to HTTPS
Listening on 0.0.0.0:443 for secure traffic
Ready to Serve:
     demo.example.com
ACME Directory URL: https://acme-v02.api.letsencrypt.org/directory
[debug] Let's Encrypt may be down for maintenance or `directoryUrl` may be wrong
set greenlockOptions.notify to override the default logger
Error cert_order:
Cannot read property 'termsOfService' of undefined
TypeError: Cannot read property 'termsOfService' of undefined
    at fin (/path/node_modules/@root/acme/acme.js:74:23)
    at /path/node_modules/@root/acme/acme.js:95:12
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at Object.greenlock._acme (/path/node_modules/@root/greenlock/greenlock.js:393:9)
    at Object.greenlock._order (/path/node_modules/@root/greenlock/greenlock.js:421:20)
    at Object.greenlock._renew (/path/node_modules/@root/greenlock/greenlock.js:335:9)
    at Object.greenlock.get (/path/node_modules/@root/greenlock/greenlock.js:212:23)

It seems that greenlock pings the ACME endpoint every 1 hour, is that correct? From @root/greenlock/greenlock.js:387:

var dir = caches[dirUrl];
// don't cache more than an hour
if (dir && Date.now() - dir.ts < 1 * 60 * 60 * 1000) {
    return dir.promise;
}

await acme.init(dirUrl).catch(function(err) {
    // TODO this is a special kind of failure mode. What should we do?
    console.error(
        "[debug] Let's Encrypt may be down for maintenance or `directoryUrl` may be wrong"
    );
    throw err;
});

I don't fully understand the intent here but my question is - if the cert is still valid (in my case, it's expiring in June), a. why is it necessary to ping the ACME endpoint, and b. why does this ping prevent the Node server from starting (again, despite a valid cert)?

Expected: given a valid cert, greenlock should start the Node server. Actual: given a valid cert, greenlock fails to start because the ACME v2 endpoint is unavailable.

Packages:

Thank you.

coolaj86 commented 3 years ago

Why is it necessary to ping the ACME endpoint?

Fail early. If someone is starting the server with incorrect settings, we want them to know right away.

It seems that greenlock pings the ACME endpoint every 1 hour, is that correct?

No. It caches the directory (keyed by its URL) so that it doesn't fetch it again for at least an hour (as opposed to every time it's needed).
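To illustrate the difference between caching and pinging, here is a minimal sketch of a time-based cache in the same spirit as the snippet quoted above. It is not Greenlock's actual code: `createDirectoryCache`, the stub fetcher, the injected clock, and the `acme.example` URL are all hypothetical. Nothing runs on a timer; a fetch only happens when a caller asks and the stored entry is missing or older than the limit.

```javascript
'use strict';

// Hypothetical helper, not Greenlock's API: remember one fetch per URL and
// reuse it until it is older than maxAgeMs.
function createDirectoryCache(fetchDirectory, maxAgeMs, now = Date.now) {
  const caches = {};

  return function getDirectory(dirUrl) {
    const entry = caches[dirUrl];
    // Fresh entry: hand back the stored promise, no network request.
    if (entry && now() - entry.ts < maxAgeMs) {
      return entry.promise;
    }
    // Missing or stale entry: fetch once and record when we did.
    const promise = fetchDirectory(dirUrl);
    caches[dirUrl] = { ts: now(), promise };
    return promise;
  };
}

// Usage with a stub fetcher and a controllable clock (both assumptions).
let clock = 0;
let fetchCount = 0;
const getDirectory = createDirectoryCache(
  function (url) {
    fetchCount += 1;
    return Promise.resolve({ url });
  },
  60 * 60 * 1000,
  function () { return clock; }
);

getDirectory('https://acme.example/directory'); // fetches
clock = 30 * 60 * 1000;
getDirectory('https://acme.example/directory'); // 30 minutes old, served from cache
clock = 61 * 60 * 1000;
getDirectory('https://acme.example/directory'); // stale, fetches again
console.log(fetchCount); // 2
```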

Why does this ping prevent the Node server from starting (again, despite a valid cert)?

// TODO this is a special kind of failure mode. What should we do?

"In the face of ambiguity, refuse the temptation to guess."

I think it would be reasonable to make the default behavior log the error and continue rather than throw, now that the use case is better understood.
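A rough sketch of what "log and continue" could look like, under stated assumptions: `initOrWarn` is a hypothetical wrapper, `initAcme` stands in for whatever performs the directory fetch, and the `notify(event, details)` shape is assumed for illustration rather than taken from Greenlock. Returning `null` instead of throwing means later code has to cope with a directory that is not loaded yet.

```javascript
'use strict';

// Hypothetical wrapper: `initAcme` and `notify` are stand-ins, not Greenlock APIs.
async function initOrWarn(initAcme, dirUrl, notify) {
  try {
    return await initAcme(dirUrl);
  } catch (err) {
    // Report the failure, but let startup continue with the existing certificate.
    notify('warning', {
      message:
        "Let's Encrypt may be down for maintenance or `directoryUrl` may be wrong",
      error: err
    });
    return null; // callers must treat null as "directory not available yet"
  }
}

// Usage with a stub that always fails, as it would during an outage.
const events = [];
const startup = initOrWarn(
  function () { return Promise.reject(new Error('ECONNREFUSED')); },
  'https://acme-v02.api.letsencrypt.org/directory',
  function (ev, details) { events.push({ ev, details }); }
);
```

Here `startup` resolves to `null` and one warning is recorded, so the process keeps running instead of crashing on the rejected promise.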

alex996 commented 3 years ago

Thanks. IIUC, if we remove this throw statement:

// @root/greenlock/greenlock.js:393
await acme.init(dirUrl).catch(function(err) {
    // TODO this is a special kind of failure mode. What should we do?
    console.error(
        "[debug] Let's Encrypt may be down for maintenance or `directoryUrl` may be wrong"
    );
    // throw err; // <--- this
});

and the call to ACME v2 does fail, then the metadata won't be initialized:

// @root/acme/acme.js:69
me.init = function (opts) {
// ...
    function fin(dir) {
      me._directoryUrls = dir; // <--- this won't run
      me._tos = dir.meta.termsOfService; // <--- and this
      return dir;
    }

This means ACME._orderCert will need to call init again:

// @root/acme/acme.js:1145
ACME._orderCert = function (me, options, kid) {
// ...
    return U._jwsRequest(me, {
        url: me._directoryUrls.newOrder, // <--- this will be missing

Alternatively, we can ping ACME v2 periodically (every 1 hour?) until it is back up. That said, I'm not sure if me._directoryUrls and me._tos are used elsewhere as well.

I think I get the general idea, so I can write up a PR if this makes sense.
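One way the "call init again" idea could be sketched, assuming nothing about the real internals: `createLazyDirectory` is a hypothetical helper, `init` stands in for acme.init and is assumed to resolve to the directory object, and the `acme.example` URLs are placeholders. The order path would await `ensureDirectory()` before reading `newOrder`, so a failed startup fetch is retried on demand rather than on a timer.

```javascript
'use strict';

// Hypothetical sketch: `init` stands in for acme.init and is assumed to
// resolve to the directory object.
function createLazyDirectory(init, dirUrl) {
  let directory = null; // last successfully loaded directory
  let pending = null;   // in-flight attempt, shared by concurrent callers

  return async function ensureDirectory() {
    if (directory) {
      return directory;
    }
    if (!pending) {
      pending = init(dirUrl).then(
        function (dir) { directory = dir; pending = null; return dir; },
        // Forget the failed attempt so the next caller retries.
        function (err) { pending = null; throw err; }
      );
    }
    return pending;
  };
}

// Usage: the first attempt fails (outage), the second succeeds.
let attempts = 0;
const ensureDirectory = createLazyDirectory(async function () {
  attempts += 1;
  if (attempts === 1) { throw new Error('directory unavailable'); }
  return { newOrder: 'https://acme.example/new-order' };
}, 'https://acme.example/directory');

async function demo() {
  const results = [];
  try {
    await ensureDirectory();
  } catch (err) {
    results.push('failed: ' + err.message);
  }
  const dir = await ensureDirectory(); // retried, now succeeds
  results.push(dir.newOrder);
  await ensureDirectory(); // served from memory, no third attempt
  return results;
}
```

Whether other users of me._directoryUrls and me._tos would need the same guard is left open, as noted above.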

mikealeonetti commented 3 years ago

Is there a setting that would allow the server to start even when the Let's Encrypt API is down for maintenance? I also had a valid cert, had to restart Node, and now the server is just down. I would love to prevent this in the future.

eloquence commented 2 years ago

This appears to be biting me today during an LE outage - had to restart Node for unrelated reasons and now the site is just down. Definitely would be nice for this module to handle such situations more gracefully.

eloquence commented 2 years ago

While the main API endpoint is down I was able to bring my server back up by temporarily switching to the staging API directory endpoint (since the cert is still valid this did not appear to have any unintended side effects, for now).

coolaj86 commented 2 years ago

I'm convinced that this is a problem that needs to be solved. Would someone like to make a PR, test it, and ping me?