jetstack / kube-lego

DEPRECATED: Automatically request certificates for Kubernetes Ingress resources from Let's Encrypt
Apache License 2.0

Kube-lego requests/receives a previously held certificate when a new domain fails eventually hitting a rate limit #148

Open znorris opened 7 years ago

znorris commented 7 years ago

This rate limiting could be prevented by not requesting/issuing a certificate when a domain fails authorization.

Note: This bug may be a function of the ACME protocol. If so this would need to be directed to the ACME RFC.

Replication Steps:

  1. Create an Ingress resource:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: example-ingress
      annotations:
        kubernetes.io/tls-acme: "true"
        kubernetes.io/ingress.class: "gce"
    spec:
      tls:
      - secretName: stage-example-com-ssl
        hosts:
        - stage.example.com
      rules:
      - host: stage.example.com
        http:
          paths:
          - path:
            backend:
              serviceName: webserver
              servicePort: 80

  2. kube-lego should have requested a new cert for this domain, and the cert should now be installed in the GCE LB. Test to ensure it's working.

  3. Add a host and rule for a domain that ACME cannot authorize. The reason why it cannot be authorized isn't important for this test case. You can pretend the DNS for example.com isn't directing traffic to the correct place.

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: example-ingress
      annotations:
        kubernetes.io/tls-acme: "true"
        kubernetes.io/ingress.class: "gce"
    spec:
      tls:
      - secretName: stage-example-com-ssl
        hosts:
        - stage.example.com
        - example.com
      rules:
      - host: stage.example.com
        http:
          paths:
          - path:
            backend:
              serviceName: nginx
              servicePort: 80
      - host: example.com
        http:
          paths:
          - path:
            backend:
              serviceName: production-nginx
              servicePort: 80

  4. Note that kube-lego picks up the new domain:

    time="2017-04-12T23:19:07Z" level=info msg="process certificates requests for ingresses" context=kubelego
    time="2017-04-12T23:19:07Z" level=info msg="cert does not cover all domains" context="ingress_tls" domains=[example.com stage.example.com] name=example-ingress namespace=default

  5. Note that kube-lego requests a certificate:

    time="2017-04-12T23:19:07Z" level=info msg="requesting certificate for example.com,stage.example.com" context="ingress_tls" name=example-ingress namespace=default

  6. Note that the original domain is successfully authorized:

    time="2017-04-12T23:19:08Z" level=info msg="authorization successful" context=acme domain=stage.example.com

  7. Note that the new domain we just added, which we expected to fail, does indeed fail:

    time="2017-04-12T23:20:20Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '404'" context=acme domain=example.com
    time="2017-04-12T23:20:20Z" level=warning msg="authorization failed for some domains" context=acme failed_domains=[example.com]

  8. This is what I didn't expect:

    time="2017-04-12T23:20:20Z" level=info msg="successfully got certificate: domains=[stage.example.com] url=https://acme-v01.api.letsencrypt.org/acme/cert/foobar" context=acme

We just got a certificate we didn't really request, and at first I thought it was better than nothing. But because we still don't have a certificate that covers all the hosts listed in our Ingress, the entire process starts over again.

  9. Fast-forward several more requests until we've re-requested the same cert 5 more times (6 certs for the same set of domains in total) and we receive this error:

    time="2017-04-12T23:45:35Z" level=error msg="Error while process certificate requests: error getting certificate: 429 urn:acme:error:rateLimited: Error creating new cert :: too many certificates already issued for exact set of domains: stage.example.com" context=kubelego
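
The loop in the steps above can be sketched as a small simulation. This is only an illustration, not kube-lego's code: the function names, the reconcile structure, and the default `limit` of 5 issuances per exact set of domains are assumptions made for the example.

```python
def covers(cert_domains, wanted_domains):
    """True if the certificate lists every host the Ingress asks for."""
    return set(wanted_domains) <= set(cert_domains)


def simulate_reconcile(wanted, authorizable, limit=5, max_passes=20):
    """Run reconcile passes until the cert covers `wanted` or the CA refuses.

    Hypothetical model: every pass requests `wanted`, the CA issues a
    certificate for only the authorizable subset, and a partial result
    never covers `wanted`, so the next pass tries again.
    Returns (outcome, pass number at which the outcome was reached).
    """
    issued_for_set = 0  # certificates issued so far for the same exact set
    cert = set()
    for attempt in range(1, max_passes + 1):
        if covers(cert, wanted):
            return ("covered", attempt - 1)
        if issued_for_set >= limit:
            return ("rate limited", attempt)
        cert = set(wanted) & set(authorizable)  # partial certificate
        issued_for_set += 1
    return ("gave up", max_passes)


result = simulate_reconcile(
    wanted=["stage.example.com", "example.com"],
    authorizable=["stage.example.com"],
)
print(result)
```

In this model the partial certificate never satisfies the coverage check, so each pass spends one unit of the duplicate quota until the request is refused.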

Proposal

We should only request/receive a certificate when it covers all of the domains we requested. This prevents our "Duplicate Certificate" limit counter from increasing, which is great for kube-lego users because we get a chance to correct our mistakes without hitting the rate limit.
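
A minimal sketch of this all-or-nothing policy, assuming the set of successfully authorized domains is known before the certificate is requested. The function name `domains_to_request` is hypothetical and does not exist in kube-lego.

```python
def domains_to_request(requested, authorized):
    """Return the domains to put on the certificate, or None to skip issuance.

    All-or-nothing: if any requested domain failed authorization,
    no certificate is requested at all.
    """
    requested_set = {d.lower() for d in requested}
    authorized_set = {d.lower() for d in authorized}
    failed = requested_set - authorized_set
    if failed:
        return None  # at least one domain failed: do not spend quota
    return sorted(requested_set)


# example.com failed authorization, so nothing is requested.
print(domains_to_request(["example.com", "stage.example.com"],
                         ["stage.example.com"]))
```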

If I've misunderstood something regarding kube-lego or the ACME protocol please let me know!

simonswine commented 7 years ago

I think you are right. According to the Let's Encrypt rate limits, the duplicate certificate limit applies as follows:

We also have a Duplicate Certificate limit of 5 certificates per week. A certificate is considered a duplicate of an earlier certificate if they contain the exact same set of hostnames, ignoring capitalization and ordering of hostnames. For instance, if you requested a certificate for the names [www.example.com, example.com], you could request four more certificates for [www.example.com, example.com] during the week. If you changed the set of names by adding [blog.example.com], you would be able to request additional certificates.

We need to check whether the certificate we can get would be exactly the same as the one we already have. In that case we shouldn't request the certificate at all.
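
One way to express that check, as a hedged sketch: hostname sets are compared ignoring capitalization and ordering, as the quoted rate-limit text describes. The helper names and the 7-day sliding window are assumptions for illustration, not kube-lego or Let's Encrypt code.

```python
from datetime import datetime, timedelta


def name_set(hostnames):
    """Normalize hostnames so case and order do not matter."""
    return frozenset(h.lower() for h in hostnames)


def is_duplicate(candidate, existing):
    """True if both certificates would carry the exact same set of names."""
    return name_set(candidate) == name_set(existing)


def duplicates_in_window(candidate, issued, now, window=timedelta(days=7)):
    """Count earlier issuances of the same name set inside the window.

    `issued` is a list of (hostnames, issued_at) tuples; the sliding
    7-day window is an assumption about how "per week" is measured.
    """
    key = name_set(candidate)
    return sum(
        1 for names, issued_at in issued
        if name_set(names) == key and now - issued_at < window
    )


now = datetime(2017, 4, 12, 23, 45)
history = [
    (["stage.example.com"], now - timedelta(days=1)),
    (["Stage.Example.com"], now - timedelta(days=2)),
    (["stage.example.com", "example.com"], now - timedelta(days=1)),
    (["stage.example.com"], now - timedelta(days=8)),  # outside the window
]
print(duplicates_in_window(["stage.example.com"], history, now))
```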

mithrandi commented 7 years ago

I think requesting a partial certificate when not all domains could be authorized is useful, since it means that only the failing domain(s) will be affected while the other domains will continue to work; avoiding issuing exact duplicate certificates (except when a renewal is required, of course) should be sufficient to avoid this rate limit.
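
The policy suggested here (allow partial certificates, but skip exact duplicates unless a renewal is due) could look roughly like the sketch below. The function name `should_request` and the 30-day `RENEW_BEFORE` threshold are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta

RENEW_BEFORE = timedelta(days=30)  # hypothetical renewal threshold


def should_request(authorized, current_cert_domains, current_not_after, now):
    """Decide whether to request a certificate for the authorized domains.

    A partial certificate is fine, but an exact duplicate of the current
    certificate is only requested when that certificate is close to expiry.
    """
    new_set = frozenset(d.lower() for d in authorized)
    if not new_set:
        return False  # nothing could be authorized
    old_set = frozenset(d.lower() for d in current_cert_domains)
    if new_set != old_set:
        return True  # the name set changed, so this is not a duplicate
    # Same names as the current certificate: only renew when expiry is near.
    return current_not_after - now <= RENEW_BEFORE


now = datetime(2017, 4, 12)
# Only stage.example.com was authorized and the current cert already
# covers exactly that name with 80 days left: skip the request.
print(should_request(["stage.example.com"], ["stage.example.com"],
                     now + timedelta(days=80), now))
```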

Note that there is another rate limit of 5 failed authorizations / hour that will be hit in a scenario like this, but that isn't as severe, and I believe #92 covers that issue.