jetstack / kube-lego

DEPRECATED: Automatically request certificates for Kubernetes Ingress resources from Let's Encrypt
Apache License 2.0

[gce] DNS never resolves in the kube-lego container once it fails: reachabily test failed: no such host #162

Closed ahmetb closed 7 years ago

ahmetb commented 7 years ago

When I create a brand new Ingress with the kubernetes.io/tls-acme: "true" annotation, I don't know which IP address will be allocated to it. So I create it first, then grab the IP address.

The problem is that the kube-lego container (or kube-dns, which I can't see because I'm on GKE) starts making requests to my hostname (ngx.alp.im) right away, before I can even configure my DNS provider to point it at this IP address. The failed lookup then somehow gets cached:

time="2017-04-21T19:13:40Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: Get http://ngx.alp.im/.well-known/acme-challenge/_selftest: dial tcp: lookup ngx.alp.im on 10.3.240.10:53: no such host" context=acme domain=ngx.alp.im

This looks like an NXDOMAIN error from kube-dns. Oddly enough, the URL above works from:

  1. my browser on my workstation
  2. in a different container in the same cluster (kubectl run -i -t alpine --image=alpine --restart=Never then use curl)

I tried deleting the kube-lego container, and after 20 minutes it still gets no such host, whereas the same hostname resolves both outside the cluster and inside the cluster (in another container).
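For comparing the behaviour from different pods, the two steps that the log message describes (a DNS lookup, then a GET on the self-test URL) can be sketched in a few lines of Python. This is only an illustration, not kube-lego's actual code: the function names are made up, and the only detail taken from the log above is the /.well-known/acme-challenge/_selftest path.

```python
import socket
import urllib.error
import urllib.request

# Path taken from the kube-lego log message quoted above.
SELFTEST_PATH = "/.well-known/acme-challenge/_selftest"


def selftest_url(host):
    """Build the URL that the reachability test fetches (hypothetical helper)."""
    return f"http://{host}{SELFTEST_PATH}"


def resolve(host, lookup=socket.getaddrinfo):
    """Return the sorted unique IPs for host, or [] if the lookup fails.

    `lookup` is injectable so the function can be exercised without a network.
    """
    try:
        infos = lookup(host, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def probe(host, lookup=socket.getaddrinfo, timeout=5.0):
    """Resolve host with the pod's resolver, then fetch the self-test URL."""
    if not resolve(host, lookup):
        return "no such host"
    try:
        with urllib.request.urlopen(selftest_url(host), timeout=timeout) as resp:
            return f"status {resp.status}"
    except urllib.error.HTTPError as err:
        # Non-2xx answers such as 502 end up here.
        return f"status {err.code}"
    except (urllib.error.URLError, OSError) as err:
        return f"request failed: {err}"
```

Running probe("ngx.alp.im") in the kube-lego pod and in a scratch pod would show whether the two see different DNS answers.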

Any ideas?

ahmetb commented 7 years ago

It looks like the kube-dns container is in fact visible to me on GKE. When I deleted it, things progressed and then failed somewhere else (#160). See the marker in the log below for the point where I killed kube-dns:


time="2017-04-21T21:06:58Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: Get http://n.alp.im/.well-known/acme-challenge/_selftest: dial tcp: lookup n.alp.im on 10.3.240.10:53: no such host" context=acme domain=n.alp.im
time="2017-04-21T21:06:58Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-04-21T21:06:58Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-04-21T21:06:58Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-04-21T21:06:58Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=default
time="2017-04-21T21:06:58Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=default
time="2017-04-21T21:06:58Z" level=info msg="requesting certificate for n.alp.im" context="ingress_tls" name=echoserver namespace=default

time="2017-04-21T21:08:09Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: Get http://n.alp.im/.well-known/acme-challenge/_selftest: dial tcp: lookup n.alp.im on 10.3.240.10:53: no such host" context=acme domain=n.alp.im
time="2017-04-21T21:08:09Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-04-21T21:08:09Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-04-21T21:08:09Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-04-21T21:08:09Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=default
time="2017-04-21T21:08:09Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=default
time="2017-04-21T21:08:09Z" level=info msg="requesting certificate for n.alp.im" context="ingress_tls" name=echoserver namespace=default

time="2017-04-21T21:09:32Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: Get http://n.alp.im/.well-known/acme-challenge/_selftest: dial tcp: lookup n.alp.im on 10.3.240.10:53: no such host" context=acme domain=n.alp.im
time="2017-04-21T21:09:32Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-04-21T21:09:32Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-04-21T21:09:32Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-04-21T21:09:32Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=default
time="2017-04-21T21:09:32Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=default
time="2017-04-21T21:09:32Z" level=info msg="requesting certificate for n.alp.im" context="ingress_tls" name=echoserver namespace=default

time="2017-04-21T21:11:03Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: Get http://n.alp.im/.well-known/acme-challenge/_selftest: dial tcp: lookup n.alp.im on 10.3.240.10:53: no such host" context=acme domain=n.alp.im
time="2017-04-21T21:11:03Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-04-21T21:11:03Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-04-21T21:11:03Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-04-21T21:11:03Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=default
time="2017-04-21T21:11:03Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=default
time="2017-04-21T21:11:03Z" level=info msg="requesting certificate for n.alp.im" context="ingress_tls" name=echoserver namespace=default

<!--  ----------- HERE I KILLED kube-dns and dns errors stopped ----------- -->

time="2017-04-21T21:12:12Z" level=warning msg="authorization failed after 1m0s: reachabily test failed: wrong status code '502'" context=acme domain=n.alp.im
time="2017-04-21T21:12:12Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-04-21T21:12:12Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-04-21T21:12:12Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-04-21T21:12:12Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=default
time="2017-04-21T21:12:12Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=default
time="2017-04-21T21:12:12Z" level=info msg="requesting certificate for n.alp.im" context="ingress_tls" name=echoserver namespace=default

time="2017-04-21T21:13:51Z" level=warning msg="authorization failed after 1m0s: getting authorization failed: 403 urn:acme:error:unauthorized: No registration exists matching provided key" context=acme domain=n.alp.im
time="2017-04-21T21:13:51Z" level=error msg="Error while process certificate requests: no domain could be authorized successfully" context=kubelego
time="2017-04-21T21:13:51Z" level=info msg="disable provider no TLS hosts found" context=provider provider=nginx
time="2017-04-21T21:13:51Z" level=info msg="process certificates requests for ingresses" context=kubelego
time="2017-04-21T21:13:51Z" level=info msg="creating new secret" context=secret name=echoserver-tls namespace=default
time="2017-04-21T21:13:51Z" level=info msg="no cert associated with ingress" context="ingress_tls" name=echoserver namespace=default
time="2017-04-21T21:13:51Z" level=info msg="requesting certificate for n.alp.im" context="ingress_tls" name=echoserver namespace=default

This makes me wonder what kind of caching kube-dns does for external DNS lookups. In any case, it impacts kube-lego.
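To see the change in failure reason without reading every line, the log above can be reduced to its warning entries. The sketch below assumes the lines are plain key=value pairs with optional double-quoted values, as they appear here; the helper names are hypothetical.

```python
import re

# key=value, where value is either a double-quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def warnings_by_time(lines):
    """Return (time, msg) for every line logged at level=warning."""
    result = []
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") == "warning":
            result.append((fields.get("time"), fields.get("msg")))
    return result
```

Applied to the excerpt above, this would leave one entry per retry, which makes the switch from "no such host" to the 502 and 403 errors easier to spot.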

gianrubio commented 7 years ago

@ahmetb this is an issue with kube-dns, not with kube-lego. I'd advise you to ask in the DNS repo.

ahmetb commented 7 years ago

@gianrubio I suppose you're right. DNS is a deep rabbit hole that probably involves my cloud provider's DNS support as well, and the result might be getting cached at many different levels. I will not pursue this one.

kbroughton commented 6 years ago

If you are following the instructions at https://github.com/jetstack/kube-lego/blob/master/examples/nginx/README.md, be sure that you have changed both echoserver/ingress-notls.yaml and echoserver/ingress-tls.yaml (this is not mentioned in the instructions).

I had only changed ingress-tls.yaml and reproduced the error:

msg="authorization failed after 1m0s: reachabily test failed: Get http://ngx.alp.im/.well-known/acme-challenge/_selftest: dial tcp: lookup ngx.alp.im on 10.3.240.10:53: no such host"
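A quick way to catch a leftover hostname is to list every host an Ingress declares and compare it against the domains you actually own. The sketch below works on manifests already loaded as Python dicts (for example from JSON output of kubectl) and reads the standard spec.rules[].host and spec.tls[].hosts fields. The function names and the sample hostnames in the usage note are illustrative assumptions, not the contents of the example files.

```python
def ingress_hosts(ingress):
    """Collect every hostname an Ingress manifest refers to."""
    spec = ingress.get("spec", {})
    hosts = set()
    for rule in spec.get("rules") or []:
        if rule.get("host"):
            hosts.add(rule["host"])
    for tls in spec.get("tls") or []:
        hosts.update(tls.get("hosts") or [])
    return hosts


def unexpected_hosts(ingresses, expected):
    """Map Ingress name to the hosts it declares that are not in `expected`."""
    problems = {}
    for ingress in ingresses:
        extra = ingress_hosts(ingress) - set(expected)
        if extra:
            name = ingress.get("metadata", {}).get("name", "<unnamed>")
            problems[name] = sorted(extra)
    return problems
```

For instance, if one manifest had been updated to n.alp.im and the other still carried a placeholder host, unexpected_hosts(manifests, ["n.alp.im"]) would name the manifest that was missed.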