tnozicka / openshift-acme

ACME Controller for OpenShift and Kubernetes Cluster. (Supports e.g. Let's Encrypt)
Apache License 2.0

Error failed to get ACME #81

Closed flipkill1985 closed 5 years ago

flipkill1985 commented 5 years ago

Hello,

I get this error:

Error syncing Route test/gohellouniverse: failed to get ACME client: Get https://acme-v01.api.letsencrypt.org/directory: x509: certificate is valid for *.apps.openshift.xxx-home.de, apps.openshift.xxx-home.de, not acme-v01.api.letsencrypt.org

I think the URL is wrong? The request goes to

https://acme-v01.api.letsencrypt.org/directory: (with a trailing colon), but it should be https://acme-v01.api.letsencrypt.org/directory

Best regards, Jan

flipkill1985 commented 5 years ago

The URL in the environment variable is OK:

OPENSHIFT_ACME_ACMEURL=https://acme-v01.api.letsencrypt.org/directory

drzigman commented 5 years ago

Greetings,

I am having a similar issue, but I don't think it's the URL (the extra colon at the end is added by the error message itself, in both places where it is output):

return fmt.Errorf("failed to get ACME client: %v", err)

Rather, I think something is going on where the router's default cert is being offered up somewhere.
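To illustrate where the colon comes from, here is a minimal sketch. It assumes the underlying error is a *url.Error, which is what Go's net/http client returns for a failed request. The helper names simulatedGetError and wrapACMEError are hypothetical and only mirror the fmt.Errorf call quoted above; they are not taken from the controller's source.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// simulatedGetError builds the kind of *url.Error that net/http returns
// when a GET request fails. Hypothetical helper, for illustration only.
func simulatedGetError(rawURL, cause string) error {
	return &url.Error{Op: "Get", URL: rawURL, Err: errors.New(cause)}
}

// wrapACMEError mirrors the fmt.Errorf call quoted in this thread.
func wrapACMEError(err error) error {
	return fmt.Errorf("failed to get ACME client: %v", err)
}

func main() {
	err := simulatedGetError(
		"https://acme-v01.api.letsencrypt.org/directory",
		"x509: certificate is valid for example.invalid, not acme-v01.api.letsencrypt.org",
	)
	// url.Error formats itself as "<Op> <URL>: <cause>", so the colon after
	// the URL is a separator in the message, not part of the requested URL.
	fmt.Println(wrapACMEError(err))
}
```

Newer Go releases also put the URL in quotes in this message, which makes the separator easier to spot.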

I1206 05:05:01.286913       1 route.go:385] Started syncing Route "openshift-infra/hawkular-metrics" (2018-12-06 05:05:01.28687158 +0000 UTC m=+2704.437421957)
I1206 05:05:09.077163       1 route.go:387] Finished syncing Route "openshift-infra/hawkular-metrics" (7.790278863s)
E1206 05:05:09.077210       1 route.go:726] failed to get ACME client: Get https://acme-v01.api.letsencrypt.org/directory: x509: certificate is valid for *.router.default.svc.cluster.local, router.default.svc.cluster.local, not acme-v01.api.letsencrypt.org
I1206 05:05:09.077221       1 route.go:727] Dropping Route "openshift-infra/hawkular-metrics" out of the queue: failed to get ACME client: Get

I have this issue if I delete the tls spec as well as if I provide a self-signed cert for the correct FQDN. It seems that the default router cert is still being served by something.

I'm not quite sure where this is happening: if I visit the website in a browser, I get the correct cert with the correct FQDN being served.
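The x509 part of the log can be reproduced in isolation, which helps confirm that the connection really did reach the router rather than Let's Encrypt. The sketch below uses Go's crypto/x509 hostname check on a bare certificate struct carrying the two names from the log. checkServedCert is a hypothetical helper, and the certificate is neither signed nor fetched from a server; only the name matching is exercised.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// checkServedCert reports whether a certificate carrying the given DNS
// subject alternative names would be accepted for host. It returns nil on
// a match and an x509.HostnameError otherwise. Hypothetical helper.
func checkServedCert(sans []string, host string) error {
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(host)
}

func main() {
	// Names taken from the router default certificate in the log above.
	routerSANs := []string{
		"*.router.default.svc.cluster.local",
		"router.default.svc.cluster.local",
	}

	// The ACME host is not covered by the router cert, so this is an error.
	fmt.Println(checkServedCert(routerSANs, "acme-v01.api.letsencrypt.org"))

	// A name under the wildcard is accepted (prints <nil>).
	fmt.Println(checkServedCert(routerSANs, "foo.router.default.svc.cluster.local"))
}
```

If the first check fails with the same names as in the controller log, the request for the ACME directory was most likely answered by the router, which points at name resolution rather than at the Route or its tls spec.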

Would greatly appreciate any suggestions or nudges in the right direction as to what I could be doing wrong. Thanks!

drzigman commented 5 years ago

So! I ended up figuring this out...

I had set up a wildcard DNS entry for my cluster, and from reading other bug reports I was directed to the documentation:

In your /etc/resolv.conf file on each node host, ensure that the DNS server that has the wildcard entry is not listed as a nameserver or that the wildcard domain is not listed in the search list. Otherwise, containers managed by OKD may fail to resolve host names properly.

I had set up *.openshift.mydomain.com, and not *.apps.openshift.mydomain.com (or some other app-like subdomain). Because of this and the presence of openshift.mydomain.com in the search list, I was unable to properly perform DNS resolution from the cluster. I'm not entirely certain why this is the case, but it does align with what the docs say.
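A likely explanation, sketched under assumptions: resolvers that honor the search list try a name with fewer than ndots dots against each search domain before trying it as-is, and a wildcard DNS record answers for any name underneath it. The function lookupOrder below is a simplified, hypothetical model of that ordering, not the real resolver, and ndots=5 is assumed because it is the usual setting inside Kubernetes pods; the actual value on this cluster is not stated in the thread.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupOrder returns the names a search-list-aware resolver would try, in
// order, for the given name. Simplified model: a name ending in "." is
// looked up as-is only; a name with at least ndots dots is tried as-is
// first; any other name is tried with each search domain appended first.
func lookupOrder(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	var withSearch []string
	for _, domain := range search {
		withSearch = append(withSearch, name+"."+domain)
	}
	if strings.Count(name, ".") >= ndots {
		return append([]string{name}, withSearch...)
	}
	return append(withSearch, name)
}

func main() {
	search := []string{"openshift.mydomain.com"} // from the node's search list
	for _, candidate := range lookupOrder("acme-v01.api.letsencrypt.org", search, 5) {
		fmt.Println(candidate)
	}
}
```

With these inputs the first candidate is acme-v01.api.letsencrypt.org.openshift.mydomain.com, which the *.openshift.mydomain.com wildcard would answer with the cluster's address, so the request lands on the router and gets its default cert. Moving the wildcard to *.apps.openshift.mydomain.com, or removing it, makes that first candidate fail so the resolver falls through to the real name.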

Once I dropped the *.openshift.mydomain.com DNS record things started to flow again :)

Hopefully this can help someone else.

tnozicka commented 5 years ago

Yeah, this does seem like a DNS issue. Glad you figured it out.