Closed MohammadKarimi23 closed 3 years ago
It keeps trying because the validation is failing.
I0425 12:32:11.598231 1 route.go:641] Route "test-namespace/acme-test": Order "https://acme-staging-v02.api.letsencrypt.org/acme/order/13200343/87430482" is in "invalid" state
I0425 12:32:03.151779 1 route.go:996] Can't self validate exposed token before accepting the challenge: getting "http://acme-test.my-domain.io/.well-known/acme-challenge/vd18FKkh1bQWLISxxrLRh_QrsGrfXXpefNpQIsN8WlU" return status code 404, expected 200: status "404 Not Found": content head: <html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.13.12</center>
This means that the URL can't be reached. Is it the actual domain you tried, or a redacted one?
Also, nginx means the request probably gets stuck on your load balancer. The response isn't coming from the openshift-acme exposer, and OCP uses HAProxy.
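For readers following the log above, the self-validation step can be approximated as follows. This is a minimal sketch, not the actual openshift-acme code: the names `challengeURL`, `checkResponse`, and `selfValidate`, the sample token, and the local `httptest` server standing in for the exposer are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// challengeURL builds the HTTP-01 challenge URL for a domain and token.
func challengeURL(domain, token string) string {
	return "http://" + domain + "/.well-known/acme-challenge/" + token
}

// checkResponse returns nil only for a 200 response whose trimmed body
// equals the expected key authorization (hypothetical helper).
func checkResponse(status int, body, keyAuth string) error {
	if status != http.StatusOK {
		return fmt.Errorf("status code %d, expected %d", status, http.StatusOK)
	}
	if strings.TrimSpace(body) != keyAuth {
		return fmt.Errorf("unexpected challenge body")
	}
	return nil
}

// selfValidate fetches url and checks the response with checkResponse.
func selfValidate(client *http.Client, url, keyAuth string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Read only a small prefix; a challenge response is short.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 4096))
	if err != nil {
		return err
	}
	return checkResponse(resp.StatusCode, string(body), keyAuth)
}

func main() {
	// Local stand-in for the exposer: serves one token, 404 for anything else.
	const token, keyAuth = "sample-token", "sample-token.sample-thumbprint"
	mux := http.NewServeMux()
	mux.HandleFunc("/.well-known/acme-challenge/"+token, func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, keyAuth)
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	base := srv.URL + "/.well-known/acme-challenge/"
	fmt.Println("known token:", selfValidate(srv.Client(), base+token, keyAuth))
	fmt.Println("unknown token:", selfValidate(srv.Client(), base+"missing", keyAuth))
}
```

A check like this only proves that the token is reachable from wherever it runs. It passing inside the cluster or on a workstation says nothing about whether the ACME server can reach the same URL from the public internet.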
I've included the result of curl on the temporary route in the issue description (I ran it locally on my machine to make sure it's exposed).
The 404 errors probably occur while an exposer pod is being deleted and before a new one has started. You can see in the full log that there are different errors while the exposer is running.
Also, the nginx error is not related to the load balancer. The pod that the route points to is running nginx.
And the domain is a redacted one, of course 😄
The reason I asked is that your challenge failed verification by Let's Encrypt:
I0425 12:32:11.598231 1 route.go:641] Route "test-namespace/acme-test": Order "https://acme-staging-v02.api.letsencrypt.org/acme/order/13200343/87430482" is in "invalid" state
I0425 12:32:11.598442 1 route.go:1245] Cleaning up temporary exposer for Route test-namespace/acme-test (UID=6a6e98c5-86f0-11ea-b7c6-fa163ef6d455)
I0425 12:32:11.600056 1 event.go:281] Event(v1.ObjectReference{Kind:"Route", Namespace:"test-namespace", Name:"acme-test", UID:"6a6e98c5-86f0-11ea-b7c6-fa163ef6d455", APIVersion:"route.openshift.io/v1", ResourceVersion:"2610968", FieldPath:""}): type: 'Warning' reason: 'AcmeFailedOrder' Order "https://acme-staging-v02.api.letsencrypt.org/acme/order/13200343/87430482" for domain "acme-test.my-domain.io" failed: <nil>
In 90% of cases this is a domain that is either only in local DNS or isn't set up to direct public access to the Router.
Yeah, you're right! The routers in my OpenShift cluster are exposed on virtual IPs inside a private network, so authorization fails. This is what I got in the "detail" section of the JSON result when trying to authorize manually:
No valid IP addresses found for acme-test.mydomain.io
It would be nice, though, if the logs showed the reason and the route was marked accordingly, so that the controller cleans up resources and doesn't retry the process for the route.
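A quick pre-check for this failure mode is to see whether a hostname resolves to any address that is reachable from outside a private network. The sketch below is an illustration only: `publicAddrs` and `checkHost` are hypothetical helpers, the sample addresses are made up, and the filter is an approximation that does not reproduce the ACME server's own rules. It assumes Go 1.17 or later for `net.IP.IsPrivate`.

```go
package main

import (
	"fmt"
	"net"
)

// publicAddrs keeps the entries that parse as IP addresses and are not
// loopback, private (RFC 1918 / RFC 4193), link-local, or unspecified.
func publicAddrs(addrs []string) []string {
	var public []string
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil {
			continue // not an IP literal
		}
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			continue
		}
		public = append(public, a)
	}
	return public
}

// checkHost resolves host with the local resolver and returns an error
// when none of the resolved addresses passes the publicAddrs filter.
func checkHost(host string) ([]string, error) {
	addrs, err := net.LookupHost(host)
	if err != nil {
		return nil, err
	}
	public := publicAddrs(addrs)
	if len(public) == 0 {
		return nil, fmt.Errorf("no publicly routable addresses among %v for %s", addrs, host)
	}
	return public, nil
}

func main() {
	// Static samples so the example runs without network access.
	fmt.Println(publicAddrs([]string{"10.0.12.5", "192.168.1.20"}))
	fmt.Println(publicAddrs([]string{"10.0.12.5", "8.8.8.8"}))
}
```

Note that `checkHost` uses whatever resolver the machine is configured with. With split-horizon DNS, an internal resolver may return private addresses that public DNS does not, so the check is most meaningful when run from outside the private network.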
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
What happened: The certificate fails to get provisioned because the controller keeps creating and deleting exposer pods after a new route is added with the kubernetes.io/tls-acme=true annotation.
What you expected to happen: A fake ACME certificate should be assigned to the route (the staging environment is deployed). Also, the exposer pod should be deleted after serving the HTTP challenge.
How to reproduce it (as minimally and precisely as possible): Create a new route with the kubernetes.io/tls-acme=true annotation.
Anything else we need to know?: Here's part of the controller logs (the original logs can be found here):
The controller keeps creating and deleting the exposer pod/service/routes! Result of oc get routes -n test-namespace -w:
Even the temporary route created for the HTTP challenge is responsive and returns the secret:
Environment:
v3.11.0+39132cb-398
@tnozicka