Closed: hansmi closed this issue 4 years ago.
Yeah, this is a hot fix to avoid getting rate-limited. It will be replaced by rate limits/backoff built into the controller, which is at the top of my list.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen. If this issue is safe to close now, please do so with /close.

/lifecycle stale
/lifecycle frozen
I've just deployed this on a fresh OpenShift 3.11 cluster, and all the Routes I've annotated have failed verification and gone into this paused state. If I remove the paused annotation, the same problem reoccurs and the Route is paused again. Is there any progress on fixing this? As it stands it seems unusable, which is a real shame.
@jameseck Getting paused is not the error itself but a symptom of something being set up incorrectly. Let's Encrypt validation fails, and the Route gets paused so your account doesn't burn through the rate limit and you can fix the problem before the next try. What you need to solve is setting up the cluster correctly so the validation doesn't fail.
Make sure the Route can actually reach your app, that you can reach the temporary Route (/.well-known/...) manually from outside the cluster, and that your DNS record points to the router.
Thanks for the response and thanks for the project! I misunderstood the actual issue and solved it by fixing incorrect settings in the HAProxy LB that sits in front of this cluster. It would still be nice to have paused Routes retried periodically, but it's not urgent now.
Yeah, I was just trying to help you understand this is not a blocker for it to work. Glad you figured out the settings.
I agree this is something that needs to be addressed eventually and is #2 on my list when I get https://github.com/tnozicka/openshift-acme/pull/92 in.
To slightly ease the burden before we get rate limiting:
To list all the paused Routes:
oc get route -A -o json | jq -r '.items[] | select(.metadata.annotations."kubernetes.io/tls-acme-paused") | "-n \(.metadata.namespace) \(.metadata.name)"'
To retry all the paused Routes:
oc get route -A -o json | jq -r '.items[] | select(.metadata.annotations."kubernetes.io/tls-acme-paused") | "-n \(.metadata.namespace) \(.metadata.name)"' | xargs -r -n3 oc patch route -p='{"metadata": {"annotations": {"kubernetes.io/tls-acme-paused": null}}}'
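The two one-liners above share the same jq selection. As a sketch, that selection can be factored into a small shell function so it can be exercised against plain JSON without a cluster; the function name and the wrapper are illustrative, not part of openshift-acme:

```shell
#!/bin/sh
# Read a Route list (JSON) on stdin and print "-n <namespace> <name>" for
# every Route carrying the openshift-acme paused annotation.
paused_routes() {
  jq -r '.items[]
    | select(.metadata.annotations."kubernetes.io/tls-acme-paused")
    | "-n \(.metadata.namespace) \(.metadata.name)"'
}

# In-cluster usage (assumes oc is logged in):
#   oc get route -A -o json | paused_routes \
#     | xargs -r -n3 oc patch route -p='{"metadata": {"annotations": {"kubernetes.io/tls-acme-paused": null}}}'
```

The `xargs -r` flag keeps `oc patch` from running at all when no Route is paused.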
I imagine one could set up a CronJob running the retry script, say, once a week to force the retry until native support arrives.
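A minimal sketch of such a CronJob, assuming a service account with RBAC permission to get and patch Routes cluster-wide (the name, schedule, and image here are illustrative, and the image must ship both `oc` and `jq`):

```yaml
apiVersion: batch/v1beta1        # CronJob API group in the OpenShift 3.11 era
kind: CronJob
metadata:
  name: retry-paused-routes
spec:
  schedule: "0 3 * * 0"          # weekly, Sunday 03:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: route-retrier   # illustrative; needs get/patch on Routes
          restartPolicy: OnFailure
          containers:
          - name: retry
            image: quay.io/openshift/origin-cli:latest   # any image with oc and jq
            command:
            - /bin/sh
            - -c
            - |
              oc get route -A -o json \
                | jq -r '.items[] | select(.metadata.annotations."kubernetes.io/tls-acme-paused") | "-n \(.metadata.namespace) \(.metadata.name)"' \
                | xargs -r -n3 oc patch route -p='{"metadata": {"annotations": {"kubernetes.io/tls-acme-paused": null}}}'
```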
The RouteController.handle function sets a kubernetes.io/tls-acme-paused annotation if the API returned a status of invalid: https://github.com/tnozicka/openshift-acme/blob/f0608627f45f8cce432c1d6b6625d0add42a94c1/pkg/controllers/route/route.go#L572-L582
Once that annotation is set, the route is skipped indefinitely; there is no code removing the annotation, so manual intervention is necessary. One would expect such routes to be retried after a reasonable timeframe, e.g. a day.
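If the paused annotation carried an RFC 3339 timestamp of when the route was paused (it currently doesn't; this value shape, the function, and the one-day window are all hypothetical), even a script could implement the "retry after a day" behavior by selecting only routes paused before a cutoff:

```shell
#!/bin/sh
# Hypothetical: read a Route list (JSON) on stdin and print
# "-n <namespace> <name>" for every Route whose paused annotation holds an
# RFC 3339 timestamp earlier than the cutoff (seconds since epoch, $1).
# Assumes a timestamp-valued annotation, which is NOT current openshift-acme
# behavior.
stale_paused_routes() {
  cutoff="$1"
  jq -r --argjson cutoff "$cutoff" '.items[]
    | select(.metadata.annotations."kubernetes.io/tls-acme-paused" != null)
    | select((.metadata.annotations."kubernetes.io/tls-acme-paused" | fromdate) < $cutoff)
    | "-n \(.metadata.namespace) \(.metadata.name)"'
}
```

The same comparison done in-process (annotation age versus a retry window) is roughly what native controller support would need.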