Closed jsha closed 4 years ago
So after digging into this a bit more, I don't think this will achieve quite what we want.
kube-lego doesn't actually use a rate-limited workqueue when reading work items off its internal queue. This means that as soon as a hard failure occurs, the item is retried within a couple of seconds.
The rate limit being changed in this PR is used here: https://github.com/jetstack/kube-lego/blob/10b552655035a56cab47cb3a74a52fc113c53a6b/pkg/acme/cert_request.go#L142
As you can see, adjusting this backoff will actually just cause us to run 'verifyAuthz' at greater intervals. This will help slow down requests, but at the cost of blocking up processing of all other resources.
I think we should not accept this change. Instead, we should convert the workqueue used internally into a rate-limited queue, and set the parameters you've put forward in this PR on that queue.
I'll create a PR with that change and we can go from there.
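To illustrate the difference being described, here is a minimal sketch of a queue that re-adds a failed item after a delay instead of sleeping in the worker. It is not kube-lego's code: delayingQueue, AddAfter, processAll and the item names are hypothetical, and the 50ms delay is just small enough to run quickly.

```go
package main

import (
	"fmt"
	"time"
)

// delayingQueue is a hypothetical, minimal work queue. Add enqueues
// immediately; AddAfter enqueues later without blocking the caller.
type delayingQueue struct {
	items chan string
}

func newDelayingQueue(size int) *delayingQueue {
	return &delayingQueue{items: make(chan string, size)}
}

// Add puts an item on the queue right away.
func (q *delayingQueue) Add(item string) { q.items <- item }

// AddAfter puts an item back on the queue once d has elapsed. The worker
// that called it is free to pick up other items in the meantime.
func (q *delayingQueue) AddAfter(item string, d time.Duration) {
	time.AfterFunc(d, func() { q.items <- item })
}

// processAll simulates one worker. "failing-ingress" fails on its first
// attempt and is requeued with a delay; "healthy-ingress" is not held up.
func processAll() []string {
	q := newDelayingQueue(8)
	q.Add("failing-ingress")
	q.Add("healthy-ingress")

	failedOnce := map[string]bool{}
	var order []string
	for len(order) < 3 {
		item := <-q.items
		order = append(order, item)
		if item == "failing-ingress" && !failedOnce[item] {
			failedOnce[item] = true
			q.AddAfter(item, 50*time.Millisecond) // retry later, do not sleep here
		}
	}
	return order
}

func main() {
	fmt.Println(processAll())
}
```

The point of the sketch is only the ordering: the healthy item is processed while the failing one waits out its delay, which is what a backoff inside the request path cannot give us.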
Thanks for the reply! I read up a little on the backoff package, and I realized its behavior is not what I expected. For instance, with a MAX_ELAPSED_TIME of 24h, it will back off until it reaches 24 hours, and then stop, which causes Acme.ObtainCertificate to return an error, which in turn causes KubeLego.processProvider to proceed to the next iteration of the loop; then the next loop attempt will start at the min backoff value again.
This isn't the behavior we'd like. What we'd like is for clients to back off until they hit 24 hours, and then remember that backoff interval until they get a success. In other words, we have many many clients that fail every time they attempt issuance. We want those clients to never try more than once every 24 hours.
I agree this isn't the right fix. It seems like what's really needed is for KubeLego.processProvider to have a separate backoff for each Provider, and remember that backoff over time.
As a stopgap solution (because this is a time-sensitive issue for us), I'm going to submit a PR that just adds a plain sleep to KubeLego.processProvider for now.
Hi @jsha. Thanks for your PR.
I'm waiting for a jetstack member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
I understand the commands that are listed here.