Open rachitchauhan43 opened 2 years ago
Is there a reason why you're unable to delete the namespace?
@psschwei: At our org, k8s is a managed service run by a central team. It is possible to delete the namespace and re-create it, but that makes the whole process tedious, because I have to step outside the kustomize workflow and use their own CLI/tools to do so.
Just to add a little more detail here: when the serving-core.yaml file is applied, it initially creates an empty secret for the webhook certs; then, as part of its reconciliation loop, the certs are populated into the secret once the leader election lease is acquired.
In the situation described in this issue (installing, deleting everything but the namespace, and then reinstalling), it looks like the leader lease is never acquired. As a result, the certs never get populated into the secret, which causes the failures being seen.
I would need to dig into it a bit more to determine whether leader election failing in this scenario is expected or a bug...
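One way to check whether a cluster is in the state described above is to test whether the webhook secret actually carries any data. The following is a minimal sketch, not an official Knative tool. It assumes the secret lives in the knative-serving namespace; the function name `secret_is_populated` and the `KUBECTL` override variable are hypothetical helpers introduced only for illustration.

```shell
# Allow the kubectl binary to be overridden (hypothetical convenience, e.g. for dry runs).
KUBECTL="${KUBECTL:-kubectl}"

# secret_is_populated NAMESPACE SECRET
# Returns 0 if the secret has data, 1 if it is empty, 2 if kubectl itself fails.
secret_is_populated() {
  sip_data="$($KUBECTL get secret "$2" -n "$1" -o jsonpath='{.data}')" || return 2
  case "$sip_data" in
    ''|'{}'|'map[]') return 1 ;;  # no .data field, or an empty map
    *) return 0 ;;
  esac
}
```

For example, `secret_is_populated knative-serving webhook-certs && echo populated || echo "empty or missing"`, where `webhook-certs` is an assumed secret name; substitute whichever -certs secrets exist in your installation.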
@psschwei : Can this issue be triaged for next release? Or do we know if this is the expected behavior?
We just ran into what is probably the same issue on Serving 1.3.2 and Operator 1.5.3 hosted in Azure (AKS). We had to perform a cluster certificate rotation. Afterwards, all of the Knative Serving pods were in CrashLoopBackOff due to invalid certificates. We tried deleting all -certs secrets. They were recreated, but with metadata only. We waited for more than 5 minutes, which should be long enough for any leader election related issue to resolve. Deleting the namespace was the only workaround that we could find.
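Since leader election is the suspected cause, it may help to see which leases in the namespace currently have no holder. This is a rough sketch under stated assumptions: `holderIdentity` is the standard field of the Kubernetes Lease spec, while the function name `leases_without_holder` and the `KUBECTL` override are hypothetical. An empty holder is only one possible symptom; a lease still held by a pod that no longer exists would not be caught by this check.

```shell
# Allow the kubectl binary to be overridden (hypothetical convenience).
KUBECTL="${KUBECTL:-kubectl}"

# leases_without_holder NAMESPACE
# Prints the name of every Lease in the namespace whose spec.holderIdentity is empty.
leases_without_holder() {
  $KUBECTL get lease -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.holderIdentity}{"\n"}{end}' |
    awk -F '\t' '$1 != "" && $2 == "" { print $1 }'
}
```

Running `leases_without_holder knative-serving` lists candidate leases to inspect further with `kubectl describe lease`.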
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.
This issue or pull request is stale because it has been open for 90 days with no activity.
This bot triages issues and PRs according to the following rules:
lifecycle/stale is applied
lifecycle/stale was applied, the issue is closed
You can:
/remove-lifecycle rotten
/close
/lifecycle stale
"it looks like the leader lease is never acquired": I ran into this too. As a workaround, delete the webhook leases:
kubectl get lease -n knative-serving | grep webhook | awk '{print $1}' | xargs kubectl delete lease -n knative-serving
After that, it looks like the lease can be acquired again. But why does this happen, @psschwei?
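The workaround above can also be wrapped in a small function, which makes it easier to reuse and to dry-run. This is only a sketch of the same idea, not a confirmed fix: the function name `delete_webhook_leases` and the `KUBECTL` override are hypothetical, and it relies on `kubectl get -o name` printing one resource/name per line.

```shell
# Allow the kubectl binary to be overridden (hypothetical convenience).
KUBECTL="${KUBECTL:-kubectl}"

# delete_webhook_leases NAMESPACE
# Deletes every Lease in the namespace whose name contains "webhook".
delete_webhook_leases() {
  $KUBECTL get lease -n "$1" -o name |
    grep webhook |
    while IFS= read -r dwl_lease; do
      # dwl_lease looks like "lease.coordination.k8s.io/<name>"
      $KUBECTL delete -n "$1" "$dwl_lease"
    done
}
```

Calling `delete_webhook_leases knative-serving` mirrors the one-liner from the comment above. Whether the webhook then re-acquires the lease on its own is what the comment reports, not something this sketch guarantees.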
/reopen
What version of Knative?
Expected Behavior
This is what I am doing and what is happening right now:
Actual Behavior
Without deleting the namespace completely, re-applying knative-serving a second time fails because the webhook certs are not populated.
Steps to Reproduce the Problem