Open nikhil-thomas opened 2 years ago
@vdemeester @afrittoli @dibyom @bobcatfish @imjasonh
/lifecycle frozen
This becomes quite a problem when the installation manifest is used with a GitOps operator like Flux. I found that dropping ValidatingWebhookConfiguration/config.webhook.pipeline.tekton.dev eliminates the problem; however, the webhook pod ends up logging plenty of errors. I think that disabling config map validation should be allowed, and perhaps there should be an official way of doing that which also avoids the endless errors.
@errordeveloper did you find a resolution to this issue? I am seeing exactly the same issue during tekton installation using flux on GKE
I managed to break up webhooks into a separate kustomization. However, it didn't survive for long as tekton was too large of a dependency for the project I am working on.
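For readers trying the same split, here is a minimal sketch of what two Flux Kustomizations might look like, with the webhook configurations applied only after the core install is healthy. It is an illustration, not the commenter's actual layout: the Kustomization names, the `./tekton/core` and `./tekton/webhooks` paths, and the `GitRepository` name `platform` are all hypothetical, and it assumes the `kustomize.toolkit.fluxcd.io/v1` API.

```yaml
# Hypothetical layout: core Tekton resources (namespace, ConfigMaps,
# Deployments, Services) live under ./tekton/core.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tekton-core            # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./tekton/core          # hypothetical path
  prune: true
  wait: true                   # wait until the applied resources are ready
  sourceRef:
    kind: GitRepository
    name: platform             # hypothetical source
---
# The Mutating/ValidatingWebhookConfiguration objects live under
# ./tekton/webhooks and are reconciled only after tekton-core is ready.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tekton-webhooks        # hypothetical name
  namespace: flux-system
spec:
  dependsOn:
    - name: tekton-core
  interval: 10m
  path: ./tekton/webhooks      # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: platform
```

This only orders the first install. It does not remove the underlying cycle, so a webhook outage during a later ConfigMap change could presumably still block reconciliation.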
For anyone using ArgoCD to manage a tekton-pipelines Helm chart who is getting burned by this: adding ArgoCD annotations for sync phases and sync waves to each of the MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources ensures that Tekton's ConfigMaps are created first, which prevents the race condition/deadlock on a cold install.
I used this annotation specifically and had some success:
argocd.argoproj.io/hook: PostSync
Not the best solution, and it may not solve the upgrade workflow. Ideally this would be handled internally by Tekton's controllers.
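As a rough sketch of where that annotation goes, assuming the webhook configuration is patched in the rendered chart output (only the metadata is shown; the rest of the resource is taken to be unchanged from the release manifest):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: config.webhook.pipeline.tekton.dev
  annotations:
    # Apply this resource in the PostSync phase, i.e. after the ConfigMaps
    # and the webhook Deployment from the main sync are applied and healthy.
    argocd.argoproj.io/hook: PostSync
# webhooks: left as defined in the upstream release manifest
```

The same annotation would need to be added to the other MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources in the chart.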
Summary
The dependency between ValidatingWebhookConfiguration config.webhook.pipeline.tekton.dev and Deployment tekton-pipelines-webhook is cyclic. This can lead to a potential deadlock which can block upgrades.
How to get "deadlocked"
NAME READY UP-TO-DATE AVAILABLE AGE
tekton-pipelines-controller 1/1 1 1 15s
tekton-pipelines-webhook 1/1 1 1 12s
config-defaults config map
error: configmaps "config-defaults" could not be patched: Internal error occurred: failed calling webhook "config.webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/config-validation?timeout=10s": no endpoints available for service "tekton-pipelines-webhook"
You can run
kubectl replace -f /tmp/kubectl-edit-mnnza.yaml
to try this update again.
deployment.apps/tekton-pipelines-webhook scaled
check logs from webhook pod
"error":"configmap \"config-defaults\" not found"
try to recreate the config-defaults ConfigMap
Error from server (InternalError): error when creating "https://github.com/tektoncd/pipeline/releases/download/v0.32.1/release.notags.yaml": Internal error occurred: failed calling webhook "config.webhook.pipeline.tekton.dev": Post "https://tekton-pipelines-webhook.tekton-pipelines.svc:443/config-validation?timeout=10s": no endpoints available for service "tekton-pipelines-webhook"