Closed: koolfy closed this issue 1 week ago
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/triage needs-information
It would help a lot if you could write a step-by-step guide to reproduce this problem in a minikube or kind cluster. That would provide insight into whether this is a problem with the controller or with the resources allocated to the pods.
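Something along these lines could serve as a starting point (a sketch only, unverified; the cluster name, namespace, backend service, hostnames, and the number of parallel applies are assumptions, and a local ca.crt file is required):

kind create cluster --name webhook-repro
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress --create-namespace
# Create the CA secret the annotation points at (it must contain a ca.crt key):
kubectl -n ingress create secret generic stg-ca --from-file=ca.crt=ca.crt
# Apply many ingress objects in parallel to simulate busy deployments:
for i in $(seq 1 50); do
  cat <<EOF | kubectl apply -f - &
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: env-$i
  annotations:
    nginx.ingress.kubernetes.io/auth-tls-secret: ingress/stg-ca
spec:
  ingressClassName: nginx
  rules:
  - host: env-$i.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo
            port:
              number: 80
EOF
done
wait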
/remove-kind bug
This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any questions or want to request prioritization, please reach out in #ingress-nginx-dev on Kubernetes Slack.
"No certificate not found" and "bad end line" would be errors when reading the secret is a problem. so in the absence of enough information to pinpoint the root-cause, it can be assumed that the event occured during new deployments and the secret was not ready when the ingress was attempted to be created. Or the I/O caused a failed read of the secret.
/close
@longwuyuan: Closing this issue.
What happened:
When deploying many environments, using the following annotation:
nginx.ingress.kubernetes.io/auth-tls-secret: ingress/stg-ca
This "ingress/stg-ca" secret is not modified or recreated during these deployments, but sometimes seems to have difficulties being read during the admission webhook validationAdmission webhook sometimes (often) fails on busy but functional clusters with these errors, when deploying new environments with known-to-be-valid ingress objects. Re-running it will work properly on some occasions, confirming there is nothing fundamentally wrong with the ingress objects themselves.
It might be some form of race condition depending on how busy the k8s cluster is (disk I/O)?
What you expected to happen:
The admission webhook should only fail if the ingress objects produce malformed configurations or otherwise invalid certificates; it should not produce false-positive failures on deployments.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version):
Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2-gke.2100", GitCommit:"00dd416d1e3300d98717d48686c7cde7cb5dd6b5", GitTreeState:"clean", BuildDate:"2023-06-14T09:21:52Z", GoVersion:"go1.20.4 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Cloud provider or hardware configuration: GCP
How was the ingress-nginx-controller installed:
helm ls -A | grep -i ingress
ingress-nginx ingress 69 2023-06-30 22:19:19.263591445 +0000 UTC deployed ingress-nginx-4.7.1 1.8.1
helm -n <ingresscontrollernamespace> get values <helmreleasename>
kubectl describe ingressclasses
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
How to reproduce this issue: Not 100% sure; the problem is definitely not deterministic.
Anything else we need to know: It might happen more often at hours when there is notable disk I/O on the system partition of the Kubernetes nodes, but it still probably shouldn't fail like this.
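Since re-running a failed deployment often succeeds, a blunt workaround until the root cause is found is to retry the apply when the webhook rejects it (a sketch; ingress.yaml and the retry/sleep values are placeholders):

for attempt in 1 2 3; do
  kubectl apply -f ingress.yaml && break
  echo "apply rejected by the admission webhook (attempt $attempt), retrying..." >&2
  sleep 5
done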