Open Evesy opened 1 year ago
This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
This does not look like a bug.
/remove-kind bug
The error message states:
SSL: error:0908F066:PEM routines:get_header_and_data:bad end line
and I suspect it relates to this PEM file:
/etc/ingress-controller/ssl/ca-ingress-nginx-cloudflare-origin-pull-ca.pem
which in turn could be related to the auth-tls-secret annotation.
You could create another app and Ingress with a vanilla nginx:alpine image and see whether a simple Ingress with no extra annotations works. If it does, add that annotation and check whether the previously working Ingress starts failing.
@longwuyuan Is this really not a bug?
The file (/etc/ingress-controller/ssl/ca-ingress-nginx-cloudflare-origin-pull-ca.pem) is loaded by Nginx based on the annotation nginx.ingress.kubernetes.io/auth-tls-secret: ingress-nginx/cloudflare-origin-pull-ca.
The referenced Kubernetes secret, ingress-nginx/cloudflare-origin-pull-ca, does not change when Nginx is rolling-restarted. The data in the secret is static and sound, and ingress-nginx also eventually loads it correctly without intervention.
This leads me to think ingress-nginx is attempting to validate/load the nginx config, which references that PEM on disk, before ingress-nginx has actually read the secret and written it to its local filesystem.
What are your thoughts?
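If that hypothesis is right, the failure mode is an ordinary non-atomic file rewrite. Here is a minimal sketch of the window, using hypothetical paths and only simulating the in-place rewrite (not the controller itself): truncating the PEM before writing the new bytes leaves a moment in which a reader such as `nginx -t` sees a file with no END line, i.e. the "bad end line" error.

```shell
#!/bin/sh
# Simulate an in-place PEM rewrite (hypothetical paths, illustration only).
dir=$(mktemp -d)
pem="$dir/ca.pem"

# A valid single-cert PEM is on disk.
printf -- '-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n' > "$pem"

# An in-place rewrite truncates first; this is the race window where a
# concurrent "nginx -t" would read an empty or partial file.
: > "$pem"
mid=$(wc -c < "$pem")
echo "bytes visible mid-rewrite: $mid"

# ...and only then do the new bytes land.
printf -- '-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n' >> "$pem"
after=$(wc -c < "$pem")
echo "bytes visible after rewrite: $after"
rm -rf "$dir"
```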
Hi @Evesy, thanks for reporting this. What we need is complete, detailed data on that error.
With a Cloudflare CA being involved in your post, I think there is a lot to be considered, so the tiny, minute details of the problem will help a lot. Cloudflare CA, fullchain, and related auth topics are a specialist's area.
This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will get to your issue as soon as possible. If you have any question or want to request prioritization, please reach out in #ingress-nginx-dev
on Kubernetes Slack.
Hello 👋
By chance, do you have any other findings around this @Evesy ?
I believe I am experiencing a similar situation, but with the CA CRL file instead of the CRT (the secret provided via the annotation holds both "ca.crt" and "ca.crl").
I've confirmed it happens on versions 1.1.3, 1.3.1, 1.4.0, and 1.5.1, although on v1.1.3 the logging format appears slightly different.
Hey @Restless-ET -- unfortunately we haven't seen a recurrence since I raised this issue, and I was never able to reliably reproduce it either.
Yes, I experience the same... when I release a new version or simply do a rollout restart, it doesn't happen every time, and even when it does, it's not on all the controller pods.
It doesn't seem to affect functionality on any of the configured endpoints, so I guess at this stage it's really more about log-noise reduction (and quicker detection of actual problems) than anything else.
Anyway, thank you for getting back on this. :)
This problem has severely impacted us in the past, I have just now been able to compile the information and replicate the problem.
I also believe it's the same underlying issue causing #10234 and #10265
We run the controller as a DaemonSet (but the issue can be reproduced with a Deployment). Frequent updates to the Ingress resources (visible in their Events) make this issue occur more often.
The admission webhook then denies updates with:
Error: UPGRADE FAILED: failed to create resource: admission webhook "validate.nginx.ingress.kubernetes.io" denied the request:
-------------------------------------------------------------------------------
Error: exit status 1
2023/06/06 16:55:24 [emerg] 4002#4002: SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/test-mtls-truststore.pem") failed (SSL: error:0B084088:x509 certificate routines:X509_load_cert_crl_file:no certificate or crl found)
nginx: [emerg] SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/test-mtls-truststore.pem") failed (SSL: error:0B084088:x509 certificate routines:X509_load_cert_crl_file:no certificate or crl found)
nginx: configuration file /tmp/nginx/nginx-cfg636383756 test failed
or
2023/02/06 17:24:42 [emerg] 34#34: SSL_load_client_CA_file("/etc/ingress-controller/ssl/test-mtls-truststore.pem") failed (SSL: error:0908F066:PEM routines:get_header_and_data:bad end line)
nginx: [emerg] SSL_load_client_CA_file("/etc/ingress-controller/ssl/test-mtls-truststore.pem") failed (SSL: error:0908F066:PEM routines:get_header_and_data:bad end line)
I did the following in minikube:
#!/bin/bash
VERSION=4.7.1
NS=ingress-test
# Install ingress controller
helm upgrade nginx ingress-nginx/ingress-nginx -i --version ${VERSION} -n ${NS} --create-namespace
echo Wait for ingress controller to be live
until kubectl wait --for=condition=Ready pod --selector app.kubernetes.io/component=controller
do
sleep 1
done
# Create large truststore (increases likelihood of race condition)
cat << EOF | kubectl apply -n ${NS} -f - --server-side
apiVersion: v1
data:
ca.crt: |
$(cat /etc/ssl/certs/ca-certificates.crt | base64 | sed "s/^/ /")
kind: Secret
metadata:
name: truststore
type: Opaque
EOF
# Create ingress
cat <<EOF | kubectl apply -n ${NS} -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true"
nginx.ingress.kubernetes.io/auth-tls-secret: ingress-test/truststore
nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
nginx.ingress.kubernetes.io/auth-tls-verify-depth: "1"
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
update-time: ""
name: ingress
spec:
ingressClassName: nginx
rules:
- host: dummy.host.com
http:
paths:
- backend:
service:
name: dummy-service
port:
number: 8080
path: /
pathType: ImplementationSpecific
tls:
- hosts:
- dummy.host.com
EOF
Use 2 terminals.
In terminal 1, exec into the controller pod:
kubectl exec -it deployment.apps/nginx-ingress-nginx-controller -- bash
Then run the following script, which repeatedly checksums the truststore file on disk and reports whenever its content changes:
expected_md5=$(md5sum /etc/ingress-controller/ssl/ca-ingress-test-truststore.pem)
cnt=0
while true
do
if [[ "$(md5sum /etc/ingress-controller/ssl/ca-ingress-test-truststore.pem)" == "${expected_md5}" ]] ; then
let cnt++
else
echo "success count: $cnt"
cnt=0
echo "failure! $(date)"
fi
done
outputs:
success count: 1272
failure! Thu Aug 24 15:41:06 UTC 2023
success count: 663
failure! Thu Aug 24 15:41:12 UTC 2023
success count: 402
failure! Thu Aug 24 15:41:16 UTC 2023
success count: 392
failure! Thu Aug 24 15:41:19 UTC 2023
success count: 246
failure! Thu Aug 24 15:41:22 UTC 2023
Or run the following loop (nginx -t is what the controller runs internally to validate the config):
cnt=0
while true
do
if nginx -tq ; then
let cnt++
else
echo "success count: $cnt"
cnt=0
echo "failure! $(date)"
fi
done
outputs:
2023/08/24 15:50:18 [emerg] 4320#4320: SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-ingress-test-truststore.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: [emerg] SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-ingress-test-truststore.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: configuration file /etc/nginx/nginx.conf test failed
success count: 9
failure! Thu Aug 24 15:50:18 UTC 2023
2023/08/24 15:50:21 [emerg] 4332#4332: SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-ingress-test-truststore.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: [emerg] SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-ingress-test-truststore.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: configuration file /etc/nginx/nginx.conf test failed
success count: 7
failure! Thu Aug 24 15:50:21 UTC 2023
2023/08/24 15:50:24 [emerg] 4347#4347: SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-ingress-test-truststore.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: [emerg] SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-ingress-test-truststore.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: configuration file /etc/nginx/nginx.conf test failed
success count: 9
failure! Thu Aug 24 15:50:24 UTC 2023
2023/08/24 15:50:25 [emerg] 4352#4352: SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-ingress-test-truststore.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: [emerg] SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-ingress-test-truststore.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: configuration file /etc/nginx/nginx.conf test failed
success count: 2
failure! Thu Aug 24 15:50:25 UTC 2023
Once the monitoring is running in terminal 1, use terminal 2 to create an update storm by constantly patching the Ingress resource:
while true; do
kubectl patch -n ingress-test ingress ingress --type merge --patch "metadata: {annotations: {update-time: \"$(date)\"}}"
done
outputs:
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched (no change)
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched (no change)
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched (no change)
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched (no change)
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched (no change)
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched
ingress.networking.k8s.io/ingress patched (no change)
Every update to the Ingress resource causes the certificate file to be rewritten; a common trigger is ingress controller restarts updating the Ingress.Status. Our truststore is large (/etc/ssl/certs/ca-certificates.crt plus our internal CA). Example, where ingress is the namespace that holds the Secret mtls-truststore:
metadata:
  annotations:
    nginx.ingress.kubernetes.io/auth-tls-secret: ingress/mtls-truststore
Our mitigation is setting controller.extraArgs.update-status: "false". Controller restarts then no longer cause the Ingress resource to change (no more "Updating resource status" writes to the status block):
status:
loadBalancer:
ingress:
- ip: x.x.x.x
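For reference, this is roughly how that mitigation would be applied to the Helm install from the reproduction script above (release name, namespace, and chart taken from that script; treat this as an untested sketch):

```shell
# Pass --update-status=false to the controller so restarts stop
# rewriting Ingress.Status (and thus stop triggering cert rewrites).
helm upgrade nginx ingress-nginx/ingress-nginx -i -n ingress-test \
  --set controller.extraArgs.update-status="false"
```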
func ConfigureCACert(name string, ca []byte, sslCert *ingress.SSLCert) error {
caName := fmt.Sprintf("ca-%v.pem", name)
+ tmpFileName := fmt.Sprintf("%v/.%v", file.DefaultSSLDirectory, caName)
fileName := fmt.Sprintf("%v/%v", file.DefaultSSLDirectory, caName)
+ // Perform atomic write by doing a write followed by a rename (unix only)
- err := os.WriteFile(fileName, ca, 0644)
+ err := os.WriteFile(tmpFileName, ca, 0644)
+ if err == nil {
+ err = os.Rename(tmpFileName, fileName)
+ }
if err != nil {
return fmt.Errorf("could not write CA file %v: %v", fileName, err)
}
sslCert.CAFileName = fileName
klog.V(3).InfoS("Created CA Certificate for Authentication", "path", fileName)
return nil
}
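For illustration, the write-then-rename pattern proposed in the diff can be sketched in shell (hypothetical paths): because rename(2) is atomic within a single filesystem, a reader always observes either the complete old file or the complete new one, never a truncated PEM.

```shell
#!/bin/sh
# Atomic replace: write the new PEM to a hidden temp file in the SAME
# directory (rename is only atomic within one filesystem), then rename
# it over the target.
dir=$(mktemp -d)
target="$dir/ca-test-truststore.pem"
tmp="$dir/.ca-test-truststore.pem"

printf -- '-----BEGIN CERTIFICATE-----\nOLD1\n-----END CERTIFICATE-----\n' > "$target"

printf -- '-----BEGIN CERTIFICATE-----\nNEW1\n-----END CERTIFICATE-----\n' > "$tmp"
mv -f "$tmp" "$target"   # rename(2): no window with a truncated file

# The END line is always present, so "nginx -t" cannot hit "bad end line".
last=$(tail -n 1 "$target")
echo "$last"
rm -rf "$dir"
```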
I am willing to provide a PR with fixes if you can provide some guidance on my proposed solution(s).
Just to add some information on this, we are able to consistently reproduce the issue by deploying ingresses with the following annotations
annotations:
nginx.ingress.kubernetes.io/backend-protocol: HTTPS
nginx.ingress.kubernetes.io/proxy-ssl-name: non-existent-service.user-xx-yy-sandbox.svc.cluster.local
nginx.ingress.kubernetes.io/proxy-ssl-secret: user-xx-yy-sandbox/dummy-proxy-ssl-secret
nginx.ingress.kubernetes.io/proxy-ssl-verify: "on"
nginx.ingress.kubernetes.io/proxy-ssl-verify-depth: "2"
Attempts to deploy many such ingresses simultaneously give errors such as
-------------------------------------------------------------------------------
* admission webhook "validate.nginx.ingress.kubernetes.io" denied the request:
-------------------------------------------------------------------------------
Error: exit status 1
2024/03/04 12:27:39 [warn] 2185398#2185398: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg16865532:145
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg16865532:145
2024/03/04 12:27:39 [warn] 2185398#2185398: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg16865532:146
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg16865532:146
2024/03/04 12:27:39 [warn] 2185398#2185398: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg16865532:147
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg16865532:147
2024/03/04 12:27:39 [emerg] 2185398#2185398: SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-user-xx-yy-sandbox-dummy-proxy-ssl-secret.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: [emerg] SSL_CTX_load_verify_locations("/etc/ingress-controller/ssl/ca-user-xx-yy-sandbox-dummy-proxy-ssl-secret.pem") failed (SSL: error:04800066:PEM routines::bad end line error:05880009:x509 certificate routines::PEM lib)
nginx: configuration file /tmp/nginx/nginx-cfg16865532 test failed
Observations:
As our ingresses only need a single shared CA bundle that doesn't change often, our workaround right now is to mount that bundle into the nginx pods as a ConfigMap, then use a configuration snippet to turn on TLS verification to the backend pods, referencing the mounted bundle.
nginx.ingress.kubernetes.io/configuration-snippet: |
proxy_ssl_trusted_certificate /path/to/mounted/bundle.pem;
proxy_ssl_verify on;
proxy_ssl_verify_depth 2;
proxy_ssl_name non-existent-service.user-xx-yy-sandbox.svc.cluster.local;
This seems to dodge the race condition but is far from ideal, not least because enabling configuration snippets has security implications.
I created a helm chart which consistently reproduces the issue. It deploys a placeholder secret, then deploys many ingresses with the above annotations which reference said secret.
Hi,
It seems clear that an event like a rollout of the controller, with existing controller pods terminating and new ones being created, is required to trigger this. A large volume of ingresses carrying the secret-injecting annotation also seems to cause it. I see that some comments concur that race-condition-like situations are not ruled out.
To state the obvious, just one or a few ingresses syncing concurrently does not cause this problem. It is also obvious that users who have mTLS secrets in ingresses, whether in large volumes or involved in a rollout during upgrades, deserve a better experience.
But the project is extremely short on resources and there is no developer time available to work on this. If a PR is submitted it is likely to get reviewed, but an e2e test that mirrors these conditions in a kind cluster is an absolute requirement. I see the need for lots of certs there.
The project's resources are prioritized on securing the controller by default and on implementing the Gateway API. We have actually deprecated features that are far from the implications of the Ingress API spec, like TCP/UDP forwarding.
The best step forward is to join the community meeting, announce the intent to work on this, and discuss it in the ingress-nginx-dev channel of the Kubernetes Slack. It would help a lot.
What happened:
During startup of nginx we observed Nginx emitting emergency-level logs, as the configuration contained references to certificate files that had not yet been written into the pod.
What you expected to happen:
ingress-nginx should fully write secrets to the pod before attempting to start up.
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.): 1.4.0
Kubernetes version (use kubectl version): 1.24.8
Environment:
Cloud provider or hardware configuration: GKE
OS (e.g. from /etc/os-release): ContainerOS
Kernel (e.g. uname -a): Linux ingress-nginx-external-controller-6c9449fbfd-p584h 5.10.147+ #1 SMP Thu Nov 10 04:41:53 UTC 2022 x86_64 Linux
Current state of ingress object, if applicable:
Others:
How to reproduce this issue: This hasn't been reproducible in a smaller test environment as of yet; it only seems to happen on our cluster with ~1000 ingresses. We've been on 1.4 for some time now and this is the first time we've observed the issue while nginx is rolling out.